Re: HTTPAPI and I/O utf-8
Hi Scott & Pascal,
I have another "dirty" way (from EBCDIC only, at the moment) ....
    text = %trim('ÆØÅæøå');                 // text to be translated
    text = escapeURI(encodeUTF8(text));
    dsp = text;
Result (in plain EBCDIC):
    DSPLY  %C3%86%C3%98%C3%85%C3%A6%C3%B8%C3%A5
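For anyone who wants to cross-check the escaped value outside RPG, here is a minimal Python sketch. It is not the code behind escapeURI/encodeUTF8 above; it assumes the standard library's urllib.parse.quote as a stand-in, and escape_utf8 is a hypothetical helper name.

```python
from urllib.parse import quote


def escape_utf8(text: str) -> str:
    # quote() encodes the text as UTF-8 by default and percent-escapes
    # every byte that is not an unreserved URL character.
    # safe="" makes it escape "/" as well.
    return quote(text, safe="")


print(escape_utf8("ÆØÅæøå"))
```

Each of the six characters becomes a two-byte UTF-8 unit, so it should print the same twelve %XX escapes as the DSPLY line.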
Now I can use GET:
    // Setup HTTPAPI Google Translate Url
    HTTP_SetFileCCSID(1208);
    url = 'http://translate.google.com/translate_a/t'
        + '?client=t'
        + '&text=' + text
        + '&sl=' + srcLng
        + '&tl=' + transLng
        ;
    date = timestamp() - %years(12);

    // Send HTTPAPI Request
    rc = http_url_get(url
                     :tmpFile
                     :HTTP_TIMEOUT
                     :'Mozilla/5.0'   // important, google doesn't like unknown browsers
                     :date
                     );

And get the result (google returns JSON in UTF8):
    [[["ÆØÅæøå","ÆØÅæøå","",""]],,"en",,[["ÆØÅæøå",[5],1,0,1000,0,
    1,0]],[["ÆØÅæøå",5,[["ÆØÅæøå",1000,1,0]],[[0,6]],"ÆØÅæøå"]],,,
    [["is"]],2]

Convert it to XML

    <array depth="1">
      <array depth="2">
        <array depth="3">
          <value>ÆØÅæøå</value>
          <value>ÆØÅæøå</value>
        </array>
      </array>
      <value>en</value>
      <array depth="2">
        <array depth="3">
        ....
and read the first value element that contains the result of the
translation.
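The same "first value" lookup can be sketched in Python without the XML step. This is only an illustration under assumptions: the reply has the shape shown above, the empty slots (",,") are the only thing keeping it from being strict JSON, and first_translation is a hypothetical name. The regex is naive and would also alter a ",," that occurs inside a string value.

```python
import json
import re


def first_translation(raw: str) -> str:
    # Fill empty array slots (",," or "[,") with null so that the
    # reply parses as strict JSON.
    fixed = re.sub(r"(?<=[\[,])(?=,)", "null", raw)
    data = json.loads(fixed)
    # The first string of the innermost first array is the translation.
    return data[0][0][0]


reply = ('[[["ÆØÅæøå","ÆØÅæøå","",""]],,"en",,'
         '[["ÆØÅæøå",[5],1,0,1000,0,1,0]],'
         '[["ÆØÅæøå",5,[["ÆØÅæøå",1000,1,0]],[[0,6]],"ÆØÅæøå"]],,,'
         '[["is"]],2]')
print(first_translation(reply))
```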
The problem here is of course that the result file may contain UTF-8
characters that EBCDIC does not support. I then have to either accept them
as blanks, replace the UTF-8 unit with a valid EBCDIC character, or replace
it with an HTML equivalent such as &#nnn;, using CCSID 819 as an
intermediate format.
What I do is change the file to CCSID 819 using CHGATR. By doing so I
read it as SBCS and know that any byte > x'7F' belongs to a multi-byte
UTF-8 unit.
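To make the byte-level idea concrete, here is a rough Python sketch of scanning single bytes for UTF-8 units and substituting them. It is not the powerEXT implementation: utf8_units and convert_units are hypothetical names, Python's cp500 codec stands in for "the EBCDIC code page", and the '*Html' switch only imitates the option used in this post.

```python
def utf8_units(data: bytes):
    """Yield one ASCII byte or one multi-byte UTF-8 sequence at a time."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead >> 5 == 0b110:        # 110xxxxx starts a 2-byte unit
            size = 2
        elif lead >> 4 == 0b1110:     # 1110xxxx starts a 3-byte unit
            size = 3
        elif lead >> 3 == 0b11110:    # 11110xxx starts a 4-byte unit
            size = 4
        else:                         # ASCII, or a stray byte passed on as-is
            size = 1
        yield data[i:i + size]
        i += size


def convert_units(data, replacements, fallback=" ", target="cp500"):
    """Apply explicit replacements, then handle what the target cannot hold."""
    out = []
    for unit in utf8_units(data):
        if unit in replacements:
            out.append(replacements[unit])
            continue
        try:
            ch = unit.decode("utf-8")
        except UnicodeDecodeError:
            out.append(" ")           # malformed unit: blank it
            continue
        try:
            ch.encode(target)         # representable in the target code page?
            out.append(ch)
        except UnicodeEncodeError:
            out.append("&#%d;" % ord(ch) if fallback == "*Html" else fallback)
    return "".join(out)


table = {bytes.fromhex("E28099"): "'",
         bytes.fromhex("E282AC"): "EUR",
         bytes.fromhex("E28093"): "-"}
raw = "It’s 5 € – ÆØ 日".encode("utf-8")
print(convert_units(raw, table, "*Html"))
```

The explicit table is checked first, so a smart quote or the EUR sign gets its chosen replacement before the blank/HTML fallback is considered.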
Examples:
    // Replacing Microsoft Smart Quotes with EBCDIC equivalents
    myField = replaceUTF8unit(myField:'E28098':'''':819);
    myField = replaceUTF8unit(myField:'E28099':'''':819);
    myField = replaceUTF8unit(myField:'E2809C':'"':819);
    myField = replaceUTF8unit(myField:'E2809D':'"':819);
    myField = replaceUTF8unit(myField:'E28093':'-':819);
    myField = replaceUTF8unit(myField:'E28094':'--':819);
    myField = replaceUTF8unit(myField:'E280A6':'...':819);

    // Replacing EUR Sign with Text
    myField = replaceUTF8unit(myField:'E282AC':'EUR':819);

    // Converting unsupported UTF-8 Characters to blanks
    myField = convertUTF8unit(myField:' ':819);

    // Decoding UTF-8 Characters read as SBCS ASCII to EBCDIC
    myField = decodeUTF8(myField:819);
or
    // Converting unsupported UTF-8 Characters to HTML/Unicode encoding
    myField = convertUTF8unit(myField:'*Html':819);

On Mon, Jan 9, 2012 at 11:32 PM, Scott Klement <[2]sk@xxxxxxxxxxxxxxxx>
wrote:
Hi Pascal,
The way you're doing it won't work.  You're coding the character string
(containing the URL) in EBCDIC, but then trying to concatenate
Unicode data in the middle of that string.  HTTPAPI isn't smart
enough to know that part of it is Unicode and part of it is EBCDIC,
so the result will be a mistranslated string.
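The effect is easy to reproduce outside RPG. The Python sketch below is an illustration only: it assumes code page 500 as the EBCDIC CCSID and models the conversion as one decode over the whole buffer, which is a simplification and not HTTPAPI's actual code path. ebcdic_to_text is a hypothetical name.

```python
def ebcdic_to_text(data: bytes) -> str:
    # Treats every byte as EBCDIC (cp500), the way a converter that
    # knows nothing about the embedded UTF-8 bytes would.
    return data.decode("cp500")


url_part = "&text=".encode("cp500")   # EBCDIC bytes of the URL text
value = "日本語".encode("utf-8")       # UTF-8 bytes spliced into the middle
mixed = url_part + value

# The EBCDIC part survives; the UTF-8 part comes out as unrelated characters.
print(repr(ebcdic_to_text(mixed)))
```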
Furthermore, any "special" characters in your data (i.e. characters
that aren't allowed in a URL, or have a special meaning in a URL)
aren't escaped, and that'll cause additional problems.
You're going to have to use the "Web Form" (URL Encoder) routines to
encode your data properly.  If you use those routines, they should
successfully translate the data to UTF-8 and encode it properly for
a URL.
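As a rough picture of what such an encoder does, here is a Python sketch using the standard library's urlencode. It is not HTTPAPI's WEBFORM API; build_url is a hypothetical helper, and the parameter names are the ones used elsewhere in this thread.

```python
from urllib.parse import urlencode


def build_url(base: str, text: str, src_lng: str, trans_lng: str) -> str:
    # urlencode() converts each value to UTF-8 and escapes anything that
    # is not allowed in a query string (a space becomes "+", "&" becomes "%26").
    query = urlencode({"client": "t", "text": text,
                       "sl": src_lng, "tl": trans_lng})
    return base + "?" + query


print(build_url("http://translate.google.com/translate_a/t",
                "ÆØ & å", "da", "en"))
```

Because every value goes through the same encoder, a "&" or "=" inside the text can no longer be mistaken for a parameter separator.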
Assuming your input data is EBCDIC (the job's CCSID) -- I've
attached an example of this, named 'pascal.txt'
So that's fine if your input is EBCDIC -- but if you're planning to
have data in all of English, Japanese and German, it seems very
unlikely that EBCDIC input is a good idea... you want Unicode for
input, not just output!
The only problem with the "url encoder" (or "WEBFORM") routines is
the name of the variable ('text' or 'langpair' in your example) is
received as an alphanumeric parameter.  This works great when the
input is in other CCSIDs besides Unicode -- but for Unicode, it
makes more sense to use data type=C (RPG's support for UCS-2).
This isn't an issue for the value of the variable, since that can be
passed by pointer -- but for the variable name, it's ugly.
You can work around it by doing a DS overlay to get the same input
bytes into an alpha variable -- but, it's just a little ugly.
Anyway, I've attached an example of this called 'pascal2.txt'
This code worked for me using the latest "beta" copy of httpapi,
which is found here:
[3]http://www.scottklement.com/httpapi/beta
On 1/7/2012 8:38 AM, Pascal Polverini wrote:
   Hi Scott,
   I am trying to send a GET request to a google-translate HTML page.
   The HTML-page response is in utf-8 but I would also need to send
   utf-8 data to cover any language-pair.
   I tried different things, for Latin character it works but I cannot
   get back or send Japanese for instance.
   I am not sure if this is because I use a GET. I understand that with
   POST you can set the CCSID but I am not sure of what to do for GET.
   Thanks for any tips and in any case thank you for these remarkable
   APIs.
   Pascal
-----------------------------------------------------------------------
This is the FTPAPI mailing list.  To unsubscribe, please go to:
[4]http://www.scottklement.com/mailman/listinfo/ftpapi
-----------------------------------------------------------------------
--
Regards,
Henrik Rützou
[5]http://powerEXT.com
References
1. http://translate.google.com/translate_a/t
2. mailto:sk@xxxxxxxxxxxxxxxx
3. http://www.scottklement.com/httpapi/beta
4. http://www.scottklement.com/mailman/listinfo/ftpapi
5. http://powerext.com/