Re: HTTPAPI and I/O utf-8
Hi Henrik and Scott,
Henrik,
The URL you use is different from the one I am using, and it is actually
better because there is less to parse in the return value. What was very
handy, though, were the extra parms you used on the GET:

  rc = http_url_get(url
                   :tmpFile
                   :HTTP_TIMEOUT
                   :'Mozilla/5.0'   // important, google doesn't like unknown browsers
                   :date
                   );

I have added these 3 parms and every langpair works: de|en or en|de or
en|ja.
The returned HTML is bigger and in proper utf-8.
Scott's code was already complete; it was Google that needed more info
on the GET.
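For anyone following along outside RPG, here is a rough Python sketch of the same idea: a GET request that carries an explicit browser-like User-Agent. It only builds the request and never sends it. The helper name build_translate_request is made up for illustration, the query parameters are the ones from Henrik's quoted message, and nothing here assumes the endpoint still answers this way.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint copied from the quoted message in this thread; it may no longer behave this way.
BASE_URL = "http://translate.google.com/translate_a/t"

def build_translate_request(text, src_lang, dst_lang):
    """Hypothetical helper: build (but do not send) a GET with a browser-like User-Agent."""
    query = urlencode({"client": "t", "text": text, "sl": src_lang, "tl": dst_lang})
    return Request(BASE_URL + "?" + query, headers={"User-Agent": "Mozilla/5.0"})

req = build_translate_request("Guten Tag", "de", "en")
print(req.get_method())              # GET (no request body was supplied)
print(req.full_url)
print(req.get_header("User-agent"))  # Mozilla/5.0
```

Note that urllib stores header names capitalized as "User-agent", which is why the lookup uses that spelling.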
Thanks a lot to both of you!
and have a nice day
Pascal
2012/1/10 Henrik Rützou <hr@xxxxxxxxxxxx>

  Hi Scott & Pascal,

  I have another "dirty" way (from EBCDIC only, at the moment) ....

    text = %trim('ÆØÅæøå');              // text to be translated
    text = escapeURI(encodeUTF8(text));
    dsp = text;

  Result (in plain EBCDIC):

    DSPLY  %C3%86%C3%98%C3%85%C3%A6%C3%B8%C3%A5
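The escaped value can be cross-checked outside the IBM i. Below is a minimal Python sketch, assuming the input text was ÆØÅæøå (which is what the escaped bytes decode to). It is not how escapeURI or encodeUTF8 are implemented, only the same two steps done with the standard library: encode to UTF-8, then %XX-escape each byte.

```python
from urllib.parse import quote, unquote

text = "ÆØÅæøå"

# quote() encodes the string as UTF-8 by default and then escapes every byte
# that is not an unreserved ASCII character.
escaped = quote(text, safe="")
print(escaped)  # %C3%86%C3%98%C3%85%C3%A6%C3%B8%C3%A5

# Decoding the escapes as UTF-8 gives the original text back.
roundtrip = unquote(escaped)
print(roundtrip == text)  # True
```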
  Now I can use GET:

    // Setup HTTPAPI Google Translate Url
    HTTP_SetFileCCSID(1208);
    url = 'http://translate.google.com/translate_a/t'
        + '?client=t'
        + '&text=' + text
        + '&sl=' + srcLng
        + '&tl=' + transLng
        ;

    date = timestamp() - %years(12);

    // Send HTTPAPI Request
    rc = http_url_get(url
       :tmpFile
       :HTTP_TIMEOUT
       :'Mozilla/5.0'    // important, google doesn't like unknown browsers
       :date
       );

  And get the result (google returns JSON in UTF8):

    [[["ÆØÅæøå","ÆØÅæøå","",""]],,"en",,[["ÆØÅæøå",[5],1,0,1000,0,
    1,0]],[["ÆØÅæøå",5,[["ÆØÅæøå",1000,1,0]],[[0,6]],"ÆØÅæøå"]],,,
    [["is"]],2]

  Convert it to XML

    <array depth="1">
     <array depth="2">
      <array depth="3">
       <value>ÆØÅæøå</value>
       <value>ÆØÅæøå</value>
      </array>
     </array>
     <value>en</value>
     <array depth="2">
      <array depth="3">....

  and read the first value element that contains the result of the
  translation.
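One thing to watch when reading that result with an ordinary JSON parser: the response leaves array slots empty (",,"), which strict JSON does not allow. The Python sketch below is one possible way around it, not what Henrik's XML conversion does. It fills the empty slots with null and then reads the first value. The function name and the sample string are made up for illustration; the sample only imitates the shape of the response shown above.

```python
import json

def fill_empty_slots(raw):
    """Insert null into empty array slots (',,', '[,' and ',]'), leaving string contents alone."""
    out = []
    in_string = False
    escaped = False
    prev = ""  # last non-space character seen outside a string
    for ch in raw:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch.isspace():
            out.append(ch)
            continue
        if (ch == "," and prev in (",", "[")) or (ch == "]" and prev == ","):
            out.append("null")
        if ch == '"':
            in_string = True
        out.append(ch)
        prev = ch
    return "".join(out)

# Made-up sample with the same shape as the response in the message.
sample = '[[["Hello","Hej","",""]],,"da",,[["is"]],2]'
data = json.loads(fill_empty_slots(sample))
print(data[0][0][0])  # Hello
```

The first element of the innermost array corresponds to the first value element of the XML form.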
  The problem here is, of course, that the result file may contain
  unsupported UTF8 characters, which I then have to either accept as
  blanks, replace the UTF8 unit with a valid EBCDIC char, or replace
  the UTF8 unit with an HTML equivalent such as &#nnn;, using CCSID 819
  as an intermediate format.

  What I do is change the file to CCSID 819 using CHGATR. By doing so I
  read it as SBCS and know that any character > x'7F' starts a UTF8
  unit.

  Examples:

    // Replacing Microsoft Smart Quotes with EBCDIC equivalents
    myField = replaceUTF8unit(myField:'E28098':'''':819);
    myField = replaceUTF8unit(myField:'E28099':'''':819);
    myField = replaceUTF8unit(myField:'E2809C':'"':819);
    myField = replaceUTF8unit(myField:'E2809D':'"':819);
    myField = replaceUTF8unit(myField:'E28093':'-':819);
    myField = replaceUTF8unit(myField:'E28094':'--':819);
    myField = replaceUTF8unit(myField:'E280A6':'...':819);

    // Replacing EUR Sign with Text
    myField = replaceUTF8unit(myField:'E282AC':'EUR':819);

    // Converting unsupported UTF-8 Characters to blanks
    myField = convertUTF8unit(myField:' ':819);

    // Decoding UTF-8 Characters read as SBCS ASCII to EBCDIC
    myField = decodeUTF8(myField:819);

  or

    // Converting unsupported UTF-8 Characters to HTML/Unicode encoding
    myField = convertUTF8unit(myField:'*Html':819);
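To make the byte-level idea concrete, here is a small Python sketch of the same approach: walk the raw bytes, treat anything above x'7F' as the start of a UTF-8 unit, and either substitute it from a table or fall back to an HTML numeric entity or a fixed character. It is only an illustration of the technique, not the code behind replaceUTF8unit or convertUTF8unit. The function names and the table are hypothetical, and the sketch produces plain ASCII text rather than EBCDIC.

```python
# Hypothetical table: UTF-8 unit (hex) -> plain substitute, mirroring the examples above.
REPLACEMENTS = {
    "E28098": "'", "E28099": "'",
    "E2809C": '"', "E2809D": '"',
    "E28093": "-", "E28094": "--",
    "E280A6": "...", "E282AC": "EUR",
}

def unit_length(lead):
    """Number of bytes in the UTF-8 unit that starts with this lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("byte 0x%02X cannot start a UTF-8 unit" % lead)

def convert_units(data, fallback="html"):
    """Replace multi-byte units from the table; otherwise emit &#nnn; or the fallback string."""
    out = []
    i = 0
    while i < len(data):
        n = unit_length(data[i])
        unit = data[i:i + n]
        if n == 1:
            out.append(chr(unit[0]))          # plain ASCII passes through
        else:
            key = unit.hex().upper()
            if key in REPLACEMENTS:
                out.append(REPLACEMENTS[key])
            elif fallback == "html":
                out.append("&#%d;" % ord(unit.decode("utf-8")))
            else:
                out.append(fallback)
        i += n
    return "".join(out)

raw = "“Price” – 5 € … 日".encode("utf-8")
print(convert_units(raw))                # "Price" - 5 EUR ... &#26085;
print(convert_units(raw, fallback=" "))  # same, with a blank instead of the entity
```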
  On Mon, Jan 9, 2012 at 11:32 PM, Scott Klement <sk@xxxxxxxxxxxxxxxx>
  wrote:

    Hi Pascal,

    The way you're doing it won't work. You're coding the character
    string (containing the URL) in EBCDIC, but then trying to
    concatenate Unicode data in the middle of that string. HTTPAPI
    isn't smart enough to know that part of it is Unicode and part of
    it is EBCDIC, so the result will be a mistranslated string.

    Furthermore, any "special" characters in your data (i.e. characters
    that aren't allowed in a URL, or have a special meaning in a URL)
    aren't escaped, and that'll cause additional problems.

    You're going to have to use the "Web Form" (URL Encoder) routines to
    encode your data properly. If you use those routines, they should
    successfully translate the data to UTF-8 and encode it properly for
    a URL.
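Both points can be seen in a few lines of Python. This is only an illustration with the standard library, not HTTPAPI's URL encoder routines, and the sample strings are made up. The same text yields different escapes depending on the encoding it is in before escaping, and reserved characters such as & and | must be escaped so they are not read as URL syntax.

```python
from urllib.parse import parse_qs, quote, urlencode

word = "Grüße"

# Same text, different byte encodings, different escapes.
as_utf8 = quote(word, encoding="utf-8")
as_latin1 = quote(word, encoding="latin-1")
as_ebcdic = quote(word, encoding="cp500")  # cp500 is one EBCDIC code page, picked as an example
print(as_utf8)    # Gr%C3%BC%C3%9Fe
print(as_latin1)  # Gr%FC%DFe
print(as_ebcdic)  # differs again; a server expecting UTF-8 would misread it

# Reserved characters in the data are escaped so they cannot be mistaken for URL syntax.
query = urlencode({"text": "Tom & Jerry = 100%?", "langpair": "en|ja"})
print(query)      # text=Tom+%26+Jerry+%3D+100%25%3F&langpair=en%7Cja

# Parsing the query string recovers the original values.
parsed = parse_qs(query)
print(parsed["text"][0])
```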
    Assuming your input data is EBCDIC (the job's CCSID) -- I've
    attached an example of this, named 'pascal.txt'

    So that's fine if your input is EBCDIC -- but if you're planning to
    have data in all of English, Japanese and German, it seems very
    unlikely that EBCDIC input is a good idea... you want Unicode for
    input, not just output!

    The only problem with the "url encoder" (or "WEBFORM") routines is
    that the name of the variable ('text' or 'langpair' in your example)
    is received as an alphanumeric parameter. This works great when the
    input is in other CCSIDs besides Unicode -- but for Unicode, it
    makes more sense to use data type=C (RPG's support for UCS-2).

    This isn't an issue for the value of the variable, since that can be
    passed by pointer -- but for the variable name, it's ugly.

    You can work around it by doing a DS overlay to get the same input
    bytes into an alpha variable -- but, it's just a little ugly.

    Anyway, I've attached an example of this called 'pascal2.txt'

    This code worked for me using the latest "beta" copy of httpapi,
    which is found here:
    http://www.scottklement.com/httpapi/beta
  On 1/7/2012 8:38 AM, Pascal Polverini wrote:

      Hi Scott,

      I am trying to send a GET request to a google-translate HTML page.
      The HTML-page response is in utf-8 but I would also need to send
      utf-8 data to cover any language-pair.
      I tried different things; for Latin characters it works but I
      cannot get back or send Japanese, for instance.
      I am not sure if this is because I use a GET. I understand that
      with POST you can set the CCSID but I am not sure of what to do
      for GET.
      Thanks for any tips and in any case thank you for these remarkable
      APIs.

      Pascal
    -----------------------------------------------------------------------
    This is the FTPAPI mailing list. To unsubscribe, please go to:
    http://www.scottklement.com/mailman/listinfo/ftpapi
    -----------------------------------------------------------------------
  --
  Regards,
  Henrik Rützou
  http://powerEXT.com