
Re: HTTPAPI and I/O utf-8



   Hi Henrik and Scott,

   Henrik,
   The URL you use is different from the one I am using, and it is actually
   better because there is less to parse in the return value. What was very
   handy, though, were the extra parms you used on the GET:

       rc = http_url_get(url
           :tmpFile
           :HTTP_TIMEOUT
           :'Mozilla/5.0'    // important, Google doesn't like unknown browsers
           :date
           );

   I have added these 3 parms and every langpair works: de|en, en|de or
   en|ja.
   The returned HTML is bigger and in proper UTF-8.
   The code from Scott was already complete; it was Google that needed
   more info on the GET.
   Thanks a lot to both of you,
   and have a nice day!
   Pascal
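
For readers outside RPG, the effect of the extra 'Mozilla/5.0' parameter can be pictured as an explicit User-Agent header on the GET. The following is only a sketch in Python, not HTTPAPI code: `build_get` is a hypothetical helper, and it is an assumption that the parameter ends up in the User-Agent header. The request is built but never sent.

```python
from urllib.request import Request

def build_get(url: str, user_agent: str = "Mozilla/5.0") -> Request:
    """Build (but do not send) a GET request carrying an explicit User-Agent."""
    return Request(url, headers={"User-Agent": user_agent}, method="GET")

req = build_get("http://translate.google.com/translate_a/t?client=t")
# urllib stores header names capitalized as "User-agent".
print(req.get_method(), req.get_header("User-agent"))
```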

   2012/1/10 Henrik Rützou <[1]hr@xxxxxxxxxxxx>

     Hi Scott & Pascal,

     I have another "dirty" way (from EBCDIC only, at the moment):

       text = %trim('ÆØÅæøå');               // text to be translated
       text = escapeURI(encodeUTF8(text));
       dsp = text;

     Result (in plain EBCDIC):

       DSPLY  %C3%86%C3%98%C3%85%C3%A6%C3%B8%C3%A5
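
The displayed value can be checked independently of RPG. The sketch below uses Python's standard `urllib.parse` (not Henrik's `escapeURI`/`encodeUTF8` routines, which are not shown in the thread) to decode the string and confirm that re-encoding it as UTF-8 with percent-escapes gives the same result.

```python
from urllib.parse import quote, unquote

displayed = "%C3%86%C3%98%C3%85%C3%A6%C3%B8%C3%A5"

# Undo the percent-escapes and interpret the bytes as UTF-8.
decoded = unquote(displayed, encoding="utf-8")

# Round trip: UTF-8 encode, then escape every byte outside the unreserved set.
assert quote(decoded, safe="") == displayed

print(decoded, [f"{b:02X}" for b in decoded.encode("utf-8")])
```

Each of the six characters takes two bytes in UTF-8, which is why the escaped form has twelve `%XX` groups.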
     Now I can use GET:

       // Setup HTTPAPI Google Translate Url
       HTTP_SetFileCCSID(1208);
       url = 'http://translate.google.com/translate_a/t'
           + '?client=t'
           + '&text=' + text
           + '&sl=' + srcLng
           + '&tl=' + transLng
           ;

       date = timestamp() - %years(12);

       // Send HTTPAPI Request
       rc = http_url_get(url
           :tmpFile
           :HTTP_TIMEOUT
           :'Mozilla/5.0'    // important, Google doesn't like unknown browsers
           :date
           );

     And get the result (Google returns JSON in UTF-8):
       [[["ÆØÅæøå","ÆØÅæøå","",""]],,"en",,[["ÆØÅæøå",[5],1,0,1000,0,
       1,0]],[["ÆØÅæøå",5,[["ÆØÅæøå",1000,1,0]],[[0,6]],"ÆØÅæøå"]],,,
       [["is"]],2]

     Convert it to XML

       <array depth="1">
        <array depth="2">
         <array depth="3">
          <value>ÆØÅæøå</value>
          <value>ÆØÅæøå</value>
         </array>
        </array>
        <value>en</value>
        <array depth="2">
         <array depth="3">
       ....

     and read the first value element, which contains the result of the
     translation.
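
Note that the response leaves some array slots empty (`]],,"en",,`), so a strict JSON parser would reject it as-is. As an alternative to the XML conversion, here is a rough Python sketch that fills the empty slots with `null` and reads the first value directly. `parse_loose_array` is a hypothetical helper, the sample string is the response quoted above (with the characters inferred from the percent-encoded request), and the approach assumes no string value itself contains two adjacent commas.

```python
import json
import re

def parse_loose_array(raw: str):
    """Parse an array literal that leaves some slots empty, e.g. ]],,"en".

    Assumption: no string value contains ",," or "[," itself.
    """
    # Insert null wherever a comma directly follows "[" or another comma.
    filled = re.sub(r"(?<=[\[,])(?=,)", "null", raw)
    return json.loads(filled)

sample = ('[[["ÆØÅæøå","ÆØÅæøå","",""]],,"en",,[["ÆØÅæøå",[5],1,0,1000,0,'
          '1,0]],[["ÆØÅæøå",5,[["ÆØÅæøå",1000,1,0]],[[0,6]],"ÆØÅæøå"]],,,'
          '[["is"]],2]')

data = parse_loose_array(sample)
translation = data[0][0][0]   # first value of the innermost first array
print(translation)
```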
     The problem here is, of course, that the result file may contain
     UTF-8 characters that are unsupported in EBCDIC. I then have to
     either accept them as blanks, replace the UTF-8 unit with a valid
     EBCDIC character, or replace the UTF-8 unit with an HTML equivalent
     such as &#nnn;, using CCSID 819 as an intermediate format.

     What I do is change the file to CCSID 819 using CHGATR. By doing so
     I read it as SBCS and know that any character > x'7F' starts a
     UTF-8 unit.
     Examples:

       // Replacing Microsoft Smart Quotes with EBCDIC equivalents
       myField = replaceUTF8unit(myField:'E28098':'''':819);
       myField = replaceUTF8unit(myField:'E28099':'''':819);
       myField = replaceUTF8unit(myField:'E2809C':'"':819);
       myField = replaceUTF8unit(myField:'E2809D':'"':819);
       myField = replaceUTF8unit(myField:'E28093':'-':819);
       myField = replaceUTF8unit(myField:'E28094':'--':819);
       myField = replaceUTF8unit(myField:'E280A6':'...':819);

       // Replacing EUR Sign with Text
       myField = replaceUTF8unit(myField:'E282AC':'EUR':819);

       // Converting unsupported UTF-8 Characters to blanks
       myField = convertUTF8unit(myField:' ':819);

       // Decoding UTF-8 Characters read as SBCS ASCII to EBCDIC
       myField = decodeUTF8(myField:819);

     or

       // Converting unsupported UTF-8 Characters to HTML/Unicode encoding
       myField = convertUTF8unit(myField:'*Html':819);
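
The idea behind these calls can be sketched in a few lines of Python. This is not the implementation of `replaceUTF8unit` or `convertUTF8unit` (their source is not shown in the thread); `replace_utf8_unit` and `utf8_units_to_html` are hypothetical stand-ins that work on raw bytes, the way the data looks once the file is read as single-byte.

```python
def replace_utf8_unit(data: bytes, unit_hex: str, replacement: str) -> bytes:
    """Replace every occurrence of one UTF-8 unit (given as hex) with ASCII text."""
    return data.replace(bytes.fromhex(unit_hex), replacement.encode("ascii"))

def utf8_units_to_html(data: bytes) -> str:
    """Turn every non-ASCII character into a numeric reference such as &#8364;."""
    return data.decode("utf-8").encode("ascii", "xmlcharrefreplace").decode("ascii")

# Smart quotes, an en dash and a euro sign, as UTF-8 bytes.
raw = "\u201cprice\u201d \u2013 5 \u20ac".encode("utf-8")

cleaned = replace_utf8_unit(raw, "E2809C", '"')
cleaned = replace_utf8_unit(cleaned, "E2809D", '"')
cleaned = replace_utf8_unit(cleaned, "E28093", "-")

print(cleaned.decode("utf-8"))
print(utf8_units_to_html(cleaned))
```

Matching on the complete byte sequence is safe because UTF-8 is self-synchronizing: a full multi-byte unit cannot match across the boundary of two other characters.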
     On Mon, Jan 9, 2012 at 11:32 PM, Scott Klement
     <[2][3]sk@xxxxxxxxxxxxxxxx> wrote:

       Hi Pascal,

       The way you're doing it won't work. You're coding a character
       string (containing the URL) in EBCDIC, but then trying to
       concatenate Unicode data in the middle of that string. HTTPAPI
       isn't smart enough to know that part of it is Unicode and part of
       it is EBCDIC, so the result will be a mistranslated string.
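
To see why a mixed string comes out mistranslated, here is a small Python sketch (for illustration only; it is not HTTPAPI code). It assumes code page 037 as a stand-in for the job's EBCDIC CCSID and translates the whole byte string as if it were EBCDIC, which is roughly what a single EBCDIC-to-ASCII conversion of the URL amounts to. `translate_as_ebcdic` is a hypothetical name.

```python
# Illustration only: cp037 stands in for the job's EBCDIC CCSID.
EBCDIC = "cp037"

def translate_as_ebcdic(url_prefix: str, unicode_value: str) -> str:
    """Concatenate EBCDIC bytes with UTF-8 bytes, then translate it all as EBCDIC."""
    mixed = url_prefix.encode(EBCDIC) + unicode_value.encode("utf-8")
    return mixed.decode(EBCDIC)

result = translate_as_ebcdic("text=", "日本語")
print(repr(result))
```

The EBCDIC part (`text=`) survives, while the nine UTF-8 bytes of the Japanese text are each reinterpreted as an unrelated EBCDIC character.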
       Furthermore, any "special" characters in your data (i.e. characters
       that aren't allowed in a URL, or have a special meaning in a URL)
       aren't escaped, and that'll cause additional problems.

       You're going to have to use the "Web Form" (URL Encoder) routines
       to encode your data properly. If you use those routines, they
       should successfully translate the data to UTF-8 and encode it
       properly for a URL.
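
What a URL encoder has to do for each name/value pair can be sketched with Python's standard library. This is only an analogy to HTTPAPI's web-form routines, not their API; `build_query` is a hypothetical helper and the field names `text` and `langpair` are the ones mentioned in this thread.

```python
from urllib.parse import urlencode

def build_query(text: str, langpair: str) -> str:
    """Convert each value to UTF-8 and percent-escape anything unsafe in a URL."""
    return urlencode({"text": text, "langpair": langpair}, encoding="utf-8")

query = build_query("Grüße & mehr", "de|en")
print(query)
```

The `&` inside the value is escaped so it cannot be mistaken for a field separator, the `|` in the language pair is escaped as well, and the non-ASCII letters become UTF-8 byte escapes.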
       Assuming your input data is EBCDIC (the job's CCSID) -- I've
       attached an example of this, named 'pascal.txt'.

       So that's fine if your input is EBCDIC -- but if you're planning to
       have data in all of English, Japanese and German, it seems very
       unlikely that EBCDIC input is a good idea... you want Unicode for
       input, not just output!
       The only problem with the "url encoder" (or "WEBFORM") routines is
       that the name of the variable ('text' or 'langpair' in your
       example) is received as an alphanumeric parameter. This works great
       when the input is in other CCSIDs besides Unicode -- but for
       Unicode, it makes more sense to use data type=C (RPG's support for
       UCS-2).

       This isn't an issue for the value of the variable, since that can
       be passed by pointer -- but for the variable name, it's ugly.

       You can work around it by doing a DS overlay to get the same input
       bytes into an alpha variable -- but, it's just a little ugly.
       Anyway, I've attached an example of this called 'pascal2.txt'.

       This code worked for me using the latest "beta" copy of httpapi,
       which is found here:

       [3][4]http://www.scottklement.com/httpapi/beta

       On 1/7/2012 8:38 AM, Pascal Polverini wrote:

         Hi Scott,
         I am trying to send a GET request to a google-translate HTML
         page.
         The HTML-page response is in utf-8 but I would also need to send
         utf-8 data to cover any language-pair.
         I tried different things; for Latin characters it works, but I
         cannot get back or send Japanese, for instance.
         I am not sure if this is because I use a GET. I understand that
         with POST you can set the CCSID, but I am not sure what to do
         for GET.
         Thanks for any tips, and in any case thank you for these
         remarkable APIs.
         Pascal
         -----------------------------------------------------------------------
         This is the FTPAPI mailing list.  To unsubscribe, please go to:
         [4][5]http://www.scottklement.com/mailman/listinfo/ftpapi
         -----------------------------------------------------------------------
     --
     Regards,
     Henrik Rützou

     [5][6]http://powerEXT.com

     References

       1. [7]http://translate.google.com/translate_a/t
       2. mailto:[8]sk@xxxxxxxxxxxxxxxx
       3. [9]http://www.scottklement.com/httpapi/beta
       4. [10]http://www.scottklement.com/mailman/listinfo/ftpapi
       5. [11]http://powerext.com/

References

   1. mailto:hr@xxxxxxxxxxxx
   2. http://translate.google.com/translate_a/t
   3. mailto:sk@xxxxxxxxxxxxxxxx
   4. http://www.scottklement.com/httpapi/beta
   5. http://www.scottklement.com/mailman/listinfo/ftpapi
   6. http://powerEXT.com/
   7. http://translate.google.com/translate_a/t
   8. mailto:sk@xxxxxxxxxxxxxxxx
   9. http://www.scottklement.com/httpapi/beta
  10. http://www.scottklement.com/mailman/listinfo/ftpapi
  11. http://powerext.com/
  12. http://www.scottklement.com/mailman/listinfo/ftpapi