Re: HTTPAPI and I/O utf-8



   Hi Scott & Pascal,

   I have another "dirty" way (from EBCDIC only, at the moment) ....

   text = %trim('ÆØÅæøå');               // text to be translated
   text = escapeURI(encodeUTF8(text));
   dsp = text;

   Result (in plain EBCDIC):

   DSPLY  %C3%86%C3%98%C3%85%C3%A6%C3%B8%C3%A5
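
   For anyone who wants to check the expected bytes outside RPG, the same
   two steps (encode to UTF-8, then percent-escape) can be sketched in
   Python. This is only an illustration, assuming the input text is
   ÆØÅæøå, which is what the DSPLY output decodes to. The function name
   escape_uri_utf8 is hypothetical and is not the escapeURI/encodeUTF8
   implementation used above.

```python
from urllib.parse import quote


def escape_uri_utf8(text: str) -> str:
    """Encode text as UTF-8, then percent-escape every byte that is not URL-safe."""
    utf8_bytes = text.encode("utf-8")
    # safe="" means even "/" is escaped, which is what a query value needs.
    return quote(utf8_bytes, safe="")


encoded = escape_uri_utf8("ÆØÅæøå")
print(encoded)  # %C3%86%C3%98%C3%85%C3%A6%C3%B8%C3%A5
```

   Each of the six characters becomes two UTF-8 bytes, hence twelve %XX
   groups in the result.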

   Now I can use GET:

   // Setup HTTPAPI Google Translate Url
   HTTP_SetFileCCSID(1208);
   url = 'http://translate.google.com/translate_a/t'
       + '?client=t'
       + '&text=' + text
       + '&sl=' + srcLng
       + '&tl=' + transLng
       ;
   date = timestamp() - %years(12);

   // Send HTTPAPI Request
   rc = http_url_get(url
       :tmpFile
       :HTTP_TIMEOUT
       :'Mozilla/5.0'   // important, google doesn't like unknown browsers
       :date
       );

   And get the result (google returns JSON in UTF8):

   [[["ÆØÅæøå","ÆØÅæøå","",""]],,"en",,[["ÆØÅæøå",[5],1,0,1000,0,
   1,0]],[["ÆØÅæøå",5,[["ÆØÅæøå",1000,1,0]],[[0,6]],"ÆØÅæøå"]],,,
   [["is"]],2]

   Convert it to XML

   <array depth="1">
    <array depth="2">
     <array depth="3">
      <value>ÆØÅæøå</value>
      <value>ÆØÅæøå</value>
     </array>
    </array>
    <value>en</value>
    <array depth="2">
     <array depth="3"> ....

   and read the first value element that contains the result of the
   translation.
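
   The array-to-XML step can be sketched roughly like this in Python. It
   is a guess at the shape of the conversion, not the converter actually
   used here: the function to_xml, the rule that empty entries are
   skipped, and the simplified sample response are all assumptions. The
   real response contains empty slots (",,") that a strict JSON parser
   rejects, so it would need patching before a step like this.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(node, depth=1):
    """Turn a nested list into <array depth="n"> elements with <value> leaves."""
    elem = ET.Element("array", depth=str(depth))
    for item in node:
        if isinstance(item, list):
            elem.append(to_xml(item, depth + 1))
        elif item not in ("", None):  # assumption: empty entries are skipped
            value = ET.SubElement(elem, "value")
            value.text = str(item)
    return elem


# Hypothetical, simplified response (valid JSON, unlike the real one).
sample = '[[["ÆØÅæøå","ÆØÅæøå","",""]],"en"]'
root = to_xml(json.loads(sample))

# The first <value> in document order holds the translated text.
first_value = root.find(".//value").text
print(first_value)  # ÆØÅæøå
```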
   The problem here is of course that the result file may contain UTF-8
   characters that EBCDIC does not support. I then have to either accept
   them as blanks, replace the UTF-8 unit with a valid EBCDIC character,
   or replace the UTF-8 unit with an HTML equivalent such as &#nnn;,
   using CCSID 819 as an intermediate format.
   What I do is change the file to CCSID 819 using CHGATR. By doing so I
   read it as SBCS and know that any byte above x'7F' is part of a
   multi-byte UTF-8 unit (a byte from x'C2' to x'F4' starts one).
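
   The byte-level idea can be sketched in Python as follows. This is a
   minimal illustration under assumptions: split_units, replace_unit and
   units_to_html are hypothetical names, they work on raw bytes rather
   than on an IFS file tagged with CCSID 819, and they are not the code
   behind replaceUTF8unit or convertUTF8unit.

```python
def utf8_unit_length(lead: int) -> int:
    """Length of the UTF-8 unit introduced by this lead byte (1 for ASCII)."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError(f"byte {lead:#04x} cannot start a UTF-8 unit")


def split_units(data: bytes) -> list:
    """Walk the bytes as single-byte data and cut them into UTF-8 units."""
    units, i = [], 0
    while i < len(data):
        n = utf8_unit_length(data[i])
        units.append(data[i:i + n])
        i += n
    return units


def replace_unit(data: bytes, unit_hex: str, replacement: str) -> bytes:
    """Replace one specific UTF-8 unit (given as hex) with an ASCII string."""
    target = bytes.fromhex(unit_hex)
    repl = replacement.encode("ascii")
    return b"".join(repl if u == target else u for u in split_units(data))


def units_to_html(data: bytes) -> str:
    """Keep ASCII as is and turn every multi-byte unit into &#nnn;."""
    return "".join(
        u.decode("ascii") if len(u) == 1 else f"&#{ord(u.decode('utf-8'))};"
        for u in split_units(data)
    )


raw = "\u201cprice\u201d \u20ac5".encode("utf-8")  # smart quotes and a euro sign
step = replace_unit(raw, "E2809C", '"')
step = replace_unit(step, "E2809D", '"')
step = replace_unit(step, "E282AC", "EUR")
print(step)                                       # b'"price" EUR5'
print(units_to_html("\u20ac5".encode("utf-8")))   # &#8364;5
```

   Working unit by unit, rather than with a plain byte search, avoids
   matching a hex pattern that straddles two neighbouring characters.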
   Examples:

   // Replacing Microsoft Smart Quotes with EBCDIC equivalents
   myField = replaceUTF8unit(myField:'E28098':'''':819);
   myField = replaceUTF8unit(myField:'E28099':'''':819);
   myField = replaceUTF8unit(myField:'E2809C':'"':819);
   myField = replaceUTF8unit(myField:'E2809D':'"':819);
   myField = replaceUTF8unit(myField:'E28093':'-':819);
   myField = replaceUTF8unit(myField:'E28094':'--':819);
   myField = replaceUTF8unit(myField:'E280A6':'...':819);

   // Replacing EUR Sign with Text
   myField = replaceUTF8unit(myField:'E282AC':'EUR':819);

   // Converting unsupported UTF-8 Characters to blanks
   myField = convertUTF8unit(myField:' ':819);

   // Decoding UTF-8 Characters read as SBCS ASCII to EBCDIC
   myField = decodeUTF8(myField:819);

   or

   // Converting unsupported UTF-8 Characters to HTML/Unicode encoding
   myField = convertUTF8unit(myField:'*Html':819);

   On Mon, Jan 9, 2012 at 11:32 PM, Scott Klement <[2]sk@xxxxxxxxxxxxxxxx>
   wrote:

     Hi Pascal,
     The way you're doing it won't work. You're coding the character
     string (containing the URL) in EBCDIC, but then trying to
     concatenate Unicode data in the middle of that string. HTTPAPI isn't
     smart enough to know that part of it is Unicode and part of it is
     EBCDIC, so the result will be a mistranslated string.
     Furthermore, any "special" characters in your data (i.e. characters
     that aren't allowed in a URL, or have a special meaning in a URL)
     aren't escaped, and that'll cause additional problems.
     You're going to have to use the "Web Form" (URL Encoder) routines to
     encode your data properly. If you use those routines, they should
     successfully translate the data to UTF-8 and encode it properly for
     a URL.
     Assuming your input data is EBCDIC (the job's CCSID) -- I've
     attached an example of this, named 'pascal.txt'
     So that's fine if your input is EBCDIC -- but if you're planning to
     have data in all of English, Japanese and German, it seems very
     unlikely that EBCDIC input is a good idea... you want Unicode for
     input, not just output!
     The only problem with the "url encoder" (or "WEBFORM") routines is
     that the name of the variable ('text' or 'langpair' in your example)
     is received as an alphanumeric parameter. This works great when the
     input is in other CCSIDs besides Unicode -- but for Unicode, it
     makes more sense to use data type=C (RPG's support for UCS-2).
     This isn't an issue for the value of the variable, since that can be
     passed by pointer -- but for the variable name, it's ugly.
     You can work around it by doing a DS overlay to get the same input
     bytes into an alpha variable -- but, it's just a little ugly.
     Anyway, I've attached an example of this called 'pascal2.txt'
     This code worked for me using the latest "beta" copy of httpapi,
     which is found here:
     [3]http://www.scottklement.com/httpapi/beta

   On 1/7/2012 8:38 AM, Pascal Polverini wrote:

     Hi Scott,
     I am trying to send a GET request to a google-translate HTML page.
     The HTML-page response is in utf-8 but I would also need to send
     utf-8 data to cover any language-pair.
     I tried different things, for Latin character it works but I cannot
     get back or send Japanese for instance.
     I am not sure if this is because I use a GET. I understand that with
     POST you can set the CCSID but I am not sure of what to do for GET.
     Thanks for any tips and in any case thank you for these remarkable
     APIs.
     Pascal

     -----------------------------------------------------------------------
     This is the FTPAPI mailing list. To unsubscribe, please go to:
     [4]http://www.scottklement.com/mailman/listinfo/ftpapi
     -----------------------------------------------------------------------

   --
   Regards,
   Henrik Rützou
   [5]http://powerEXT.com

References

   1. http://translate.google.com/translate_a/t
   2. mailto:sk@xxxxxxxxxxxxxxxx
   3. http://www.scottklement.com/httpapi/beta
   4. http://www.scottklement.com/mailman/listinfo/ftpapi
   5. http://powerext.com/