
Re: HTTPAPI and I/O utf-8



   Hi Scott,
   Thanks a lot for the prompt answer.
   I tried both code samples you sent me, and the second will be the
   handier one (I will use a Unicode DB for input).
   I then tried the translation the other way round, from English to
   German or to Japanese, and it does not work.
   Here is the code:
   -----------------------------------------------------------------------
     if translate( u'524A9664'                         // it works
   //            : %ucs2('ja|en')                      // it doesn't work
                 : %ucs2('ja|en':1200)                 // it works
                 : toStmf ) <> 1;
         http_crash();
     endif;
     if translate( u'006C00F60073006300680065006E'     // it works
                 : %ucs2('de|en':1200)                 // it works
                 : toStmf ) <> 1;
         http_crash();
     endif;
     //note: 'löschen' in utf-8  = x'6CC3B6736368656E'
     //                in utf-16 = x'006C00F60073006300680065006E'
   -----------------------------------------------------------------------
     if translate( %ucs2('delete':1200)                // it doesn't work
                 : %ucs2('en|ja':1200)
                 : toStmf ) <> 1;
         http_crash();
     endif;
   -----------------------------------------------------------------------
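   For reference, the byte values quoted in the note above can be checked
   outside RPG. The following is a minimal Python sketch (Python is used
   only for illustration; it is not part of HTTPAPI). It assumes the RPG
   u'...' literal holds big-endian UTF-16, as CCSID 1200 implies:

```python
# Illustration only: verify the hex values quoted in the RPG comments.
word = "l\u00f6schen"  # 'löschen'

utf8_hex = word.encode("utf-8").hex().upper()
utf16_hex = word.encode("utf-16-be").hex().upper()  # big-endian, no BOM

# The Japanese input used in the ja|en test, given as UTF-16 hex.
japanese = bytes.fromhex("524A9664").decode("utf-16-be")

print(utf8_hex)    # 6CC3B6736368656E
print(utf16_hex)   # 006C00F60073006300680065006E
print([hex(ord(ch)) for ch in japanese])  # ['0x524a', '0x9664']
```

   The ö is the only character whose UTF-8 form takes two bytes (C3 B6),
   which is why the two encodings differ in more than just the zero bytes.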
   PS: Below are some changes I made (I think this is what you meant in
   the code you sent me):
   D myAlphaVar      s             50a   varying
   D myUnicodeVar    s             25c   varying
   ...
   myUnicodeVar = 'text';
   WEBFORM_setPtr( form
                 : fixVarName(myUnicodeVar)
                 : %addr(myText: *data)
                 : %len(myText)*2 );

   myUnicodeVar = 'langpair';
   WEBFORM_setPtr( form
                 : fixVarName(myUnicodeVar)
                 : %addr(myPair : *data)
                 : %len(myPair)*2 );
   -----------------------------------------------------------------------
   I have also noticed that the content-type header of the returned HTML
   is not utf-8; the charset differs according to the input|output
   langpair:
   en|ja:
   <html><head><meta content="text/html; charset=Shift_JIS"
   http-equiv="content-type">
   ja|en:
   <html><head><meta content="text/html; charset=Shift_JIS"
   http-equiv="content-type">
   en|de:
   <html><head><meta content="text/html; charset=ISO-8859-1"
   http-equiv="content-type">
   de|en:
   <html><head><meta content="text/html; charset=ISO-8859-1"
   http-equiv="content-type">
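   One way to cope with a charset that varies per langpair is to read it
   from the meta tag and decode the response bytes accordingly, rather
   than assuming utf-8. A minimal Python sketch of that idea follows
   (illustration only, not HTTPAPI code; the helper name decode_by_meta
   and the sample page are hypothetical):

```python
import re

def decode_by_meta(raw: bytes, default: str = "utf-8") -> str:
    """Decode an HTML page using the charset declared in its meta tag."""
    # Charset names are plain ASCII, so searching the raw bytes works for
    # the ASCII-compatible encodings seen here (Shift_JIS, ISO-8859-1, UTF-8).
    match = re.search(rb"charset=([A-Za-z0-9_\-]+)", raw)
    charset = match.group(1).decode("ascii") if match else default
    return raw.decode(charset, errors="replace")

# Hypothetical sample: a page declared and encoded as Shift_JIS.
head = (b'<html><head><meta content="text/html; charset=Shift_JIS" '
        b'http-equiv="content-type">')
page = head + "\u524a\u9664".encode("shift_jis")

text = decode_by_meta(page)
print(text.endswith("\u524a\u9664"))  # True
```

   An unknown charset name would raise LookupError here; a real program
   would need to decide how to handle that case.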
   With the following form, submitted from a browser, the response is
   always in utf-8, and ideally we would expect the same from the RPG. I
   don't know if you get the same result on your machine; I am using v6
   (and I guess you use v7).
   -----------------------------------------------------------------------
   <html>
   <form method="get" enctype="multipart/form-data"
         action="http://translate.google.com/translate_t">

     <input type="text"  name="text" value="delete" />
     <input type="text"  name="langpair" value="en|ja" />

     <input type="submit" />

   </form>
   </html>
   -----------------------------------------------------------------------
   The difference that I can see is that my browser is set to Unicode
   utf-8 (CCSID=1208), whereas from the RPG we send the data as
   CCSID=1200. But since the langpair ja|en is working, this shouldn't be
   part of the issue. Maybe there are other differences in what is sent
   through the form?
   Pascal

   2012/1/9 Scott Klement <[2]sk@xxxxxxxxxxxxxxxx>

     Hi Pascal,
     The way you're doing it won't work.  You're coding a character
     string (containing the URL) in EBCDIC, but then trying to concatenate
     Unicode data in the middle of that string.  HTTPAPI isn't smart
     enough to know that part of it is Unicode and part of it is EBCDIC,
     so the result will be a mistranslated string.
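     The effect described above can be reproduced outside RPG. This is a
     minimal Python sketch (illustration only, not HTTPAPI code); cp037
     stands in for the job CCSID, which is an assumption:

```python
# Illustration only: UTF-16 bytes spliced into an EBCDIC string, with the
# whole buffer then treated as EBCDIC.
# cp037 (US EBCDIC) stands in for the job CCSID; that choice is an assumption.
prefix = "text=".encode("cp037")            # EBCDIC part of the URL
value = "l\u00f6schen".encode("utf-16-be")  # Unicode data (CCSID 1200)

mixed = prefix + value

# A converter that believes everything is EBCDIC misreads the Unicode bytes.
as_ebcdic = mixed.decode("cp037", errors="replace")

print(as_ebcdic == "text=l\u00f6schen")  # False
print(as_ebcdic.startswith("text="))     # True: only the EBCDIC part survives
```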
     Furthermore, any "special" characters in your data (i.e. characters
     that aren't allowed in a URL, or have a special meaning in a URL)
     aren't escaped, and that'll cause additional problems.
     You're going to have to use the "Web Form" (URL Encoder) routines to
     encode your data properly.  If you use those routines, they should
     successfully translate the data to UTF-8 and encode it properly for
     a URL.
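     To show what "translate to UTF-8 and encode for a URL" produces for
     the values in this thread, here is a minimal Python sketch using the
     standard library (illustration only; it mirrors the idea, not the
     internals of the HTTPAPI WEBFORM routines):

```python
from urllib.parse import urlencode

# Illustration only: form fields are converted to UTF-8 and then
# percent-encoded, so '|' and non-ASCII characters become URL-safe.
params = {"text": "\u524a\u9664", "langpair": "ja|en"}
query = urlencode(params, encoding="utf-8")

print(query)  # text=%E5%89%8A%E9%99%A4&langpair=ja%7Cen
```

     The resulting string is pure ASCII, so it can be appended to the URL
     after a '?' without mixing character sets.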
     Assuming your input data is EBCDIC (the job's CCSID) -- I've
     attached an example of this, named 'pascal.txt'
     So that's fine if your input is EBCDIC -- but if you're planning to
     have data in all of English, Japanese and German, it seems very
     unlikely that EBCDIC input is a good idea... you want Unicode for
     input, not just output!
     The only problem with the "url encoder" (or "WEBFORM") routines is
     the name of the variable ('text' or 'langpair' in your example) is
     received as an alphanumeric parameter.  This works great when the
     input is in other CCSIDs besides Unicode -- but for Unicode, it
     makes more sense to use data type=C (RPG's support for UCS-2).
     This isn't an issue for the value of the variable, since that can be
     passed by pointer -- but for the variable name, it's ugly.
     You can work around it by doing a DS overlay to get the same input
     bytes into an alpha variable -- but, it's just a little ugly.
     Anyway, I've attached an example of this called 'pascal2.txt'
     This code worked for me using the latest "beta" copy of httpapi,
     which is found here:
     [3]http://www.scottklement.com/httpapi/beta

   On 1/7/2012 8:38 AM, Pascal Polverini wrote:

        Hi Scott,
        I am trying to send a GET request to a google-translate HTML page.
        The HTML-page response is in utf-8 but I would also need to send
        utf-8 data to cover any language-pair.
        I tried different things; for Latin characters it works, but I
        cannot get back or send Japanese, for instance.
        I am not sure if this is because I use a GET. I understand that
        with POST you can set the CCSID but I am not sure of what to do
        for GET.
        Thanks for any tips and in any case thank you for these
        remarkable APIs.
        Pascal

     -----------------------------------------------------------------------
     This is the FTPAPI mailing list.  To unsubscribe, please go to:
     [4]http://www.scottklement.com/mailman/listinfo/ftpapi
     -----------------------------------------------------------------------

References

   1. http://translate.google.com/translate_t
   2. mailto:sk@xxxxxxxxxxxxxxxx
   3. http://www.scottklement.com/httpapi/beta
   4. http://www.scottklement.com/mailman/listinfo/ftpapi