
Re: HTTPAPI and I/O utf-8



Hi Pascal,

The way you're doing it won't work. You're coding a character string (containing the URL) in EBCDIC, but then trying to concatenate Unicode data into the middle of that string. HTTPAPI isn't smart enough to know that part of it is Unicode and part of it is EBCDIC, so the result will be a mistranslated string.
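
To make the mistranslation concrete, here is a small sketch in Python (not RPG, used only because it makes the bytes easy to inspect). It assumes the job's EBCDIC CCSID is 37 and uses Python's cp037 codec to stand in for it; the variable names are illustrative only.

```python
# Illustrative only: cp037 stands in for the job's EBCDIC CCSID (an assumption).
ebcdic_part = "text=".encode("cp037")       # EBCDIC bytes for the URL text
unicode_part = "削除".encode("utf-16-be")    # UCS-2 bytes: 52 4A 96 64

mixed = ebcdic_part + unicode_part          # one buffer, two encodings

# A translator that believes the whole buffer is EBCDIC converts every
# byte as EBCDIC, including the UCS-2 bytes.
as_text = mixed.decode("cp037")

print(as_text.startswith("text="))   # True: the EBCDIC part survives
print("削除" in as_text)              # False: the Japanese text is lost
```

The four Unicode bytes come out as four unrelated Latin characters, which is the kind of damage a single EBCDIC-to-UTF-8 translation does to a mixed string.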

Furthermore, any "special" characters in your data (i.e. characters that aren't allowed in a URL, or have a special meaning in a URL) aren't escaped, and that'll cause additional problems.
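
As a rough illustration of the escaping problem (again in Python, with a made-up input value), an unescaped '&' ends the variable's value early, while the percent-encoded form survives intact:

```python
from urllib.parse import parse_qs, quote

value = "black & white"   # hypothetical input containing a reserved character

raw_query = "text=" + value                    # not escaped
safe_query = "text=" + quote(value, safe="")   # percent-encoded

print(parse_qs(raw_query))    # the value is cut off at the '&'
print(parse_qs(safe_query))   # the full value comes back
```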

You're going to have to use the "Web Form" (URL Encoder) routines to encode your data properly. If you use those routines, they should successfully translate the data to UTF-8 and encode it properly for a URL.
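
For reference, this is roughly what the encoded result should look like for the values in your example. The sketch below uses Python's standard urlencode, not the WEBFORM routines, so treat it as an illustration of the expected output rather than of HTTPAPI itself:

```python
from urllib.parse import urlencode

# Same variable names and values as in the example call.
params = {"text": "löschen", "langpair": "de|en"}

# Translate each value to UTF-8, then percent-encode it.
query = urlencode(params, encoding="utf-8")
url = "http://translate.google.com/translate_t?" + query

print(url)
```

Note that the 'ö' becomes the two UTF-8 bytes C3 B6, and the '|' in the language pair is escaped as well.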

Assuming your input data is EBCDIC (the job's CCSID), I've attached an example of this, named 'pascal.txt'.

So that's fine if your input is EBCDIC -- but if you're planning to have data in all of English, Japanese and German, it seems very unlikely that EBCDIC input is a good idea... you want Unicode for input, not just output!

The only problem with the "url encoder" (or "WEBFORM") routines is that the name of the variable ('text' or 'langpair' in your example) is received as an alphanumeric parameter. This works great when the input is in a CCSID other than Unicode -- but for Unicode, it makes more sense to use data type C (RPG's support for UCS-2). This isn't an issue for the value of the variable, since that can be passed by pointer -- but for the variable name, it's ugly.

You can work around it by doing a DS overlay to get the same input bytes into an alpha variable -- but it's a little ugly.
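
The overlay doesn't convert anything; it only lets the same bytes be viewed as an alpha field. A quick Python sketch of what those bytes are for the name 'text' (assuming big-endian UCS-2, as on IBM i):

```python
name = "text"
ucs2_bytes = name.encode("utf-16-be")   # what a type C (UCS-2) field holds

print(len(name), len(ucs2_bytes))       # 4 characters, 8 bytes
print(ucs2_bytes.hex())                 # 0074006500780074
```

Each character occupies two bytes, which is why the byte length handed to the encoder has to be the character count times two.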

Anyway, I've attached an example of this called 'pascal2.txt'.
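
As a sanity check on the Unicode path, this Python sketch takes the same UCS-2 hex value used as test input in that example and shows the UTF-8, URL-encoded form the server should receive. It illustrates the expected bytes only and does not call HTTPAPI:

```python
from urllib.parse import quote

ucs2_input = bytes.fromhex("524A9664")   # same hex as the u'524A9664' literal
text = ucs2_input.decode("utf-16-be")    # two Japanese characters

# Translate to UTF-8, then percent-encode every byte that needs it.
encoded = quote(text.encode("utf-8"), safe="")

print(encoded)   # %E5%89%8A%E9%99%A4
```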

This code worked for me using the latest "beta" copy of httpapi, which is found here:
http://www.scottklement.com/httpapi/beta


On 1/7/2012 8:38 AM, Pascal Polverini wrote:
    Hi Scott,
    I am trying to send a GET request to a google-translate HTML page.
    The HTML-page response is in utf-8 but I would also need to send utf-8
    data to cover any language-pair.
    I tried different things, for Latin character it works but I cannot get
    back or send Japanese for instance.
    I am not sure if this is because I use a GET. I understand that with
    POST you can set the CCSID but I am not sure of what to do for GET.
    Thanks for any tips and in any case thank you for these remarkable
    APIs.
    Pascal

Attachment: pascal.txt

     H DFTACTGRP(*NO) ACTGRP(*NEW) BNDDIR('HTTPAPI')

      /define WEBFORMS
      /copy qrpglesrc,httpapi_h
      /copy qrpglesrc,ifsio_h

     D toStmf          S            256

      /free
           toStmf = '/tmp/'
                  + 'googleTranslate.html';

           if translate( 'löschen'
                       : 'de|en'
                       : toStmf ) <> 1;
               http_crash();
           endif;

           *inlr = *on;
      /end-free

      *+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      * translate(): Call the Google Translate API...
      *
      *    peText = (input) text to translate (in the job's CCSID)
      *    pePair = (input) languages to translate from/to
      *    peStmf = (input) path name of stream file object to
      *                      write results to.
      *
      * returns the response code from HTTP_url_get().
      *+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
     P translate       B
     D                 PI            10i 0
     D   peText                    5000a   varying const
     D   pePair                      10a   varying const
     D   peStmf                    5000a   varying const
     D                                     options(*trim)

     D rc              s             10I 0
     D url             s          32767a   varying
     D form            s                   like(WEBFORM)
     D myText          s                   like(peText)
     D myPair          s                   like(pePair)

      /free
        myText = peText;
        myPair = pePair;

        //
        //  Our input is in the job's CCSID (0=job's CCSID), and
        //  the output to the HTTP server should be in UTF-8
        //  (CCSID 1208) so override HTTPAPI's defaults:
        //
        HTTP_setCCSIDs( 1208 : 0 );

        //
        //  Since the response file is also expected to be
        //  UTF-8, tell HTTPAPI to create that file accordingly:
        //

        unlink(peStmf);
        HTTP_setFileCCSID( 1208 );

        //
        //  Google expects the input data to be part of a URL,
        //  so it's necessary to encode it with a URL encoder,
        //  otherwise certain characters will be misinterpreted
        //
        //  the WEBFORM_setPtr() routines will also take care of
        //  translating our EBCDIC fields to UTF-8 for us.
        //

        form = WEBFORM_open();

        WEBFORM_setPtr( form
                      : 'text'
                      : %addr(myText: *data)
                      : %len(myText) );

        WEBFORM_setPtr( form
                      : 'langpair'
                      : %addr(myPair : *data)
                      : %len(myPair)   );

        url = 'http://translate.google.com/translate_t?'
            + WEBFORM_getData( form );
        WEBFORM_close( form );


        // Finally, we can submit the HTTP request!

        rc = http_url_get( url : peStmf );
        return rc;

      /end-free
     P                 E

Attachment: pascal2.txt

     H DFTACTGRP(*NO) ACTGRP(*NEW) BNDDIR('HTTPAPI')

      /define WEBFORMS
      /copy qrpglesrc,httpapi_h
      /copy qrpglesrc,ifsio_h

     D toStmf          S            256

      /free
           toStmf = '/tmp/'
                  + 'googleTranslate.html';

           if translate( u'524A9664'
                       : %ucs2('ja|en')
                       : toStmf ) <> 1;
               http_crash();
           endif;

           *inlr = *on;
      /end-free

      *+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      * translate(): Call the Google Translate API...
      *
      *    peText = (input) text to translate (in UCS-2 encoding)
      *    pePair = (input) languages to translate from/to
      *    peStmf = (input) path name of stream file object to
      *                      write results to.
      *
      * returns the response code from HTTP_url_get().
      *+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
     P translate       B
     D                 PI            10i 0
     D   peText                    5000c   varying const
     D   pePair                      10c   varying const
     D   peStmf                    5000a   varying const
     D                                     options(*trim)

     D rc              s             10I 0
     D url             s          32767a   varying
     D form            s                   like(WEBFORM)
     D myText          s                   like(peText)
     D myPair          s                   like(pePair)

     D                 ds
     D   unicode                     25c   varying
     D   alpha                       50a   overlay(unicode:3)

     D myAlphaVar      s             50a   varying

      /free
        myText = peText;
        myPair = pePair;

        //
        //  Our input is in the BMP of UTF-16 Unicode (CCSID 1200), and
        //  the output to the HTTP server should be in UTF-8
        //  (CCSID 1208) so override HTTPAPI's defaults:
        //
        HTTP_setCCSIDs( 1208 : 1200 );

        //
        //  Since the response file is also expected to be
        //  UTF-8, tell HTTPAPI to create that file accordingly:
        //

        unlink(peStmf);
        HTTP_setFileCCSID( 1208 );

        //
        //  Google expects the input data to be part of a URL,
        //  so it's necessary to encode it with a URL encoder,
        //  otherwise certain characters will be misinterpreted
        //
        //  the WEBFORM_setPtr() routines will also take care of
        //  translating our Unicode fields to UTF-8 for us.
        //

        form = WEBFORM_open();

        WEBFORM_setPtr( form
                      : fixVarName('text')
                      : %addr(myText: *data)
                      : %len(myText)*2 );

        WEBFORM_setPtr( form
                      : fixVarName('langpair')
                      : %addr(myPair : *data)
                      : %len(myPair)*2 );

        url = 'http://translate.google.com/translate_t?'
            + WEBFORM_getData( form );
        WEBFORM_close( form );


        // Finally, we can submit the HTTP request!

        rc = http_url_get( url : peStmf );
        return rc;

      /end-free
     P                 E

     P fixVarName      B
     D                 PI            50a   varying
     D   ucsVarName                  25c   varying const

     D retval          s             50a   varying inz('')

     D                 ds
     D   unicode                     25c
     D   alpha                       50a   overlay(unicode)
      /free
          if %len(ucsVarName) > 0;
             unicode = ucsVarName;
             retval = %subst(alpha:1:%len(ucsVarName)*2);
          endif;
          return retval;
      /end-free
     P                 E