Re: HTTPAPI and I/O utf-8
Hi Scott,
Thanks a lot for the prompt answer.
I tried both code samples you sent me; the second will be the handier one
(I will use a Unicode DB for input).
I then tried translating in the other direction, from English to
German or Japanese, and that does not work.
Here is the code:
-----------------------------------------------------------------------
   if translate( u'524A9664'               // it works
   //          : %ucs2('ja|en')            // it doesn't work
               : %ucs2('ja|en':1200)       // it works
               : toStmf ) <> 1;
      http_crash();
   endif;
   if translate( u'006C00F60073006300680065006E'   // it works
               : %ucs2('de|en':1200)               // it works
               : toStmf ) <> 1;
      http_crash();
   endif;
   // note: 'löschen' in utf-8  = x'6CC3B6736368656E'
   //                 in utf-16 = x'006C00F60073006300680065006E'
-----------------------------------------------------------------------
   if translate( %ucs2('delete':1200)      // it doesn't work
               : %ucs2('en|ja':1200)
               : toStmf ) <> 1;
      http_crash();
   endif;
-----------------------------------------------------------------------
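For anyone who wants to double-check the hex literals used above outside of RPG, here is a minimal sketch. Python is used purely as a calculator here; nothing in it is HTTPAPI code, and the helper name hex_of is just an illustration.

```python
# Check the hex values quoted in the note above.
# Python is only a calculator here; this is not HTTPAPI code.

def hex_of(text: str, encoding: str) -> str:
    """Return the encoded bytes of `text` as uppercase hex (hypothetical helper)."""
    return text.encode(encoding).hex().upper()

german = "l\u00f6schen"  # 'löschen'

# UTF-8 bytes (CCSID 1208) and big-endian UTF-16 bytes (as in RPG u'...' literals)
utf8_hex = hex_of(german, "utf-8")
utf16_hex = hex_of(german, "utf-16-be")

# Decode the u'524A9664' literal back into characters
japanese = bytes.fromhex("524A9664").decode("utf-16-be")

print(utf8_hex)   # 6CC3B6736368656E
print(utf16_hex)  # 006C00F60073006300680065006E
print(japanese)   # 削除
```

Both values match the note, so the literals themselves look right.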
PS: Below are some changes I made (I think this is what you meant
in the code you sent me):
D myAlphaVar s 50a varying
D myUnicodeVar s 25c varying
...
   myUnicodeVar = 'text';
   WEBFORM_setPtr( form
                 : fixVarName(myUnicodeVar)
                 : %addr(myText: *data)
                 : %len(myText)*2 );
   myUnicodeVar = 'langpair';
   WEBFORM_setPtr( form
                 : fixVarName(myUnicodeVar)
                 : %addr(myPair : *data)
                 : %len(myPair)*2 );
-----------------------------------------------------------------------
I have also noticed that the content-type header in the returned HTML is
not utf-8; the charset varies with the input|output
langpair:
en|ja:
<html><head><meta content="text/html; charset=Shift_JIS"
http-equiv="content-type">
ja|en:
<html><head><meta content="text/html; charset=Shift_JIS"
http-equiv="content-type">
en|de:
<html><head><meta content="text/html; charset=ISO-8859-1"
http-equiv="content-type">
de|en:
<html><head><meta content="text/html; charset=ISO-8859-1"
http-equiv="content-type">
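Since the charset changes with the langpair, whatever reads the returned file has to look at the declared charset before converting the bytes. A rough sketch of that idea follows, in Python for illustration only; the function names are hypothetical, and an RPG program would need its own way of converting from the declared charset.

```python
import re

def declared_charset(html_bytes: bytes, default: str = "utf-8") -> str:
    """Return the charset named in the content-type meta tag, or `default`."""
    # Only the start of the page is searched; the meta tag sits in <head>.
    match = re.search(rb"charset=([\w-]+)", html_bytes[:1024], re.IGNORECASE)
    return match.group(1).decode("ascii") if match else default

def decode_page(html_bytes: bytes) -> str:
    """Decode the page using the charset it declares for itself."""
    return html_bytes.decode(declared_charset(html_bytes), errors="replace")

# Simulated responses using the two headers quoted above
head_ja = b'<html><head><meta content="text/html; charset=Shift_JIS" http-equiv="content-type">'
head_de = b'<html><head><meta content="text/html; charset=ISO-8859-1" http-equiv="content-type">'
page_ja = head_ja + "\u524a\u9664".encode("shift_jis")
page_de = head_de + "l\u00f6schen".encode("iso-8859-1")

print(declared_charset(page_ja))  # Shift_JIS
print(declared_charset(page_de))  # ISO-8859-1
```

The sample pages are made up from the headers in this mail; they are not real responses from the server.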
With the following form submitted from a browser, the response is always in utf-8,
and ideally we would expect the same from the RPG program. I don't know if you
get the same result on your machine; I am using v6 (and I guess you
use v7).
-----------------------------------------------------------------------
<html>
  <form method="get" enctype="multipart/form-data"
        action="http://translate.google.com/translate_t">
    <input type="text" name="text" value="delete" />
    <input type="text" name="langpair" value="en|ja" />
    <input type="submit" />
  </form>
</html>
-----------------------------------------------------------------------
The difference that I can see is that my browser is set to Unicode
utf-8 (CCSID=1208), while from the RPG we send the data as CCSID=1200.
But since the langpair ja|en works, this shouldn't be part of the
issue. Maybe there are other differences in what is sent through the
form?
Pascal
2012/1/9 Scott Klement <[2]sk@xxxxxxxxxxxxxxxx>
Hi Pascal,
The way you're doing it won't work. You're coding a character string
(containing the URL) in EBCDIC, but then trying to concatenate
Unicode data in the middle of that string. HTTPAPI isn't smart
enough to know that part of it is Unicode and part of it is EBCDIC,
so the result will be a mistranslated string.
Furthermore, any "special" characters in your data (i.e. characters
that aren't allowed in a URL, or have a special meaning in a URL)
aren't escaped, and that'll cause additional problems.
You're going to have to use the "Web Form" (URL Encoder) routines to
encode your data properly. If you use those routines, they should
successfully translate the data to UTF-8 and encode it properly for
a URL.
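To illustrate what the properly encoded result should look like at the byte level, here is a minimal sketch in Python. It is for illustration only and is not HTTPAPI code; the WEBFORM routines may differ in details such as exactly which characters they escape. The parameter names and the URL are the ones used in this thread.

```python
from urllib.parse import urlencode

# Form fields from this thread: Japanese text and the language pair
params = {"text": "\u524a\u9664", "langpair": "ja|en"}

# urlencode converts each value to UTF-8 and percent-escapes every byte
# that is not allowed as-is in a query string (including the '|').
query = urlencode(params)

url = "http://translate.google.com/translate_t?" + query

print(query)  # text=%E5%89%8A%E9%99%A4&langpair=ja%7Cen
```

Note that the two Japanese characters become six escaped UTF-8 bytes, and the whole URL ends up as plain ASCII, so there is no longer any mix of encodings in the string.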
Assuming your input data is EBCDIC (the job's CCSID), I've
attached an example of this, named 'pascal.txt'.
So that's fine if your input is EBCDIC -- but if you're planning to
have data in all of English, Japanese and German, it seems very
unlikely that EBCDIC input is a good idea... you want Unicode for
input, not just output!
The only problem with the "url encoder" (or "WEBFORM") routines is
that the name of the variable ('text' or 'langpair' in your example) is
received as an alphanumeric parameter. This works great when the
input is in other CCSIDs besides Unicode -- but for Unicode, it
makes more sense to use data type=C (RPG's support for UCS-2).
This isn't an issue for the value of the variable, since that can be
passed by pointer -- but for the variable name, it's ugly.
You can work around it by doing a DS overlay to get the same input
bytes into an alpha variable -- but, it's just a little ugly.
Anyway, I've attached an example of this called 'pascal2.txt'
This code worked for me using the latest "beta" copy of httpapi,
which is found here:
[3]http://www.scottklement.com/httpapi/beta
On 1/7/2012 8:38 AM, Pascal Polverini wrote:
Hi Scott,
I am trying to send a GET request to a google-translate HTML page.
The HTML-page response is in utf-8 but I would also need to send utf-8
data to cover any language-pair.
I tried different things; for Latin characters it works, but I cannot get
back or send Japanese, for instance.
I am not sure if this is because I use a GET. I understand that with
POST you can set the CCSID, but I am not sure what to do for GET.
Thanks for any tips and in any case thank you for these remarkable
APIs.
Pascal
-----------------------------------------------------------------------
This is the FTPAPI mailing list. To unsubscribe, please go to:
[4]http://www.scottklement.com/mailman/listinfo/ftpapi
-----------------------------------------------------------------------
References
1. http://translate.google.com/translate_t
2. mailto:sk@xxxxxxxxxxxxxxxx
3. http://www.scottklement.com/httpapi/beta
4. http://www.scottklement.com/mailman/listinfo/ftpapi