
Re: Google Translate API's



Henrik,

> It is correct that the original RFC documents specified a URL to be
> escaped in US-ASCII, but that was before the world got globalized, by
> the way Firefox 3.0 encodes AJAX call's in UTF-8 as standard.

I don't think you understood what I said.  I'm not talking about the 
original RFCs, I'm talking about the current standard for HTTP, which 
states that the HTTP protocol uses US-ASCII.

Firefox (all versions -- all the way back to the original Netscape 
releases) correctly encodes the data that's part of the HTTP protocol 
(as opposed to the payload) into US-ASCII.  That hasn't changed, and is 
unlikely to ever change.

The variables in your JavaScript code are UTF-8; that's very true. But 
after Firefox encodes them to put them into a URL, they are valid 7-bit 
US-ASCII, in compliance with the HTTP protocol.

The difference between Firefox and HTTPAPI is quite simple:  Firefox 
doesn't translate the characters from one encoding to another.  It 
simply views the characters as bytes (i.e. 8-bit chunks of data), and 
any byte that's not a valid character in a URL is replaced by its hex 
representation (a percent sign followed by two hex digits).
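
To make the byte-wise approach concrete, here is a minimal sketch in 
Python. It is not what Firefox actually runs; the function name is 
hypothetical, and the set of characters left unescaped is assumed to be 
the RFC 3986 "unreserved" set, which may differ from what any given 
browser uses.

```python
# Bytes left as-is. ASSUMPTION: the RFC 3986 "unreserved" characters.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)


def percent_encode_bytes(data: bytes) -> str:
    """Hypothetical helper: escape raw bytes without re-encoding them."""
    out = []
    for byte in data:  # iterating over bytes yields integers 0..255
        if byte in UNRESERVED:
            out.append(chr(byte))        # safe byte: copy through
        else:
            out.append(f"%{byte:02X}")   # anything else: %XX in hex
    return "".join(out)


# The data is already UTF-8 before escaping starts, so no translation
# between encodings happens here.
utf8_bytes = "café".encode("utf-8")
encoded = percent_encode_bytes(utf8_bytes)
print(encoded)  # caf%C3%A9
```

The result contains only 7-bit US-ASCII characters, even though the 
input contained a two-byte UTF-8 sequence.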

HTTPAPI on the other hand has to *translate* the data, because the 
variables in your RPG program are NOT in UTF-8.  They are in EBCDIC.  So 
HTTPAPI has to *translate* them to another encoding.  It can't start out 
by viewing them as bytes as Firefox does, because the characters are in 
EBCDIC and most likely the program they are communicating with won't 
expect them to be in EBCDIC!

Now put yourself in my shoes.  Had you never had the problem you're 
currently experiencing...  and you KNEW that tools like Firefox 
typically ran through bytes and checked which ones were/weren't valid 
US-ASCII characters, what would you do? You'd probably translate the 
EBCDIC to ASCII, and then anything not valid in a URL, or outside of the 
7-bit range, you'd convert to hex, right? Then, this data has to be 
converted BACK to EBCDIC... it makes sense for the conversion back to 
use the same table, doesn't it?

But your message has shown that this method won't work.  The EBCDIC 
has to be converted to UTF-8.  Then the UTF-8 has to be viewed as bytes, 
then the bytes have to be made into valid US-ASCII, and then the 
US-ASCII has to be translated back to EBCDIC.
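
The difference between the two approaches can be sketched in Python. 
This is only an illustration, not HTTPAPI's code: the function names 
are hypothetical, the EBCDIC code page is assumed to be CCSID 37 
(Python's "cp037" codec), and Latin-1 stands in for whatever 
single-byte ASCII table a real translation would use.

```python
from urllib.parse import quote

EBCDIC_CODEC = "cp037"  # ASSUMPTION: the program's data is CCSID 37


def encode_via_single_byte_table(ebcdic: bytes) -> bytes:
    """Hypothetical: EBCDIC -> single-byte ASCII table -> %XX -> EBCDIC."""
    text = ebcdic.decode(EBCDIC_CODEC)
    single_byte = text.encode("latin-1")   # ASSUMPTION: Latin-1 table
    escaped = quote(single_byte, safe="")  # escape the bytes as they are
    return escaped.encode(EBCDIC_CODEC)


def encode_via_utf8(ebcdic: bytes) -> bytes:
    """Hypothetical: EBCDIC -> UTF-8 -> bytes -> %XX -> EBCDIC."""
    text = ebcdic.decode(EBCDIC_CODEC)
    utf8 = text.encode("utf-8")            # the step the server expects
    escaped = quote(utf8, safe="")         # escape the bytes as they are
    return escaped.encode(EBCDIC_CODEC)


field = "café".encode(EBCDIC_CODEC)

# Decode the results back to text only so they can be printed here.
print(encode_via_single_byte_table(field).decode(EBCDIC_CODEC))  # caf%E9
print(encode_via_utf8(field).decode(EBCDIC_CODEC))               # caf%C3%A9
```

Both results are valid US-ASCII once translated, but only the second 
is what a server expecting UTF-8 will decode correctly.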

The fact is... there's nothing in any standard that says this.  The 
standard just says to get the hex values of stuff that isn't valid 
US-ASCII... it doesn't say anything about what the original encoding has 
to be before it's encoded.

You have pointed out that Yahoo wants that original encoding to be 
UTF-8... great... I've already told you that I plan to support letting 
you choose any CCSID you want to translate to in a future release.
-----------------------------------------------------------------------
This is the FTPAPI mailing list.  To unsubscribe, please go to:
http://www.scottklement.com/mailman/listinfo/ftpapi
-----------------------------------------------------------------------