Re: Google Translate API's
Henrik,
> It is correct that the original RFC documents specified a URL to be
> escaped in US-ASCII, but that was before the world got globalized, by
> the way Firefox 3.0 encodes AJAX call's in UTF-8 as standard.
I don't think you understood what I said. I'm not talking about the
original RFCs, I'm talking about the current standard for HTTP, which
states that the HTTP protocol uses US-ASCII.
Firefox (all versions -- all the way back to the original Netscape
releases) correctly encodes the data that's part of the HTTP protocol
(as opposed to the payload) into US-ASCII. That hasn't changed, and is
unlikely to ever change.
The variables in your JavaScript code are UTF-8, that's very true. But
after Firefox encodes them to put them into a URL, they are valid 7-bit
US-ASCII in compliance with the HTTP protocol.
The difference between Firefox and HTTPAPI is quite simple: Firefox
doesn't translate the characters from one encoding to another. It
simply views the characters as bytes (i.e. 8-bit chunks of data) and
anything that's not a valid character in a URL is converted to hex and
inserted in its hex representation.
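
To illustrate the byte-oriented approach described above, here is a minimal
sketch in Python (not RPG, and not Firefox's actual code). The function name
percent_encode_bytes and the sample string are made up for illustration, and
the set of characters left unescaped is the "unreserved" set from RFC 3986,
which is an assumption about what counts as valid here.

```python
# Characters allowed to pass through unescaped (RFC 3986 "unreserved" set).
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)

def percent_encode_bytes(data: bytes) -> str:
    """Walk the bytes one at a time; escape anything not in UNRESERVED."""
    out = []
    for b in data:
        if b in UNRESERVED:
            out.append(chr(b))               # already a safe US-ASCII character
        else:
            out.append("%{:02X}".format(b))  # hex representation of the byte
    return "".join(out)

# The string is already UTF-8 in memory, so no translation step is needed:
# just take the bytes as they are and escape them.
encoded = percent_encode_bytes("grün tea".encode("utf-8"))
print(encoded)  # gr%C3%BCn%20tea
```

Note that the result contains only 7-bit US-ASCII characters, even though
the input did not.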
HTTPAPI on the other hand has to *translate* the data, because the
variables in your RPG program are NOT in UTF-8. They are in EBCDIC. So
HTTPAPI has to *translate* them to another encoding. It can't start out
by viewing them as bytes as Firefox does, because the characters are in
EBCDIC and most likely the program they are communicating with won't
expect them to be in EBCDIC!
Now put yourself in my shoes. Had you never had the problem you're
currently experiencing... and you KNEW that tools like Firefox
typically ran through bytes and checked which ones were/weren't valid
US-ASCII characters, what would you do? You'd probably translate the
EBCDIC to ASCII, and then anything not valid in a URL, or outside of the
7-bit range, you'd convert to hex, right? Then, this data has to be
converted BACK to EBCDIC... it makes sense for the conversion back to
use the same table, doesn't it?
But your message has shown that this method won't work. The EBCDIC
has to be converted to UTF-8. Then the UTF-8 has to be viewed as bytes,
then the bytes have to be made into valid US-ASCII, and then the
US-ASCII has to be translated back to EBCDIC.
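
The four steps above can be sketched roughly as follows. This is Python
rather than RPG, and it is not HTTPAPI's code. It assumes the EBCDIC data is
in CCSID 37 (Python's "cp037" codec); the function name ebcdic_to_url_value
and the sample value are hypothetical.

```python
from urllib.parse import quote_from_bytes

def ebcdic_to_url_value(ebcdic: bytes, ebcdic_codec: str = "cp037") -> bytes:
    """EBCDIC in, URL-escaped EBCDIC out, going through UTF-8 in the middle."""
    # 1. Translate the EBCDIC data to UTF-8.
    utf8 = ebcdic.decode(ebcdic_codec).encode("utf-8")
    # 2 + 3. View the UTF-8 as bytes and escape everything that is not a
    #        safe US-ASCII character (safe="" also escapes "/").
    ascii_url = quote_from_bytes(utf8, safe="")
    # 4. Translate the US-ASCII result back to EBCDIC for the caller.
    return ascii_url.encode(ebcdic_codec)

field = "café".encode("cp037")       # stands in for an RPG variable
result = ebcdic_to_url_value(field)
print(result.decode("cp037"))        # caf%C3%A9
```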
The fact is... there's nothing in any standard that says this. The
standard just says to get the hex values of stuff that isn't valid
US-ASCII... it doesn't say anything about what the original encoding has
to be before it's encoded.
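
A small Python sketch of the ambiguity: the same character produces two
different escape sequences depending on which encoding it was in before it
was escaped, and both results are perfectly valid US-ASCII. The choice of
ISO-8859-1 as the "other" encoding is only an example.

```python
from urllib.parse import quote

# Same character, two different starting encodings.
as_latin1 = quote("é", safe="", encoding="iso-8859-1")
as_utf8 = quote("é", safe="", encoding="utf-8")

print(as_latin1)  # %E9
print(as_utf8)    # %C3%A9
```

The server on the other end has to know (or guess) which one it is getting.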
You have pointed out that Yahoo wants that original encoding to be UTF-8...
great... I've already told you that I plan to support letting you choose
any CCSID you want to translate to in a future release.
-----------------------------------------------------------------------
This is the FTPAPI mailing list. To unsubscribe, please go to:
http://www.scottklement.com/mailman/listinfo/ftpapi
-----------------------------------------------------------------------