
Re: Unicode problem



Hi John,

> Regarding whether or not it's UTF-8, the only reason I thought it was that
> is because the person that generated the file told me it was that.

In order to proceed at all with helping you, I need the following 
question answered.  It's important:

Is the file you're parsing, byte-for-byte, the same as the one you 
posted here?


> I have no other proof that it's UTF-8. I also know that when I used
> CCSID 1208, the file parsed correctly.

If you view the file in hex, I think you'll see that it's not UTF-8. 
The easiest way to do that is with the DSPF green-screen tool.  Type:

    DSPF 'freight_201003101542591443916.xml'

Then hit F10=Display Hex.

You should notice that it starts with 'FF FE', which is the byte-order 
mark for UTF-16LE.  This is followed by '3C 00 3F 00 78 00'.  You'll 
notice that every other byte is 00.  That's because each 
character in the XML is represented by *two* bytes.  So '3C 00' 
represents the < character, and '3F 00' represents the ? character.

The fact that the zero is in the 2nd byte (as opposed to 00 3F) tells 
you that it's little-endian.

The fact that it's two bytes (or 16 bits) per character tells you that 
this is a 16-bit encoding.  (Not an 8-bit encoding!)

This is either UCS-2LE or UTF-16LE.  There's no chance that this is UTF-8.
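
The same check can be scripted.  Here is a minimal sketch in Python 
(chosen only for illustration; nothing else in this thread uses it). 
The function name sniff_encoding is made up, and the fallback for files 
without a byte-order mark assumes the file begins with ASCII-range 
characters such as '<?'.

```python
import codecs

def sniff_encoding(data: bytes) -> str:
    """Guess a Unicode encoding from the first few bytes of a file."""
    # Check UTF-32 first: the UTF-32LE BOM (FF FE 00 00) also starts with FF FE.
    if data.startswith(codecs.BOM_UTF32_LE) or data.startswith(codecs.BOM_UTF32_BE):
        return "utf-32"
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "utf-16-be"
    # No BOM: look for the "every other byte is 00" pattern.
    head = data[:4]
    if len(head) == 4:
        if head[0] != 0 and head[1] == 0 and head[2] != 0 and head[3] == 0:
            return "utf-16-le"   # e.g. 3C 00 3F 00
        if head[0] == 0 and head[1] != 0 and head[2] == 0 and head[3] != 0:
            return "utf-16-be"   # e.g. 00 3C 00 3F
    return "unknown"

# The bytes described above: FF FE 3C 00 3F 00 78 00
print(sniff_encoding(bytes.fromhex("FFFE3C003F007800")))  # utf-16-le
```

A plain ASCII or UTF-8 file without a byte-order mark comes back as 
"unknown" here, since this sketch only looks for the patterns above.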

>
> Do you know what the CCSID is for UTF-16 (LE).  If so, I can try it.
>

The CCSID for UTF-16 (aka UTF-16BE) is 1200.  There is no CCSID for 
UTF-16LE that works on IBM i that I'm aware of.  But there is a 
different API that can translate it to UTF-8 or UTF-16.  I posted about 
this in another message last night; please read that.
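
To show what that kind of translation amounts to, here is a rough 
Python sketch.  This is NOT the IBM i API referred to above, just an 
off-platform illustration, and the function name utf16le_to_utf8 is 
hypothetical.

```python
import codecs

def utf16le_to_utf8(raw: bytes) -> bytes:
    """Re-encode UTF-16LE bytes as UTF-8, dropping a leading BOM if present."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        raw = raw[len(codecs.BOM_UTF16_LE):]   # strip FF FE
    # Decode two bytes per character (little-endian), then write UTF-8.
    return raw.decode("utf-16-le").encode("utf-8")

# FF FE 3C 00 3F 00 78 00 becomes the three bytes 3C 3F 78
print(utf16le_to_utf8(bytes.fromhex("FFFE3C003F007800")))  # b'<?x'
```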

I don't understand what's going on here.  To proceed, I really need your 
help on two of the things I asked you previously:

1) Is the file the same as it was when you posted it, or has it changed 
to UTF-8?

2) Did you try HTTP_XML_CALC as the CCSID?  This lets Expat handle the 
translation instead of OS/400 -- and Expat does natively support 
UTF-16LE.   What happens when you use that?
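
If you want to see Expat's own UTF-16LE handling outside of HTTPAPI, 
Python's standard library ships a binding to it.  A small sketch, 
assuming a made-up document (the <root> and <item> elements are 
invented for the example, they're not from your file) and a 
hypothetical helper named element_names:

```python
import codecs
import xml.parsers.expat

def element_names(data: bytes) -> list:
    """Parse raw XML bytes with Expat and return element names in order."""
    names = []
    # No encoding is passed in, so Expat works it out from the bytes itself.
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)   # True = this is the final chunk
    return names

# Build a UTF-16LE document with an FF FE byte-order mark in front.
doc = '<?xml version="1.0" encoding="UTF-16"?><root><item/></root>'
data = codecs.BOM_UTF16_LE + doc.encode("utf-16-le")
print(element_names(data))  # ['root', 'item']
```

This only demonstrates Expat itself; it says nothing about how HTTPAPI 
calls it.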

-----------------------------------------------------------------------
This is the FTPAPI mailing list.  To unsubscribe, please go to:
http://www.scottklement.com/mailman/listinfo/ftpapi
-----------------------------------------------------------------------