
Re: Unicode problem



John,

How did you get the XML file onto your iSeries? For example, when I use "Total 
Commander" to copy a stream file from my PC to our iSeries, the stream file 
automatically gets CCSID 1252, which seems to be something like a default CCSID.

The first 2 bytes of your stream file are clearly a "byte order mark" (BOM) 
and that BOM indicates that the file is encoded in UTF-16 little-endian order:

    http://en.wikipedia.org/wiki/Byte-order_mark
    http://forums.systeminetwork.com/isnetforums/showthread.php?t=42685

Hence 1252 is definitely the wrong CCSID. Unfortunately I do not know 
whether there is a CCSID that matches UTF-16 little-endian.
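
If no matching CCSID turns up, one workaround might be to re-encode the file 
as UTF-8 (CCSID 1208) on the PC before copying it over. Below is a minimal 
Python sketch of that idea, not something I have run against your file: it 
reads the BOM, shows why a UTF-16 file looks "spaced out" when it is read as 
1252, and converts the bytes to UTF-8. The names detect_bom and to_utf8 and 
the cp1252 fallback are my own assumptions, not part of HTTPAPI.

```python
import codecs

# Known byte order marks. UTF-32 LE is checked before UTF-16 LE because
# its BOM (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def to_utf8(data: bytes, fallback: str = "cp1252") -> bytes:
    """Decode via the BOM (or the assumed fallback), re-encode as UTF-8 without a BOM."""
    encoding, skip = detect_bom(data)
    text = data[skip:].decode(encoding or fallback)
    return text.encode("utf-8")


if __name__ == "__main__":
    # Stand-in for the first bytes of the stream file: FF FE, then UTF-16 LE text.
    sample = codecs.BOM_UTF16_LE + '<?xml version="1.0" standalone="yes"?>'.encode("utf-16-le")

    print(detect_bom(sample))

    # Read as 1252, the BOM shows up as two accented characters and every
    # second byte is a NUL, which a viewer may display as a blank.
    print(repr(sample[:8].decode("cp1252")))

    print(to_utf8(sample))
```

One caveat: if the XML declaration names an encoding such as "utf-16", it 
would have to be adjusted after the conversion, because the declared 
encoding would no longer match the bytes.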

Regards,

Thomas.

John Clark(Hotmail) wrote:
> I used WRKLNK to view the attributes of the file and it was listed as 
> CCSID 1252.  I plugged this into the 2nd parm of the http_parse_xml_stmf 
> procedure, but I got the same result (rc = -1).  I also tried using 1200, 
> but the results were identical.
> 
> I have attached the XML file.
> 
> 
> 
> ----- Original Message ----- From: "Scott Klement" <sk@xxxxxxxxxxxxxxxx>
> To: "HTTPAPI and FTPAPI Projects" <ftpapi@xxxxxxxxxxxxxxxxxxxxxx>
> Sent: Monday, March 15, 2010 1:12 PM
> Subject: Re: Unicode problem
> 
> 
> Hi John,
> 
> The XML parser assumes that the XML document is encoded as ISO-8859-1
> unless you specify an encoding in the XML declaration, such as
> <?xml version="1.0" encoding="iso-8859-1"?>
> 
> Since your data is clearly not iso-8859-1 (but the parser thinks it is!)
> the data is being misinterpreted.
> 
> You can override the encoding attribute by specifying a CCSID in the 2nd
> parameter to http_parse_xml_stmf().   I would guess that your data is
> UTF-16, which is CCSID 1200.  So try specifying 1200 in the 2nd
> parameter of http_parse_xml_stmf().
> 
> 
> 
> On 3/15/2010 10:58 AM, John Clark(Hotmail) wrote:
>>
>>     I am using "http_parse_xml_stmf" to parse an IFS file that we get from
>>     an affiliate.  When I look at the XML file using notepad, everything
>>     appears fine, like the following snippet:
>>
>>
>>
>>     <?xml version="1.0" standalone="yes"?>.....
>>
>>
>>
>>     If I view the file using WRKLNK, it looks like this:
>>
>>
>>
>>     ÿþ<  ? x m l   v e r s i o n = " 1 . 0 "   s t a n d a l o n e = " y e s " ?>...
>>
>>
>>
>>     When I use http_parse_xml_stmf, I get a return code of -1.  It doesn't
>>     even call the parsing routine.  I get the following in the debug file:
>>
>>
>>
>>     HTTPAPI Ver 1.21 released 2007-10-01
>>
>>     New iconv() objects set, PostRem=819. PostLoc=0. ProtRem=819. ProtLoc=0
>>     New XML iconv() objects set, xml_Remote=1252. xml_Local=1208
>>     SetError() #66: XML parse failed at line 1, col 2: not well-formed (invalid token)
>>
>>
>>
>>     I think the IFS file might be in unicode.  If so, how do I handle
>>     this?
>>
>>
>>
>>     John
>>
>>
>>
>>
>> -----------------------------------------------------------------------
>> This is the FTPAPI mailing list.  To unsubscribe, please go to:
>> http://www.scottklement.com/mailman/listinfo/ftpapi
>> -----------------------------------------------------------------------
> 