
Re: HTTPAPI - XPAT - encoding "ISO-8859-2"



Hello,

The crux of the problem is that Expat natively understands the following 
encodings: ISO-8859-1, UTF-8, UTF-16 and US-ASCII.  Both Expat and 
HTTPAPI have mechanisms that let you overcome this limitation, but it 
requires you to write extra code.

So I have two possible solutions:

a) The Expat solution.  Instead of letting HTTPAPI do the XML parsing,
call the Expat routines directly.  Expat supports an "unknown encoding
handler" (registered with XML_SetUnknownEncodingHandler).  When Expat
reads the document's XML declaration and finds an encoding it doesn't
know natively, such as ISO-8859-2, it calls your handler.  Your handler
then tells Expat how to translate that encoding into Unicode so Expat
can process the document.

This requires an in-depth knowledge of Expat, so it's somewhat 
complicated.  However, it has the advantage that Expat still does the 
analysis of the file to determine the encoding, and therefore you don't 
have to write a routine to determine the encoding prior to calling Expat.

b) The HTTPAPI solution.   In this solution, you will use 
http_parse_xml_stmf() to parse the XML document.  However, in the 2nd 
parameter to this API, you'll specify the CCSID of the data (instead of 
using HTTP_XML_CALC).

To do that, you'll have to first download the XML data, then open the
file and read its XML declaration to determine whether the encoding is
ISO-8859-2.  If it is, you'll tell http_parse_xml_stmf() that the CCSID
is 912 (which corresponds to ISO-8859-2).  Otherwise, you can still use
HTTP_XML_CALC to let Expat figure out the appropriate encoding.
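A minimal sketch of the detection step in solution (b), again in C: read the first few hundred bytes of the downloaded file, find the encoding pseudo-attribute in the XML declaration, and return 912 for ISO-8859-2.  The function names are hypothetical, and SKETCH_XML_CALC is a placeholder for HTTPAPI's HTTP_XML_CALC constant; its value here is an assumption, and the real one comes from HTTPAPI itself.  The sketch only handles ASCII-compatible files (ISO-8859-1, ISO-8859-2, UTF-8 without a byte-order mark).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Placeholder for HTTPAPI's HTTP_XML_CALC ("let Expat work it out").
 * The value -1 is an assumption made for this sketch only. */
#define SKETCH_XML_CALC (-1)

/* Case-insensitive comparison of p[0..len) with the C string name. */
static int name_matches(const char *p, size_t len, const char *name)
{
    if (len != strlen(name))
        return 0;
    for (size_t i = 0; i < len; i++)
        if (tolower((unsigned char)p[i]) != tolower((unsigned char)name[i]))
            return 0;
    return 1;
}

/* Return 912 if buf starts with an XML declaration whose encoding is
 * ISO-8859-2, otherwise SKETCH_XML_CALC. */
int pick_ccsid(const char *buf)
{
    const char *end = strstr(buf, "?>");
    const char *p = strstr(buf, "encoding");

    /* Only trust an encoding="..." found inside <?xml ... ?>. */
    if (strncmp(buf, "<?xml", 5) != 0 || end == NULL || p == NULL || p > end)
        return SKETCH_XML_CALC;

    p += strlen("encoding");
    while (isspace((unsigned char)*p)) p++;
    if (*p != '=') return SKETCH_XML_CALC;
    p++;
    while (isspace((unsigned char)*p)) p++;

    char quote = *p;
    if (quote != '"' && quote != '\'') return SKETCH_XML_CALC;
    p++;

    const char *close = strchr(p, quote);
    if (close == NULL || close > end) return SKETCH_XML_CALC;

    if (name_matches(p, (size_t)(close - p), "ISO-8859-2"))
        return 912;                       /* CCSID 912 = ISO-8859-2 */
    return SKETCH_XML_CALC;
}

/* Read the start of the downloaded file and decide which CCSID to pass. */
int ccsid_for_file(const char *path)
{
    char head[512];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return SKETCH_XML_CALC;
    size_t n = fread(head, 1, sizeof head - 1, f);
    fclose(f);
    head[n] = '\0';                       /* make it a C string for strstr */
    return pick_ccsid(head);
}
```

In the real program, the value returned by a routine like ccsid_for_file() would become the second parameter to http_parse_xml_stmf().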

What will actually happen under the covers in this solution:  When
HTTPAPI reads your IFS file, it'll translate the data in the file from
CCSID 912 (ISO-8859-2) to UTF-8.  It will tell Expat that the data is in
UTF-8 format (so Expat will ignore the encoding named in the file's XML
declaration).  Then it will parse it as a normal UTF-8 document.
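The translation HTTPAPI performs here is an ordinary character-set conversion.  A minimal sketch of the same step with POSIX iconv, assuming the glibc converter names "ISO-8859-2" and "UTF-8" (on IBM i, conversions are usually expressed as CCSIDs: 912 for ISO-8859-2 and 1208 for UTF-8).  latin2_to_utf8 is a hypothetical name, not an HTTPAPI routine.

```c
#include <iconv.h>
#include <stddef.h>

/* Convert inlen bytes of ISO-8859-2 text to UTF-8.
 * Returns the number of UTF-8 bytes written to out, or -1 on error
 * (no converter, invalid input, or out too small). */
long latin2_to_utf8(const char *in, size_t inlen, char *out, size_t outsize)
{
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-2");
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;          /* iconv's prototype is not const-correct */
    char *outp = out;
    size_t inleft = inlen, outleft = outsize;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;
    return (long)(outsize - outleft);
}
```

Every ISO-8859-2 character maps to a code point below U+0800, so its UTF-8 form is at most two bytes; an output buffer twice the input length is therefore always large enough.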


Both solutions should work -- though I haven't done much testing of
specifying a CCSID for http_parse_xml_stmf(), so I suggest that you get
the latest beta version of HTTPAPI from
http://www.scottklement.com/httpapi/beta and help me test it.  If it
works for you, great.  If not, then I'll need your help to get the bugs
out of it.

Thanks!


RUDAS István wrote:
> 
> Unfortunately some of the incoming files are encoded in
> "ISO-8859-2" instead of "ISO-8859-1", and the tool returns
> "-1" and terminates automatically.
> 
> It would be nice to hear any suggestions.  [The option of changing
> this manually in the incoming file is not acceptable, because it must
> eventually run unattended, but thank you for thinking about it, the
> universe and all that kind of things.]
-----------------------------------------------------------------------
This is the FTPAPI mailing list.  To unsubscribe, please go to:
http://www.scottklement.com/mailman/listinfo/ftpapi
-----------------------------------------------------------------------