
RE: HTTPAPI - XPAT - encoding "ISO-8859-2"

Dear Scott,

SUMMARY: Thank you for the good advice. It works fine!

Because of my customer's tight time budget, I chose option (b), the HTTPAPI solution, with the ***current*** version (1.18).
The calling CLP passes a parameter to the program that says whether the incoming file is encoded in ISO-8859-2. If it is, the program calls the parser with CCSID 912; otherwise it leaves it to the API to work out the encoding.
(The transmitting partner is known, so I can set this parameter by reading a DB file in which partner information, including this new flag, is stored. At the moment only a few black sheep send ISO-8859-2.)

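 if gi_encode = '2';
    // Partner sends ISO-8859-2: tell HTTPAPI the data is CCSID 912
    rc = http_parse_xml_stmf(i_file
                            : 912
                            : *NULL
                            : %paddr('endOfElement')
                            : %addr(userData));
 else;
    // Any other partner: let the API work out the encoding
    rc = http_parse_xml_stmf(i_file
                            : HTTP_XML_CALC
                            : *NULL
                            : %paddr('endOfElement')
                            : %addr(userData));
 endif;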

If I find a little spare time in the near future, I will download the beta version and test this function again.

Thank you, Scott, for giving such a clear solution!


István RUDAS, 
for: CPB-Software, Vienna, Austria.


-----Original Message-----
From: ftpapi-bounces@xxxxxxxxxxxxxxxxxxxxxx [mailto:ftpapi-bounces@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Scott Klement
Sent: Thursday, 20 September 2007 20:36
To: HTTPAPI and FTPAPI Projects
Subject: Re: HTTPAPI - XPAT - encoding "ISO-8859-2"


The crux of the problem is that Expat natively understands only the following encodings: ISO-8859-1, UTF-8, UTF-16 and US-ASCII.  Both Expat and HTTPAPI have mechanisms that let you overcome this limitation, but they require you to write extra code.

So I have two possible solutions:

a) The Expat solution.   Instead of letting HTTPAPI do the XML parsing, call the Expat routines directly.  Expat supports an "UnknownEncodingHandler".  When Expat analyzes the document and discovers that the encoding is ISO-8859-2, it'll call your unknown encoding handler.  You will write code to translate the encoding to Unicode for Expat to process.

This requires an in-depth knowledge of Expat, so it's somewhat complicated.  However, it has the advantage that Expat still does the analysis of the file to determine the encoding, and therefore you don't have to write a routine to determine the encoding prior to calling Expat.
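
To give a feel for the translation work option (a) involves: in Expat's C API, the unknown-encoding handler fills in a 256-entry table (the "map" field of XML_Encoding) that gives the Unicode code point for each possible byte value. The sketch below only shows what such a table would contain for ISO-8859-2. It is written in Python purely because Python ships with the codec and can be tried on any PC; the function name is made up for illustration, and it is not the RPG code you would actually write.

```python
def latin2_code_point_map():
    """Return 256 Unicode code points, one per ISO-8859-2 byte value."""
    # Decode each single byte with the ISO-8859-2 codec and take its code point.
    return [ord(bytes([b]).decode("iso-8859-2")) for b in range(256)]


table = latin2_code_point_map()
print(hex(table[0x41]))  # ASCII range: same value as in ISO-8859-1
print(hex(table[0xA1]))  # upper range: this is where ISO-8859-2 differs
```

The lower 128 entries are plain ASCII; only the upper half differs from ISO-8859-1, which is why a parser that assumes ISO-8859-1 silently produces the wrong characters rather than failing.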

b) The HTTPAPI solution.   In this solution, you will use 
http_parse_xml_stmf() to parse the XML document.  However, in the 2nd parameter to this API, you'll specify the CCSID of the data (instead of using HTTP_XML_CALC).

To do that, you'll have to first download the XML data, then open it up and read it to determine if the encoding is ISO-8859-2.  If it is, you'll tell http_parse_xml_stmf() that the CCSID is 912 (which corresponds to ISO-8859-2). Otherwise, you can still use HTTP_XML_CALC to let Expat figure out the appropriate encoding.
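
The "open it up and read it" step can be sketched roughly as follows. This is an illustration in Python, not HTTPAPI code: the function names are hypothetical, the check only looks at the XML declaration at the start of the data, and None stands in for "use HTTP_XML_CALC".

```python
import re

# Matches encoding="..." (or '...') inside an XML declaration at the start.
_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

CCSID_LATIN2 = 912  # per the thread: CCSID 912 corresponds to ISO-8859-2


def declared_encoding(head: bytes):
    """Return the lower-cased encoding named in the XML declaration, or None."""
    m = _DECL.match(head.lstrip())
    return m.group(1).decode("ascii").lower() if m else None


def ccsid_for(head: bytes):
    """Return 912 for ISO-8859-2 input; None means 'let the parser decide'."""
    if declared_encoding(head) == "iso-8859-2":
        return CCSID_LATIN2
    return None
```

Only the first few hundred bytes of the file need to be read for this, since the declaration, when present, is the first thing in the document.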

What will actually happen under the covers in this solution:  When HTTPAPI reads your IFS file, it'll translate the data in the file from CCSID 912 (ISO-8859-2) to UTF-8.  It will tell Expat that the data is in UTF-8 format (so Expat will ignore the encoding in the file's header).  Then it will parse it as a normal UTF-8 document.
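
The same sequence (translate to UTF-8, tell Expat the data is UTF-8, parse) can be tried out with Python's standard Expat binding. This is only a sketch of the idea and not HTTPAPI's actual code; parse_latin2 is a made-up name, and the end-of-element handler plays the role of the endOfElement callback in the RPG call.

```python
import xml.parsers.expat


def parse_latin2(raw: bytes):
    """Re-encode ISO-8859-2 bytes as UTF-8 and parse them as UTF-8."""
    utf8 = raw.decode("iso-8859-2").encode("utf-8")
    ended, text = [], []
    # Passing an encoding here overrides the one named in the XML declaration.
    parser = xml.parsers.expat.ParserCreate("utf-8")
    parser.EndElementHandler = ended.append     # collects element names
    parser.CharacterDataHandler = text.append   # collects character data
    parser.Parse(utf8, True)
    return ended, "".join(text)


doc = ('<?xml version="1.0" encoding="ISO-8859-2"?>'
       '<name>Gy\u0151r</name>').encode("iso-8859-2")
elements, content = parse_latin2(doc)
```

Note that the declaration inside the document still says ISO-8859-2 after the translation; the parse only works because the parser was told explicitly to treat the input as UTF-8.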

Both solutions should work -- though I haven't done much testing of specifying a CCSID for http_parse_xml_stmf(), so I suggest that you get the latest beta version of HTTPAPI from http://www.scottklement.com/httpapi/beta and help me test it.  If it works for you, great.  If not, then I'll need your help to get the bugs out of it.


RUDAS István wrote:
> Unfortunately some of the incoming Files are encoded with "ISO-8859-2" 
> instead of "ISO-8859-1": and the Tool gives a Returncode of "-1" with 
> automatic terminating.
> It would be nice to hear any suggestions, [the option to overtype this 
> manually in the incoming file is not acceptable cause it should run 
> finally unattended, but thank you for thinking about it, the universe 
> and all that kind of things].
This is the FTPAPI mailing list.  To unsubscribe, please go to:
