Re: Parser Problems
Hi Tom,
> I have tried 37 and 1208 as the third parm to http_parse_xml_string with the
> same results. I haven't used 1252. Ultimately, I would like the parser to
> swallow it but, if it would return it without error, possibly I could %xlate
> or %subst it out??
Okay, sure, you can do that. I'll go into more detail in a moment.
But first...
> Logon info is: CallerID: XXXXXXXX, username: YYYYYYY,
> password: ZZZZZZZZ.
Oops! Didn't you say you were going to send this to me privately? It
appears you sent it to the public list...
I've scrubbed the password from the public archives of the list, and
from the quoted text in my reply (above). But there's nothing I can do
about the e-mails that have been sent to the subscribers of this mailing
list. You may want to change the password.
> Scan the document for <Allotment id="3678"> and then
> <ShortDescription>"Children under 16 years stay free when sharing the same
> cabin/site with paying adults</ShortDescription> follows shortly after.
> Everything parses fine up to this point.
Okay. The situation is apparently not as simple as that. I tried
downloading and parsing your XML, but did not get a parser error, and the
result appears to be valid UTF-8.
There seem to be some extra steps happening, so I'm trying to
recreate your problem. Bear in mind that I've never seen your code, so
I don't actually know what you are doing! I thought this was a simple
situation where I could retrieve an XML document and it'd have a bad
character in it. Apparently there's more to it than that.
My first attempt was this:
     H dftactgrp(*no) bnddir('HTTPAPI')
      /define WEBFORMS
      /copy httpapi_h
     D url             s           2000a   varying
     D rc              s             10i 0
     D myForm          s                   like(WEBFORM)
     D myPostData      s               *
     D myPostDataLen   s             10i 0
      /free
       url = 'http://api.netroomz.com.au/allotmentserviceextended.asmx+
             /GetAllotmentList';
       // Build the URL-encoded form data for the POST
       myForm = WEBFORM_open();
       WEBFORM_setVar( myForm : 'callerID': 'XXXXXXXXXXX' );
       WEBFORM_setVar( myForm : 'username': 'YYY' );
       WEBFORM_setVar( myForm : 'password': 'ZZZZZZ' );
       WEBFORM_postData( myForm: myPostData: myPostDataLen );
       // POST the form and save the response to a stream file in the IFS
       rc = http_post( url
                     : myPostData
                     : myPostDataLen
                     : '/home/klemscot/TomThomsonTest.xml'
                     : HTTP_TIMEOUT
                     : *omit
                     : 'application/x-www-form-urlencoded' );
       WEBFORM_close(myForm);
       // http_post returns 1 on success
       if rc <> 1;
          http_crash();
       endif;
       *inlr = *on;
When I run this RPG program (with the proper callerID, username and
password) I get a valid UTF-8 XML file that looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<string xmlns="http://tempuri.org/">
---data---
</string>
It's a perfectly valid XML file, but it has only one XML tag: the
'string' tag. The contents of that tag are a valid character
string, and that string contains another XML document. Presumably
that's the one you're having trouble with. At this point, all of the
XML tags in the string have been escaped (which they'd HAVE to be if you
want to embed them inside another XML document: they must either be
escaped or wrapped in a CDATA construct).
Nonetheless, I tried scanning for the offending character, and I found this:
<ShortDescription>...Children under 16 years stay free
Where the 3 dots correspond to x'e2809c'. That's a 3-byte sequence, which
is indeed the correct UTF-8 encoding for the slanted quote you mentioned
earlier! So it's not x'93' (which is the Windows-1252 code for the same
character). It's the proper UTF-8 code, so the parser shouldn't have any
trouble with it.
So I took it to the next step, and ran the file I downloaded through
http_parse_xml_stmf(). That way, I thought I could see the same parsing
error you're getting (but I didn't).
Here's the code I added to the program:
     D result          s          65535a   varying
.. previous http_post code is here ...
       http_parse_xml_stmf( '/home/klemscot/TomThomsonTest.xml'
                          : HTTP_XML_CALC
                          : *null
                          : %paddr(Parse1)
                          : %addr(Result) );
.. program ends here ..
     P Parse1          B
     D                 PI
     D UserData                   65535a   varying
     D depth                         10I 0 value
     D name                        1024A   varying const
     D path                       24576A   varying const
     D value                      65535A   varying const
      /free
       if name = 'string';
          UserData = value;
       endif;
      /end-free
     P                 E
The Result variable contains the entire embedded XML document (as
expected) and the relevant piece looks like this:
<ShortDescription>.Children under 16 years stay free
This is, of course, now in EBCDIC because HTTPAPI knows it's returning
the result to an RPG program, and RPG typically wants EBCDIC. The dot
in the above output is x'3F', the EBCDIC code for an unknown character
(because there are no slanted quotes in EBCDIC).
But no parser error! Essentially, it "swallowed" the bad character,
just like you asked for...
But, here's where things finally go awry. I figured you must be parsing
this inner XML document (after all, you say you're calling
http_parse_xml_string.) So I tried that as well.
I added this on to the end of my program:
       http_parse_xml_string( %addr(Result:*data)
                            : %len(Result)
                            : 0
                            : *null
                            : %paddr(Parse2)
                            : *null );
And a new parsing subprocedure:
     P Parse2          B
     D                 PI
     D UserData                        *   value
     D depth                         10I 0 value
     D name                        1024A   varying const
     D path                       24576A   varying const
     D value                      65535A   varying const
      /free
       if name = 'ShortDescription';
          if %scan('Children': value) > 0;
             dsply 'found';
          endif;
       endif;
      /end-free
     P                 E
Now, this finally DID cause the XML parser to complain about an error.
Apparently the x'3F' caused it to fail. But I never saw an x'93', so
I'm probably not doing the same thing that you are.
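Incidentally, the call above throws away the return value. Here's a
minimal sketch of how the failure could be surfaced from within the
program. It assumes (as I believe is the usual HTTPAPI convention) that
the parse routine returns a negative number when it fails, and that
http_error() then hands back the message text. The 'msg' variable is just
a name made up for this illustration:

```rpgle
     D msg             s             52a
      /free
       if http_parse_xml_string( %addr(Result:*data)
                               : %len(Result)
                               : 0
                               : *null
                               : %paddr(Parse2)
                               : *null ) < 0;
          // assumption: a negative return means the parse failed
          msg = http_error();   // truncated to what DSPLY can show
          dsply msg;
       endif;
      /end-free
```

DSPLY is only used here to keep the sketch short; writing the message to
the job log or a report would work just as well.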
I could use %xlate() or %scan/%replace (or %scanrpl if you're on a new
enough release) to eliminate the x'3F' before parsing, but I'm not
sure if I should, because I think your code is substantially different
from mine, since you're getting an x'93' somehow.
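If removing the x'3F' does turn out to be the right fix, a minimal sketch
would be something like the following, run against Result before it's
handed to http_parse_xml_string. Substituting a plain double quote is
only my assumption about what you'd want in its place, and either
statement would do the job on its own (you wouldn't need both):

```rpgle
      /free
       // Option 1: %xlate swaps every x'3F' for a plain double quote
       Result = %xlate( x'3F' : '"' : Result );

       // Option 2: %scanrpl (newer releases only) can drop the character
       // altogether by replacing it with an empty string
       Result = %scanrpl( x'3F' : '' : Result );
      /end-free
```

Keep in mind that this treats the symptom in my test program. It does
nothing about wherever your x'93' is coming from.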
I figured it'd be best to confer with you before continuing...
-----------------------------------------------------------------------
This is the FTPAPI mailing list. To unsubscribe, please go to:
http://www.scottklement.com/mailman/listinfo/ftpapi
-----------------------------------------------------------------------