Re: Parser Problems



Hi Tom,


> I have tried 37 and 1208 as the third parm to http_parse_xml_string with the
> same results.  I haven't used 1252.  Ultimately, I would like the parser to
> swallow it but, if it would return it without error, possibly I could %xlate
> or %subst it out??

Okay, sure... you can do that.  I'll go into more detail in a moment, 
but first...


> Logon info is: CallerID: XXXXXXXX, username: YYYYYYY,
> password: ZZZZZZZZ.

Oops!  Didn't you say you were going to send this to me privately?  It 
appears you sent it to the public list...

I've scrubbed the password from the public archives of the list, and 
from the quoted text in my reply (above).  But there's nothing I can do 
about the e-mails that have been sent to the subscribers of this mailing 
list.  You may want to change the password.


> Scan the document for<Allotment id="3678">  and then
> <ShortDescription>"Children under 16 years stay free when sharing the same
> cabin/site with paying adults</ShortDescription>  follows shortly after.
> Everything parses fine up to this point.

Okay. The situation is apparently not as simple as this? I tried 
downloading and parsing your XML, but do not get a parser error, and the 
result is apparently valid UTF-8.

But there seem to be some extra steps happening.  I'm trying to 
recreate your problem, but bear in mind that I've never seen your code. 
I don't actually know what you are doing!  I thought this was a simple 
situation where I could retrieve an XML document, and it'd have a bad 
character.  Apparently there's more to it than that.

My first attempt was this:

      h dftactgrp(*no) bnddir('HTTPAPI')

       /define WEBFORMS
       /copy httpapi_h

      D url             s           2000a   varying
      D rc              s             10i 0
      D myForm          s                   like(WEBFORM)
      D myPostData      s               *
      D myPostDataLen   s             10i 0

       /free
         url = 'http://api.netroomz.com.au/allotmentserviceextended.asmx+
                /GetAllotmentList';

         myForm = WEBFORM_open();
         WEBFORM_setVar( myForm : 'callerID': 'XXXXXXXXXXX' );
         WEBFORM_setVar( myForm : 'username': 'YYY'         );
         WEBFORM_setVar( myForm : 'password': 'ZZZZZZ'      );
         WEBFORM_postData( myForm: myPostData: myPostDataLen );

         rc = http_post( url
                       : myPostData
                       : myPostDataLen
                       : '/home/klemscot/TomThomsonTest.xml'
                       : HTTP_TIMEOUT
                       : *omit
                       : 'application/x-www-form-urlencoded' );

         WEBFORM_close(myForm);

         if rc <> 1;
            http_crash();
         endif;

         *inlr = *on;

When I run this RPG program (with the proper callerID, username and 
password) I get a valid UTF-8 XML file that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<string xmlns="http://tempuri.org/">
---data---
</string>

It's a perfectly valid XML file, but it has only one XML tag: the 
'string' tag.  The contents of that tag are a valid character 
string, and that string contains another XML document.  Presumably 
it's the one you're having trouble with.  But at this point, all of the 
XML tags in the string have been escaped (which they'd HAVE to be if you 
want to embed them inside another XML document... they'd have to be 
escaped or in a CDATA construct.)

Nonetheless, I tried scanning for the offending character, and I found this:

  &lt;ShortDescription&gt;...Children under 16 years stay free

Where the 3 dots correspond to x'e2809c'.  A 3 byte code, which is 
indeed the correct UTF-8 code for the slanted quote you mentioned 
earlier!  But it's not x'93', it's actually the proper code -- so it 
shouldn't have any trouble parsing.

So I took it to the next step, and ran the file I downloaded through 
http_parse_xml_stmf().  That way, I thought I could see the same parsing 
error you're getting...  (but I didn't)

Here's the code I added to the program:

      D result          s          65535a   varying

         .. previous http_post code is here ...

         http_parse_xml_stmf( '/home/klemscot/TomThomsonTest.xml'
                            : HTTP_XML_CALC
                            : *null
                            : %paddr(Parse1)
                            : %addr(Result) );
         ..  program ends here ..

      P Parse1          B
      D                 PI
      D   UserData                 65535a   varying
      D   depth                       10I 0 value
      D   name                      1024A   varying const
      D   path                     24576A   varying const
      D   value                    65535A   varying const
       /free
           if name = 'string';
              UserData = value;
           endif;
       /end-free
      P                 E


The Result variable contains the entire embedded XML document (as 
expected) and the relevant piece looks like this:

  <ShortDescription>.Children under 16 years stay free

This is, of course, now in EBCDIC because HTTPAPI knows it's returning 
the result to an RPG program, and RPG typically wants EBCDIC.  The dot 
in the above output is x'3F', the EBCDIC code for an unknown character 
(because there are no slanted quotes in EBCDIC).

But no parser error!  Essentially, it "swallowed" the bad character, 
just like you asked for...

But, here's where things finally go awry.  I figured you must be parsing 
this inner XML document (after all, you say you're calling 
http_parse_xml_string.)  So I tried that as well.

I added this on to the end of my program:

     http_parse_xml_string( %addr(Result:*data)
                          : %len(Result)
                          : 0
                          : *null
                          : %paddr(Parse2)
                          : *null );

And a new parsing subprocedure:

      P Parse2          B
      D                 PI
      D   UserData                      *   value
      D   depth                       10I 0 value
      D   name                      1024A   varying const
      D   path                     24576A   varying const
      D   value                    65535A   varying const
       /free
          if name = 'ShortDescription';
            if %scan('Children': value) > 0;
               dsply 'found';
            endif;
          endif;
       /end-free
      P                 E

Now, this finally DID cause the XML parser to complain about an error... 
apparently the x'3F' caused it to fail.  But -- I never saw a x'93', so 
I'm probably not doing the same thing that you are.

I could use %xlate() or %scan/%replace (or %scanrpl if you're on a new 
enough release) to eliminate the x'3F' before parsing, but...  I'm not 
sure if I should, because I think your code is substantially different 
from mine, since you're getting a x'93' somehow.
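
For what it's worth, here's a rough sketch of what that cleanup might 
look like if it ran just before the http_parse_xml_string() call.  It 
assumes the unwanted byte really is x'3F' (that's what I saw, but it may 
not be what you're getting), and that replacing it with a blank is 
acceptable.  SUB_CHAR is just a name I made up for the example:

      D SUB_CHAR        c                   x'3F'

       /free
          // Replace every x'3F' (EBCDIC substitute character) in the
          // embedded XML with a blank, so the second parse never sees it.
          Result = %xlate(SUB_CHAR: ' ': Result);

          // Alternative, if %scanrpl is available on your release:
          // remove the character entirely instead of blanking it out.
          //   Result = %scanrpl(SUB_CHAR: '': Result);
       /end-free

Either way, the character is lost rather than converted, so this only 
makes sense if you don't need the slanted quotes in your output.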

I figured it'd be best to confer with you before continuing...
-----------------------------------------------------------------------
This is the FTPAPI mailing list.  To unsubscribe, please go to:
http://www.scottklement.com/mailman/listinfo/ftpapi
-----------------------------------------------------------------------