Re: Complex XML Value



Hi Michael,

The purpose of CHARDATA1 was to illustrate how Expat works.  It's not 
intended to be a utility that you can use to parse XML.  It's intended to 
demonstrate the flow of events in Expat by showing how Expat first calls 
the start handler, then the character data handler, then the end handler, 
repeatedly for the whole XML document.

For example, here's a trivial XML document:

     <Blah>
        <ReallyBlah>Dum De Dum</ReallyBlah>
     </Blah>

Expat will parse the XML, then call your RPG subprocedures in the 
following manner:

Start handler w/following parms:
    elemName = Blah
    attr(1) = *NULL

Chardata handler w/following parms:
    String = CRLF + "  "  (the CRLF after the <Blah> element, and the
                           two spaces that start the subsequent line)
    Len = 4.

Start handler:
    elemName = "ReallyBlah"
    attr(1) = *NULL

Chardata handler:
     String = "Dum De Dum"
     Len = 10

End handler:
     elemName = "ReallyBlah"

Chardata handler
     String = CRLF
     Len = 2

End handler
     elemName = "Blah"

So all Expat does is call YOUR SUBPROCEDURES (I want to emphasize this, 
because it's my RPG subprocedure, not Expat, that formats and prints the 
results).  The parameters that it passes are in UTF-8 unicode.
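
If you'd like to watch that sequence of calls without an IBM i system handy, here's a rough sketch in Python rather than RPG. Python's standard xml.parsers.expat module is a wrapper around the same Expat library, so the flow of events should be the same. The trace_events name and the tuple layout are made up for illustration, and Python hands you native strings instead of UTF-8 C-style strings. I've set buffer_text so that adjacent character data arrives in one call; without it, Expat is free to split text across several calls.

```python
import xml.parsers.expat

# The trivial document from above, with plain newlines and two-space indent.
XML_DOC = "<Blah>\n  <ReallyBlah>Dum De Dum</ReallyBlah>\n</Blah>"


def trace_events(xml_text):
    """Parse xml_text and return the handler calls Expat made, in order."""
    events = []

    def start(name, attrs):
        # Called once for every opening tag.
        events.append(("start", name, attrs))

    def chardata(data):
        # Called for text between tags, including whitespace and newlines.
        events.append(("chardata", data, len(data)))

    def end(name):
        # Called once for every closing tag.
        events.append(("end", name))

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True         # coalesce adjacent text into one call
    parser.ordered_attributes = True  # attributes arrive as a flat list
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chardata
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)      # True = this is the final chunk
    return events


for event in trace_events(XML_DOC):
    print(event)
```

Notice that the parser never prints anything itself. Everything you see comes from the three handler functions, which is the same division of labor as in the RPG program.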

Let's look at your sample XML data:

   <partyIdentifier partyIdentifierCode="account"
       partyIdentifierQualifierCode="receiverAssigned"/>

This is actually a simpler example than my "Blah" one above!  There's only 
one XML element, whereas my sample had two.  There's no character data at 
all.

Expat will call your subprocedures in this sequence:

Start handler
    elemName = partyIdentifier
    attr(1) = partyIdentifierCode
    attr(2) = account
    attr(3) = partyIdentifierQualifierCode
    attr(4) = receiverAssigned
    attr(5) = *NULL

End handler:
    elemName = partyIdentifier

So your subprocedure gets called with the above parameters. Because 
they're C-style strings that are in UTF-8 unicode, you have to do some 
work to get them into RPG style strings that are in EBCDIC -- but you sure 
don't have to parse anything :)

The attr parameter is an array of pointers.  It's variable-length -- you 
detect the end of the array by looking for a pointer that's set to *NULL. 
In my "Blah" example, there were no attributes, so the first element was a 
*NULL.  In your example, everything is done with attributes...
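
To make the name/value layout concrete, here's a small sketch in Python (not RPG), using the standard xml.parsers.expat wrapper around Expat. With ordered_attributes switched on, the start handler receives the attributes as one flat list of name, value, name, value, which mirrors the array of pointers described above. The one difference is that a Python list knows its own length, so there's no *NULL terminator to look for. The attribute_pairs function is a hypothetical helper for illustration only.

```python
import xml.parsers.expat

SAMPLE = ('<partyIdentifier partyIdentifierCode="account"\n'
          '    partyIdentifierQualifierCode="receiverAssigned"/>')


def attribute_pairs(xml_text):
    """Return (element, attribute name, attribute value) for each attribute."""
    found = []

    def start(elem_name, attr):
        # attr is [name1, value1, name2, value2, ...]; step through it
        # two entries at a time, just like x = x + 2 in the RPG loop.
        x = 0
        while x < len(attr):
            found.append((elem_name, attr[x], attr[x + 1]))
            x += 2

    parser = xml.parsers.expat.ParserCreate()
    parser.ordered_attributes = True  # flat list instead of a dict
    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return found


for elem, name, value in attribute_pairs(SAMPLE):
    print(elem, name, value)
```

An element with no attributes simply produces an empty list, which corresponds to the first array element being *NULL.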

In the code for the "start" subprocedure in the CHARDATA1 program, it does 
this to convert the element name passed by Expat into an RPG-style EBCDIC 
variable:

    elemName = %str(elem);
    QDCXLATE( %len(%trimr(elemName))
            : elemName
            : 'QTCPEBC' );

The first line uses the %str() BIF to extract the C-style string into an 
RPG style string.  The second line uses the QDCXLATE API to convert from 
ASCII to EBCDIC.  Technically that's wrong, since Expat passes the data in 
UTF-8 unicode, not ASCII, but as long as there are no special/cultural 
characters (anything outside 7-bit ASCII), UTF-8 and ASCII have the same 
hex values, so it works.  I didn't want to start out my 
article having to explain the complexity of the iconv() API for fear that 
the reader would get up and run away screaming... :)
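
Here's a small Python illustration (not RPG, and not iconv() itself) of why the ASCII shortcut only works for plain characters. It assumes EBCDIC code page 037, which Python calls "cp037"; your system's CCSID may well be different. The utf8_to_ebcdic helper is hypothetical and only stands in for what a proper character-set conversion would do.

```python
def utf8_to_ebcdic(raw, ebcdic_codec="cp037"):
    """Decode UTF-8 bytes, then re-encode them in an EBCDIC code page.

    cp037 is an assumption for this sketch; substitute the code page
    that matches your own system.
    """
    return raw.decode("utf-8").encode(ebcdic_codec)


plain = "account".encode("utf-8")
accented = "caf\u00e9".encode("utf-8")   # "cafe" with an accented e

# For 7-bit characters, UTF-8 and ASCII are byte-for-byte identical.
print(plain == "account".encode("ascii"))

# The accented character takes two bytes in UTF-8, so a one-byte-in,
# one-byte-out translation table cannot convert it correctly.
print(len("caf\u00e9"), "characters,", len(accented), "bytes in UTF-8")

# A real conversion decodes the UTF-8 first, then encodes to EBCDIC.
print(utf8_to_ebcdic(plain).hex())
print(utf8_to_ebcdic(accented).hex())
```

The point is only that the conversion has to understand multi-byte UTF-8 sequences, which a simple translation table like QTCPEBC does not.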

I've since learned that it's possible to output UTF-16 from Expat instead 
of UTF-8.  One of these days I'll experiment with that, as it would be 
much easier to deal with in an RPG program -- but I'm going off on a 
tangent now...

After the element name has been extracted to an RPG-style string and 
converted to EBCDIC, it's added to a variable called "PrintMe" so it can 
be printed out.  Nothing exciting, just an EVAL statement:

      printme = %subst(blanks: 1: depth)
              + %trimr(elemName);

The attribute names are then extracted from the array of pointers.  Here's 
the code (from the same subprocedure) that does that:

     x = 1;
     dow attr(x) <> *NULL;
        AttrName = %str(attr(x));
        QDCXLATE( %len(%trimr(AttrName))
                : AttrName
                : 'QTCPEBC' );

        AttrVal  = %str(attr(x+1));
        QDCXLATE( %len(%trimr(AttrVal))
                : AttrVal
                : 'QTCPEBC' );

        printme = printme + ' ' + %trimr(Attrname)
                          + '="' + %trimr(AttrVal) + '"';
        x = x + 2;
     enddo;

Remember, in your example, you'd have this:

    attr(1) = partyIdentifierCode
    attr(2) = account
    attr(3) = partyIdentifierQualifierCode
    attr(4) = receiverAssigned
    attr(5) = *NULL

First time through the loop, X=1.  So when the %str() BIF is called to get 
attr(x), it'll return partyIdentifierCode (in UTF-8) to the RPG variable 
named AttrName.  It then converts it to EBCDIC.

Then we do the same thing with X+1, and the value of "account" is 
extracted to the variable named AttrVal.  By the time you reach the 
"PrintME" line, this is what you have:

     AttrName = partyIdentifierCode
     AttrVal  = account

You can do whatever you like with those values. As you can see, you don't 
have to parse them out, you already have them in RPG variables.  Use them 
as you see fit.

For my example (and I think this is what's confusing you) I do this:

        printme = printme + ' ' + %trimr(Attrname)
                          + '="' + %trimr(AttrVal) + '"';

Remember, "printme" already has the element name (partyIdentifier) in it, 
so now I'm adding "partyIdentifierCode", then "=", then quote, then 
"account", then another quote.  I'm manually adding them together and 
undoing what Expat parsed!  Why did I do that?  Because I thought it'd 
look nice on the report.

You certainly don't have to :)

At the very bottom of the loop, it adds 2 to X, and goes back to the top. 
This time X=3, so AttrName = partyIdentifierQualifierCode and AttrVal = 
receiverAssigned.

The third time through the loop, X=5, and since attr(5) = *NULL, the loop 
stops.  Finally, it prints the "PrintMe" string to the report:

     except print;

Hopefully you understand at this point that it's my code, not Expat, 
that's causing the element names & values to be printed the way they are 
printed.  If your goal is to print them differently, or do something else 
entirely different with them, you can do that... just write your RPG code 
differently than mine.
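
As an illustration of "writing the code differently", here's a Python sketch (again using the standard xml.parsers.expat wrapper, not RPG) that produces the layout you said you wanted: the element name on one line, then each attribute name followed by a "Char:" line with its value. The report_lines function is a made-up name, and collecting the lines in a list stands in for writing them to a report.

```python
import xml.parsers.expat

SAMPLE = ('<partyIdentifier partyIdentifierCode="account"\n'
          '    partyIdentifierQualifierCode="receiverAssigned"/>')


def report_lines(xml_text):
    """Return one report line per element name, attribute name and value."""
    lines = []

    def start(elem_name, attr):
        lines.append(elem_name)
        # attr is a flat list: name, value, name, value, ...
        for x in range(0, len(attr), 2):
            lines.append(attr[x])
            lines.append("Char: " + attr[x + 1])

    parser = xml.parsers.expat.ParserCreate()
    parser.ordered_attributes = True  # flat list instead of a dict
    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return lines


print("\n".join(report_lines(SAMPLE)))
```

The only thing that changed compared with the concatenating version is what the start handler does with the names and values it's given. The parsing is identical.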

Also, make sure you don't use QDCXLATE in your production code.  It's fine 
for a trivial example like this, but for production code, you want to use 
iconv() instead.  There's a sample of this in the XLATEICONV source 
member.

-- 
Scott Klement  http://www.scottklement.com



On Thu, 17 Aug 2006, Michael Ryan wrote:

> Hi Scott -
>
> Thanks for the reply. I meant complex in the sense of multiple values for
> one tag, not in the sense of difficulty. I'm probably using the term
> incorrectly. Yeah, CHARDATA1 is one of the sample programs in the LIBEXPAT
> library. Thanks for taking the time to look into this!
>
> - Michael
>
> On 8/16/06, Scott Klement <sk@xxxxxxxxxxxxxxxx> wrote:
>> 
>> 
>> I don't understand what you mean by a "complex value", the sample you
>> provided is a very simple one, and I don't consider it to be complex at
>> all.
>> 
>> I don't remember what CHARDATA1 does, though it sounds like it's probably
>> something I wrote.  I don't have time to look it over now, but I'll try to
>> do so tomorrow or something like that.
>> 
>> --
>> Scott Klement  http://www.scottklement.com
>> 
>> 
>> On Wed, 16 Aug 2006, Michael Ryan wrote:
>> 
>> > Anyone know if eXpat can parse a complex value? I'm using the eXpat parser
>> > that's included in HTTPAPI, and I have this problem.
>> > I can handle a simple value, like:
>> >
>> > <currency>USD</currency>.
>> >
>> > When I use the CHARDATA1 sample, it returns:
>> >
>> > currency
>> > Char: USD
>> >
>> > Which is what I expect. But when it encounters this:
>> >
>> > <partyIdentifier partyIdentifierCode="account"
>> > partyIdentifierQualifierCode="receiverAssigned"/>
>> >
>> > it returns:
>> >
>> > partyIdentifier partyIdentifierCode="account"
>> > partyIdentifierQualifierCode="receiverAssigned"
>> >
>> > And what I would like is:
>> >
>> > partyIdentifier
>> > partyIdentifierCode
>> > Char: account
>> > partyIdentifierQualifierCode
>> > Char: receiverAssigned
>> >
>> > Obviously I don't care about the Char:, I just want to be able to identify
>> > the subfields (?) of the XML value. How can I do that with eXpat, or do I
>> > parse that on my own?
>> >
>> > Thanks!
>> >
>> -----------------------------------------------------------------------
>> This is the FTPAPI mailing list.  To unsubscribe, please go to:
>> http://www.scottklement.com/mailman/listinfo/ftpapi
>> -----------------------------------------------------------------------
>> 
>