
Re: Complex XML Value



   Hi Scott -
   Thanks so much! Just what I needed. I really appreciate you doing this
   stuff for the community, and thank you for taking the time to provide
   a cogent explanation. I can see exactly what I need to do now.
   Thanks again...
   - Michael

   On 8/17/06, Scott Klement <[1]sk@xxxxxxxxxxxxxxxx> wrote:

     Hi Michael,
     The purpose of CHARDATA1 was to illustrate how Expat works.  It's not
     intended to be a utility that you can use to parse XML.  It's intended
     to demonstrate the flow of events in Expat by showing how Expat first
     calls the start handler, then the character data handler, then the end
     handler, repeatedly for the whole XML document.
     For example, here's a trivial XML document:
          <Blah>
             <ReallyBlah>Dum De Dum</ReallyBlah>
          </Blah>
     Expat will parse the XML, then call your RPG subprocedures in the
     following manner:
     Start handler w/following parms:
          elemName = "Blah"
          attr(1) = *NULL
     Chardata handler w/following parms:
          String = CRLF + "  "  (the CRLF after the <Blah> element, and
                                 the two spaces that start the subsequent
                                 line)
          Len = 4
     Start handler:
          elemName = "ReallyBlah"
          attr(1) = *NULL
     Chardata handler:
          String = "Dum De Dum"
          Len = 10
     End handler:
          elemName = "ReallyBlah"
     Chardata handler:
          String = CRLF
          Len = 2
     End handler:
          elemName = "Blah"
     So all Expat does is call YOUR SUBPROCEDURES (I want to emphasize this,
     because it's my RPG subprocedure, not Expat, that formats and prints
     the results).  The parameters that it passes are in UTF-8 Unicode.
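     As an illustration only, the call sequence above can be reproduced
     with a short sketch.  It is written in Python rather than RPG, using
     the standard xml.parsers.expat module (a binding to the same Expat
     library).  The function name trace_events and the DOC constant are
     made up for this example, and the document uses LF line endings, so
     the whitespace lengths it reports are not the CRLF-based counts
     listed above.

```python
import xml.parsers.expat

def trace_events(xml_text):
    """Return the handler calls Expat makes, in order, as tuples."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Hand over each run of character data in a single call; without
    # this setting the parser may split one run across several calls.
    parser.buffer_text = True

    def on_start(name, attrs):
        events.append(("start", name, attrs))

    def on_chardata(data):
        events.append(("chardata", data, len(data)))

    def on_end(name):
        events.append(("end", name))

    parser.StartElementHandler = on_start
    parser.CharacterDataHandler = on_chardata
    parser.EndElementHandler = on_end
    parser.Parse(xml_text, True)
    return events

# The "Blah" document from above, with LF line endings (an assumption).
DOC = "<Blah>\n  <ReallyBlah>Dum De Dum</ReallyBlah>\n</Blah>"

for event in trace_events(DOC):
    print(event)
```

     The parser itself prints nothing.  Everything that appears on the
     screen comes from the three handler functions supplied by the
     caller, which is the same division of labor as in the RPG program.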
     Let's look at your sample XML data:
        <partyIdentifier partyIdentifierCode="account"
            partyIdentifierQualifierCode="receiverAssigned"/>
     This is actually a simpler example than my "Blah" one above!  There's
     only one XML element, whereas my sample had two.  There's no character
     data at all.
     Expat will call your subprocedures in this sequence:
     Start handler:
         elemName = partyIdentifier
         attr(1) = partyIdentifierCode
         attr(2) = account
         attr(3) = partyIdentifierQualifierCode
         attr(4) = receiverAssigned
         attr(5) = *NULL
     End handler:
         elemName = partyIdentifier
     So your subprocedure gets called with the above parameters. Because
     they're C-style strings that are in UTF-8 Unicode, you have to do some
     work to get them into RPG-style strings that are in EBCDIC -- but you
     sure don't have to parse anything :)
     The attr parameter is an array of pointers.  It's variable-length --
     you detect the end of the array by looking for a pointer that's set to
     *NULL.  In my "Blah" example, there were no attributes, so the first
     element was a *NULL.  In your example, everything is done with
     attributes...
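     The layout of that array can be seen in a small Python sketch (an
     illustration, not the RPG code).  Python's xml.parsers.expat module
     has an ordered_attributes setting that reports attributes as a flat
     name, value, name, value list, which matches the pointer array
     described here.  Python lists carry their own length, so the
     trailing None is added by the sketch purely to imitate the *NULL
     terminator; start_handler_args is a hypothetical name.

```python
import xml.parsers.expat

def start_handler_args(xml_text):
    """Collect (element name, flat attribute list) for every start tag."""
    calls = []
    parser = xml.parsers.expat.ParserCreate()
    # Report attributes as a flat [name, value, name, value, ...] list
    # in document order, instead of the default dictionary.
    parser.ordered_attributes = True

    def on_start(name, attrs):
        # None is appended only to imitate the *NULL entry that ends
        # the array in the C/RPG interface; Python does not supply it.
        calls.append((name, list(attrs) + [None]))

    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return calls

SAMPLE = ('<partyIdentifier partyIdentifierCode="account" '
          'partyIdentifierQualifierCode="receiverAssigned"/>')

for name, attrs in start_handler_args(SAMPLE):
    print(name, attrs)
```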
     In the code for the "start" subprocedure in the CHARDATA1 program, it
     does this to convert the element name passed by Expat into an RPG-style
     EBCDIC variable:
         elemName = %str(elem);
         QDCXLATE( %len(%trimr(elemName))
                 : elemName
                 : 'QTCPEBC' );
     The first line uses the %str() BIF to extract the C-style string into
     an RPG-style string.  The second line uses the QDCXLATE API to convert
     from ASCII to EBCDIC.  Technically that's wrong, since Expat passes the
     data in UTF-8 Unicode, not ASCII, but if there are no special/cultural
     characters, they have the same hex values so it works.  I didn't want
     to start out my article having to explain the complexity of the iconv()
     API for fear that the reader would get up and run away screaming... :)
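     The point about ASCII and UTF-8 sharing hex values can be checked
     with a few lines of Python.  This is only a sketch: it uses Python's
     built-in codecs rather than QDCXLATE or iconv(), the function name
     utf8_to_ebcdic is made up, and cp037 is assumed as the EBCDIC code
     page (your system may use a different one).

```python
def utf8_to_ebcdic(raw, ebcdic_codec="cp037"):
    """Decode UTF-8 bytes, then re-encode them in an EBCDIC code page."""
    return raw.decode("utf-8").encode(ebcdic_codec)

plain = "partyIdentifier".encode("utf-8")
accented = "caf\u00e9".encode("utf-8")   # "cafe" with an acute accent

# For plain characters the UTF-8 bytes are identical to the ASCII
# bytes, which is why an ASCII-to-EBCDIC table works for them.
print(plain == "partyIdentifier".encode("ascii"))

# The accented letter takes two bytes in UTF-8, so a byte-for-byte
# table lookup would see two unrelated characters instead of one.
print(len("caf\u00e9"), len(accented))

# Decoding as UTF-8 first keeps it as a single EBCDIC character.
print(len(utf8_to_ebcdic(accented)))
```

     A byte-for-byte translation table has no way to know that two UTF-8
     bytes belong together, which is the gap a real character-set
     conversion fills.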
     I've since learned that it's possible to output UTF-16 from Expat
     instead of UTF-8.  One of these days I'll experiment with that, as it
     would be much easier to deal with in an RPG program -- but I'm going
     off on a tangent now...
     After the element name has been extracted to an RPG-style string and
     converted to EBCDIC, it's added to a variable called "PrintMe" so it
     can be printed out.  Nothing exciting, just an EVAL statement:
           printme = %subst(blanks: 1: depth)
                   + %trimr(elemName);
     The attribute names are then extracted from the array of pointers.
     Here's the code (from the same subprocedure) that does that:
          x = 1;
          dow attr(x) <> *NULL;
             AttrName = %str(attr(x));
             QDCXLATE( %len(%trimr(AttrName))
                     : AttrName
                     : 'QTCPEBC' );
             AttrVal  = %str(attr(x+1));
             QDCXLATE( %len(%trimr(AttrVal))
                     : AttrVal
                     : 'QTCPEBC' );
             printme = printme + ' ' + %trimr(Attrname)
                               + '="' + %trimr(AttrVal) + '"';
             x = x + 2;
          enddo;
     Remember, in your example, you'd have this:
         attr(1) = partyIdentifierCode
         attr(2) = account
         attr(3) = partyIdentifierQualifierCode
         attr(4) = receiverAssigned
         attr(5) = *NULL
     The first time through the loop, X=1.  So when the %str() BIF is
     called to get attr(x), it'll return partyIdentifierCode (in UTF-8) to
     the RPG variable named AttrName.  It then converts it to EBCDIC.
     Then we do the same thing with X+1, and the value of "account" is
     extracted to the variable named AttrVal.  By the time you reach the
     "PrintMe" line, this is what you have:
          AttrName = partyIdentifierCode
          AttrVal  = account
     You can do whatever you like with those values. As you can see, you
     don't have to parse them out, you already have them in RPG variables.
     Use them as you see fit.
     For my example (and I think this is what's confusing you) I do
     this:
             printme = printme + ' ' + %trimr(Attrname)
                               + '="' + %trimr(AttrVal) + '"';
     Remember, "printme" already has the element name (partyIdentifier) in
     it.  So now I'm adding "partyIdentifierCode", then "=", then quote,
     then "account", then another quote.  I'm manually adding them together
     and undoing what Expat parsed!  Why did I do that?  Because I thought
     it'd look nice on the report.
     You certainly don't have to :)
     At the very bottom of the loop, it adds 2 to X, and goes back to the
     top.  This time X=3, so AttrName = partyIdentifierQualifierCode and
     AttrVal = receiverAssigned.
     The third time through the loop, X=5, and since attr(5) = *NULL, the
     loop stops.  Finally, it prints the "PrintMe" string to the report:
          except print;
     Hopefully you understand at this point that it's my code, not Expat,
     that's causing the element names & values to be printed the way they
     are printed.  If your goal is to print them differently, or do
     something else entirely different with them, you can do that... just
     write your RPG code differently than mine.
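     To make that concrete, here is a sketch of a start handler that
     produces the layout asked for in the original question (element
     name, then each attribute name, then its value prefixed with
     "Char: ").  It is Python, not RPG, and list_attributes is a
     hypothetical name.  The loop steps through the attributes two at a
     time just as the RPG loop does, except that Python checks the list
     length where the RPG code checks for *NULL.

```python
import xml.parsers.expat

def list_attributes(xml_text):
    """Return one line per element name, attribute name and value."""
    lines = []
    parser = xml.parsers.expat.ParserCreate()
    parser.ordered_attributes = True  # flat [name, value, ...] list

    def on_start(name, attrs):
        lines.append(name)
        x = 0
        # Entry x is an attribute name and entry x + 1 is its value,
        # so advance by 2 on every pass.
        while x < len(attrs):
            lines.append(attrs[x])
            lines.append("Char: " + attrs[x + 1])
            x += 2

    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return lines

SAMPLE = ('<partyIdentifier partyIdentifierCode="account" '
          'partyIdentifierQualifierCode="receiverAssigned"/>')

print("\n".join(list_attributes(SAMPLE)))
```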
     Also, make sure you don't use QDCXLATE in your production code.  It's
     fine for a trivial example like this, but for production code, you
     want to use iconv() instead.  There's a sample of this in the
     XLATEICONV source member.
     --
     Scott Klement  [2]http://www.scottklement.com
     On Thu, 17 Aug 2006, Michael Ryan wrote:
     > Hi Scott -
     >
     > Thanks for the reply. I meant complex in the sense of multiple
     > values for one tag, not in the sense of difficulty. I'm probably
     > using the term incorrectly. Yeah, CHARDATA1 is one of the sample
     > programs in the LIBEXPAT library. Thanks for taking the time to
     > look into this!
     >
     > - Michael
     >
     > On 8/16/06, Scott Klement <[3]sk@xxxxxxxxxxxxxxxx> wrote:
     >>
     >>
     >> I don't understand what you mean by a "complex value", the sample
     >> you provided is a very simple one, and I don't consider it to be
     >> complex at all.
     >>
     >> I don't remember what CHARDATA1 does, though it sounds like it's
     >> probably something I wrote.  I don't have time to look it over
     >> now, but I'll try to do so tomorrow or something like that.
     >>
     >> --
     >> Scott Klement  [4]http://www.scottklement.com
     >>
     >>
     >> On Wed, 16 Aug 2006, Michael Ryan wrote:
     >>
     >> > Anyone know if eXpat can parse a complex value? I'm using the
     >> > eXpat parser that's included in HTTPAPI, and I have this problem.
     >> > I can handle a simple value, like:
     >> >
     >> > <currency>USD</currency>.
     >> >
     >> > When I use the CHARDATA1 sample, it returns:
     >> >
     >> > currency
     >> > Char: USD
     >> >
     >> > Which is what I expect. But when it encounters this:
     >> >
     >> > <partyIdentifier partyIdentifierCode="account"
     >> > partyIdentifierQualifierCode="receiverAssigned"/>
     >> >
     >> > it returns:
     >> >
     >> > partyIdentifier partyIdentifierCode="account"
     >> > partyIdentifierQualifierCode="receiverAssigned"
     >> >
     >> > And what I would like is:
     >> >
     >> > partyIdentifier
     >> > partyIdentifierCode
     >> > Char: account
     >> > partyIdentifierQualifierCode
     >> > Char: receiverAssigned
     >> >
     >> > Obviously I don't care about the Char:, I just want to be able
     >> > to identify the subfields (?) of the XML value. How can I do
     >> > that with eXpat, or do I parse that on my own?
     >> >
     >> > Thanks!
     >> >
     >> -------------------------------------------------------------------
     >> This is the FTPAPI mailing list.  To unsubscribe, please go to:
     >> [5]http://www.scottklement.com/mailman/listinfo/ftpapi
     >> -------------------------------------------------------------------
     >>
     > -------------------------------------------------------------------
     This is the FTPAPI mailing list.  To unsubscribe, please go to:
     [6]http://www.scottklement.com/mailman/listinfo/ftpapi
     -------------------------------------------------------------------

References

   1. mailto:sk@xxxxxxxxxxxxxxxx
   2. http://www.scottklement.com/
   3. mailto:sk@xxxxxxxxxxxxxxxx
   4. http://www.scottklement.com/
   5. http://www.scottklement.com/mailman/listinfo/ftpapi
   6. http://www.scottklement.com/mailman/listinfo/ftpapi