Re: Complex XML Value
Hi Scott -
Thanks so much! Just what I needed. I really appreciate you doing this
stuff for the community, and thank you for taking the time to provide
a cogent explanation. I can see exactly what I need to do now.
Thanks again...
- Michael
On 8/17/06, Scott Klement <[1]sk@xxxxxxxxxxxxxxxx> wrote:
Hi Michael,
The purpose of CHARDATA1 was to illustrate how Expat works. It's not
intended to be a utility that you can use to parse XML. It's intended to
demonstrate the flow of events in Expat by showing how Expat first calls
the start handler, then the character data handler, then the end handler,
repeatedly for the whole XML document.
For example, here's a trivial XML document:
<Blah>
  <ReallyBlah>Dum De Dum</ReallyBlah>
</Blah>
Expat will parse the XML, then call your RPG subprocedures in the
following manner:
Start handler w/following parms:
    elemName = Blah
    attr(1) = *NULL
Chardata handler w/following parms:
    String = CRLF + "  " (the CRLF after the <Blah> element, and the
             two spaces that start the subsequent line)
    Len = 4
Start handler:
    elemName = "ReallyBlah"
    attr(1) = *NULL
Chardata handler:
    String = "Dum De Dum"
    Len = 10
End handler:
    elemName = "ReallyBlah"
Chardata handler:
    String = CRLF
    Len = 2
End handler:
    elemName = "Blah"
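The same call sequence can be observed without an RPG compiler. What follows is only a minimal sketch, and it assumes Python's built-in xml.parsers.expat binding (a wrapper around the same Expat library) in place of the RPG handlers; the names DOC, trace_events and EVENTS are made up for the illustration. Two differences from the listing above: the sample uses LF rather than CRLF line endings, and buffer_text is switched on because Expat is otherwise free to deliver one run of character data in several calls.

```python
import xml.parsers.expat

# The "Blah" document from above, with LF line endings (an assumption of
# this sketch; the listing above describes CRLF).
DOC = "<Blah>\n  <ReallyBlah>Dum De Dum</ReallyBlah>\n</Blah>"


def trace_events(xml_text):
    """Return the handler calls Expat makes, in order, as (kind, value)."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Merge adjacent character data into one callback per run of text.
    parser.buffer_text = True
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.CharacterDataHandler = lambda data: events.append(("chardata", data))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return events


EVENTS = trace_events(DOC)
for kind, value in EVENTS:
    print(kind, repr(value))
```

Each printed line corresponds to one handler call, so the output reads top to bottom in the same start / chardata / end order as the listing.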
So all Expat does is call YOUR SUBPROCEDURES (I want to emphasize this,
because it's my RPG subprocedure, not Expat, that formats and prints the
results). The parameters that it passes are in UTF-8 Unicode.
Let's look at your sample XML data:
<partyIdentifier partyIdentifierCode="account"
partyIdentifierQualifierCode="receiverAssigned"/>
This is actually a simpler example than my "Blah" one above! There's only
one XML element, whereas my sample had two. There's no character data at
all.
Expat will call your subprocedures in this sequence:
Start handler:
    elemName = partyIdentifier
    attr(1) = partyIdentifierCode
    attr(2) = account
    attr(3) = partyIdentifierQualifierCode
    attr(4) = receiverAssigned
    attr(5) = *NULL
End handler:
    elemName = partyIdentifier
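As a cross-check of that sequence, here is a small sketch that again assumes Python's xml.parsers.expat binding rather than RPG. Setting ordered_attributes makes the binding hand the attributes over as a flat name, value, name, value list, which mirrors the attr array layout; Python lists carry their own length, so there is no trailing *NULL entry. DOC, collect_calls and CALLS are names invented for the illustration.

```python
import xml.parsers.expat

# Michael's sample element, written on one logical line.
DOC = ('<partyIdentifier partyIdentifierCode="account" '
       'partyIdentifierQualifierCode="receiverAssigned"/>')


def collect_calls(xml_text):
    """Record every handler call Expat makes for xml_text."""
    calls = []
    parser = xml.parsers.expat.ParserCreate()
    # Deliver attributes as [name, value, name, value, ...] instead of a dict.
    parser.ordered_attributes = True
    parser.StartElementHandler = (
        lambda name, attrs: calls.append(("start", name, list(attrs))))
    parser.CharacterDataHandler = lambda data: calls.append(("chardata", data))
    parser.EndElementHandler = lambda name: calls.append(("end", name))
    parser.Parse(xml_text, True)
    return calls


CALLS = collect_calls(DOC)
for call in CALLS:
    print(call)
```

Only a start call and an end call are recorded; the character data handler is never invoked because the element is empty.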
So your subprocedure gets called with the above parameters. Because
they're C-style strings that are in UTF-8 Unicode, you have to do some
work to get them into RPG-style strings that are in EBCDIC -- but you sure
don't have to parse anything :)
The attr parameter is an array of pointers. It's variable-length -- you
detect the end of the array by looking for a pointer that's set to *NULL.
In my "Blah" example, there were no attributes, so the first element was a
*NULL. In your example, everything is done with attributes...
In the code for the "start" subprocedure in the CHARDATA1 program, it does
this to convert the element name passed by Expat into an RPG-style EBCDIC
variable:
    elemName = %str(elem);
    QDCXLATE( %len(%trimr(elemName))
            : elemName
            : 'QTCPEBC' );
The first line uses the %str() BIF to extract the C-style string into an
RPG-style string. The second line uses the QDCXLATE API to convert from
ASCII to EBCDIC. Technically that's wrong, since Expat passes the data in
UTF-8 Unicode, not ASCII, but if there are no special/cultural characters,
they have the same hex values so it works. I didn't want to start out my
article having to explain the complexity of the iconv() API for fear that
the reader would get up and run away screaming... :)
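To make the ASCII-versus-UTF-8 point concrete, the sketch below compares a byte-at-a-time translation with a conversion that decodes the UTF-8 first. It is an illustration under stated assumptions, not the QDCXLATE or iconv() code: Python stands in for RPG, the cp037 codec stands in for the EBCDIC CCSID, and a Latin-1 to cp037 mapping stands in for the QTCPEBC table, whose real handling of bytes above 127 may differ. The function names are hypothetical.

```python
def bytewise_to_ebcdic(raw: bytes) -> bytes:
    """Translate one byte at a time, as a single-byte table would.

    Assumption: Latin-1 -> cp037 is only a stand-in for the QTCPEBC table.
    """
    return raw.decode("latin-1").encode("cp037")


def utf8_to_ebcdic(raw: bytes) -> bytes:
    """Decode the UTF-8 properly, then encode the characters as EBCDIC."""
    return raw.decode("utf-8").encode("cp037")


plain = "account".encode("utf-8")   # plain ASCII letters only
accented = "café".encode("utf-8")   # the accented letter takes two UTF-8 bytes

# Plain text: both routes produce identical EBCDIC bytes.
print(bytewise_to_ebcdic(plain) == utf8_to_ebcdic(plain))

# Accented text: the byte-wise route turns one character into two.
print(len(bytewise_to_ebcdic(accented)), len(utf8_to_ebcdic(accented)))
```

The two routes agree as long as every character is plain ASCII, which is why the shortcut works for the sample data and breaks as soon as an accented character appears.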
I've since learned that it's possible to output UTF-16 from Expat instead
of UTF-8. One of these days I'll experiment with that, as it would be
much easier to deal with in an RPG program -- but I'm going off on a
tangent now...
After the element name has been extracted to an RPG-style string and
converted to EBCDIC, it's added to a variable called "PrintMe" so it can
be printed out. Nothing exciting, just an EVAL statement:
    printme = %subst(blanks: 1: depth)
            + %trimr(elemName);
The attribute names are then extracted from the array of pointers. Here's
the code (from the same subprocedure) that does that:
    x = 1;
    dow attr(x) <> *NULL;
       AttrName = %str(attr(x));
       QDCXLATE( %len(%trimr(AttrName))
               : AttrName
               : 'QTCPEBC' );
       AttrVal = %str(attr(x+1));
       QDCXLATE( %len(%trimr(AttrVal))
               : AttrVal
               : 'QTCPEBC' );
       printme = printme + ' ' + %trimr(Attrname)
               + '="' + %trimr(AttrVal) + '"';
       x = x + 2;
    enddo;
Remember, in your example, you'd have this:
attr(1) = partyIdentifierCode
attr(2) = account
attr(3) = partyIdentifierQualifierCode
attr(4) = receiverAssigned
attr(5) = *NULL
The first time through the loop, X=1. So when the %str() BIF is called to
get attr(x), it returns partyIdentifierCode (in UTF-8) to the RPG variable
named AttrName. QDCXLATE then converts it to EBCDIC.
Then we do the same thing with X+1, and the value of "account" is
extracted to the variable named AttrVal. By the time you reach the
"PrintMe" line, this is what you have:
    AttrName = partyIdentifierCode
    AttrVal = account
You can do whatever you like with those values. As you can see, you don't
have to parse them out, you already have them in RPG variables. Use them
as you see fit.
For my example (and I think this is what's confusing you) I do this:
    printme = printme + ' ' + %trimr(Attrname)
            + '="' + %trimr(AttrVal) + '"';
Remember, "printme" already has the element name (partyIdentifier) in it,
so now I'm adding "partyIdentifierCode", then "=", then a quote, then
"account", then another quote. I'm manually adding them together and
undoing what Expat parsed! Why did I do that? Because I thought it'd
look nice on the report.
You certainly don't have to :)
At the very bottom of the loop, it adds 2 to X, and goes back to the top.
This time X=3, so AttrName = partyIdentifierQualifierCode and AttrVal =
receiverAssigned.
The third time through the loop, X=5, and since attr(5) = *NULL, the loop
stops. Finally, it prints the "PrintMe" string to the report:
except print;
Hopefully you understand at this point that it's my code, not Expat,
that's causing the element names & values to be printed the way they are
printed. If your goal is to print them differently, or do something else
entirely different with them, you can do that... just write your RPG code
differently than mine.
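As one possible illustration of "writing it differently", the sketch below walks the same name/value array two slots at a time and produces the layout Michael asked for instead of the name="value" form. It is written in Python rather than RPG, uses None where the RPG array holds *NULL, and assumes the strings have already been converted to the job's character set; report_lines, ATTR and LINES are hypothetical names.

```python
def report_lines(elem_name, attr):
    """Build one report line per element name, attribute name and value.

    attr is a flat [name, value, name, value, ..., None] list; the None
    plays the role of the *NULL pointer that ends the RPG array.
    """
    lines = [elem_name]
    x = 0  # Python lists start at 0, whereas the RPG array starts at 1
    while attr[x] is not None:
        lines.append(attr[x])                 # attribute name
        lines.append("Char: " + attr[x + 1])  # attribute value
        x += 2                                # step over the pair
    return lines


ATTR = ["partyIdentifierCode", "account",
        "partyIdentifierQualifierCode", "receiverAssigned",
        None]

LINES = report_lines("partyIdentifier", ATTR)
for line in LINES:
    print(line)
```

The loop has the same shape as the dow loop in CHARDATA1; only the statement that builds the output differs.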
Also, make sure you don't use QDCXLATE in your production code. It's fine
for a trivial example like this, but for production code, you want to use
iconv() instead. There's a sample of this in the XLATEICONV source member.
--
Scott Klement [2]http://www.scottklement.com
On Thu, 17 Aug 2006, Michael Ryan wrote:
> Hi Scott -
>
> Thanks for the reply. I meant complex in the sense of multiple values
> for one tag, not in the sense of difficulty. I'm probably using the term
> incorrectly. Yeah, CHARDATA1 is one of the sample programs in the
> LIBEXPAT library. Thanks for taking the time to look into this!
>
> - Michael
>
> On 8/16/06, Scott Klement <[3]sk@xxxxxxxxxxxxxxxx> wrote:
>>
>>
>> I don't understand what you mean by a "complex value", the sample you
>> provided is a very simple one, and I don't consider it to be complex
>> at all.
>>
>> I don't remember what CHARDATA1 does, though it sounds like it's
>> probably something I wrote. I don't have time to look it over now, but
>> I'll try to do so tomorrow or something like that.
>>
>> --
>> Scott Klement [4]http://www.scottklement.com
>>
>>
>> On Wed, 16 Aug 2006, Michael Ryan wrote:
>>
>> > Anyone know if eXpat can parse a complex value? I'm using the eXpat
>> > parser that's included in HTTPAPI, and I have this problem.
>> > I can handle a simple value, like:
>> >
>> > <currency>USD</currency>.
>> >
>> > When I use the CHARDATA1 sample, it returns:
>> >
>> > currency
>> > Char: USD
>> >
>> > Which is what I expect. But when it encounters this:
>> >
>> > <partyIdentifier partyIdentifierCode="account"
>> > partyIdentifierQualifierCode="receiverAssigned"/>
>> >
>> > it returns:
>> >
>> > partyIdentifier partyIdentifierCode="account"
>> > partyIdentifierQualifierCode="receiverAssigned"
>> >
>> > And what I would like is:
>> >
>> > partyIdentifier
>> > partyIdentifierCode
>> > Char: account
>> > partyIdentifierQualifierCode
>> > Char: receiverAssigned
>> >
>> > Obviously I don't care about the Char:, I just want to be able to
>> > identify the subfields (?) of the XML value. How can I do that with
>> > eXpat, or do I parse that on my own?
>> >
>> > Thanks!
>> >
>>
-----------------------------------------------------------------------
>> This is the FTPAPI mailing list. To unsubscribe, please go to:
>> [5]http://www.scottklement.com/mailman/listinfo/ftpapi
>>
-----------------------------------------------------------------------
>>
>
-----------------------------------------------------------------------
This is the FTPAPI mailing list. To unsubscribe, please go to:
[6]http://www.scottklement.com/mailman/listinfo/ftpapi
-----------------------------------------------------------------------
References
1. mailto:sk@xxxxxxxxxxxxxxxx
2. http://www.scottklement.com/
3. mailto:sk@xxxxxxxxxxxxxxxx
4. http://www.scottklement.com/
5. http://www.scottklement.com/mailman/listinfo/ftpapi
6. http://www.scottklement.com/mailman/listinfo/ftpapi