
RE: Character conversion



Hi, Kaj:

I have not located any warning stating that iconv() buffers should not
overlap, but that doesn't mean it's OK for them to do so.  I don't know
about Swedish character sets, but it sounds like you are saying that the
input is SBCS and the output is to be DBCS.  If that's the case, I would
expect (without any documentation saying so) that the function would work by
doing character-at-a-time translation from the source, placing each result
directly into the output buffer, then repeating until done.  (The "number of
bytes left in input/output" fields would seem to support that theory.)

Now *if* the above is true, your first character translation would write to
bytes 1 and 2 of the output (which is also the input), and then the next
input character would be considered.  Since that byte has already been
overwritten by the translation, all bets are off as to what will follow.
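
To make that supposition concrete, here is a minimal sketch in Python.  It is
not the real iconv(): naive_convert() and the six-entry CCSID277 table are
hypothetical, with the byte values read off the hex dump quoted further down
in this thread.  It only shows what a character-at-a-time converter would do
if input and output shared one buffer.

```python
# Assumed mapping for six CCSID 277 bytes, read off the hex dump in this thread.
CCSID277 = {0x7B: "Æ", 0x7C: "Ø", 0x5B: "Å", 0xC0: "æ", 0x6A: "ø", 0xD0: "å"}

def naive_convert(src, dst, n):
    """Translate n bytes of src to UTF-8 in dst, one character at a time."""
    out = 0
    for i in range(n):
        ch = CCSID277.get(src[i], "?")      # bytes outside the table become '?'
        encoded = ch.encode("utf-8")
        dst[out:out + len(encoded)] = encoded
        out += len(encoded)
    return out                              # number of bytes written to dst

source = bytes.fromhex("7B7C5BC06AD0")

# Separate buffers: the input is never disturbed.
separate = bytearray(2 * len(source))
n_sep = naive_convert(source, separate, len(source))

# Shared buffer: output overwrites input bytes that have not been read yet.
shared = bytearray(2 * len(source))
shared[:len(source)] = source
n_shared = naive_convert(shared, shared, len(source))

print(separate[:n_sep].hex().upper())
print(shared[:n_shared].hex().upper())
```

With separate buffers the sketch yields the twelve UTF-8 bytes.  With the
shared buffer only the first character survives, because its second output
byte lands on top of the second input byte before that byte is read.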

The caveats here are that I'm making a supposition about how this might be
implemented, and that I'm reading this as a SBCS -> DBCS translation.  Of
course anything -> SBCS should work fine.

Of course, the iconv() routine has the opportunity to determine if areas
overlap, and to take alternate actions to accommodate, so this may all be
moot.  But I *do* know that it's a problem not to know how many bytes
iconv() has placed into the result.  (Under the covers, the MI operations
for memcpy are smart enough to work left-to-right normally, but right-to-left
or via a work area when the areas overlap.)
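
The direction trick mentioned above can be sketched as follows.  This is only
an illustration in Python of the general memmove-style idea; the function
name overlap_safe_copy() is made up, and nothing here is taken from the MI
implementation.

```python
def overlap_safe_copy(buf, dst, src, n):
    """Copy n bytes within buf from offset src to offset dst, overlap-safe."""
    if dst <= src:
        # Destination starts at or before the source: left-to-right is safe.
        for i in range(n):
            buf[dst + i] = buf[src + i]
    else:
        # Destination starts after the source: go right-to-left so that
        # unread source bytes are not overwritten.
        for i in reversed(range(n)):
            buf[dst + i] = buf[src + i]

buf = bytearray(b"abcdef__")
overlap_safe_copy(buf, 2, 0, 6)   # shift "abcdef" two bytes to the right
print(buf.decode("ascii"))
```

A plain copy has a fixed length, so picking a direction is enough.  A
conversion that changes the data length would also need to know the output
size up front, which is why the returned byte count matters.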

Dennis Lovelady
http://www.linkedin.com/in/dennislovelady
--
"I either want less corruption, or more chance to participate in it."
        -- Ashleigh Brilliant 


> Thanks for answering.
> 
> I don't think the first issue is related to the second one. In fact I
> find the first issue to be much more of a problem than the second one,
> which I can easily circumvent by changing the code of a few procedures
> (it's open source, after all). But the first one seems to be an error
> in the IBM supplied iconv function! Can anybody corroborate this?
> Until I can make the iconv function actually handle conversion to UTF-8
> / UTF-16 I don't see why I should waste time on changing code, however.
> 
> Regards,
> Kaj
> 
> 
> -----Original Message-----
> From: ftpapi-bounces@xxxxxxxxxxxxxxxxxxxxxx [mailto:ftpapi-bounces@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Dennis Lovelady
> Sent: 15 November 2010 15:06
> To: 'HTTPAPI and FTPAPI Projects'
> Subject: RE: Character conversion
> 
> Well, before you sign in to that insane asylum, I concur that your second
> point looks like a problem, and may (?) be the source of the first issue.
> While I haven't used HTTPAPI, and haven't seen the issue you mention, I did
> see this as a potential issue in FTPAPI, and am working to correct it.  Very
> likely the routines look very similar to one another.  (Or did.)
> 
> Dennis Lovelady
> http://www.linkedin.com/in/dennislovelady
> --
> "I don't want to achieve immortality through my work, I want to achieve it
> through not dying."
>         -- Woody Allen
> 
> 
> > I think I'm going crazy, so please help if you can. Somewhere I must
> > have taken a wrong turn. When I started out it seemed so easy, but
> > apparently not!
> >
> >
> >
> > I'm trying to use http_SetCCSIDs(1208:277) and http_xlate() to
> > translate the special Danish characters æøåÆØÅ into their UTF-8
> > counterparts. The UTF-8 character set is used a lot in web development,
> > so I'm baffled at my findings.
> >
> >
> >
> > CCSID 277 --> 1208
> >
> >
> >
> > In CCSID 277 (Danish/Norwegian) the string 'ÆØÅæøå' is represented by
> > X'7B7C5BC06AD0'
> >
> >
> >
> > In CCSID 1208 (UTF-8) the same string is represented by
> > X'C386C398C385C3A6C3B8C3A5'
> >
> > (notice that each character needs two bytes -- UTF-8 characters use
> > anywhere between one and four bytes)
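
The one-to-four-byte range can be checked with a few lines of Python.  The
four sample characters are arbitrary picks for illustration, one for each
encoded length.

```python
samples = ("A", "æ", "€", "😀")

for ch in samples:
    encoded = ch.encode("utf-8")
    # Print only the code point and the hex bytes, so the output stays ASCII.
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex().upper()}")

lengths = [len(ch.encode("utf-8")) for ch in samples]
```

The Danish letters all sit in the two-byte range, which is why a six-character
input should produce twelve bytes of UTF-8.
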
> >
> >
> >
> > http://www.utf8-chartable.de/
> >
> > http://czyborra.com/utf/#UTF-8
> >
> >
> >
> > What I get after running http_xlate (iconv translation), however, is
> > X'C6D8C5E6F8E5'
> >
> >
> >
> > The procedures execute without reporting any errors. However, when I
> > look at the converted data, it's all wrong. As far as I can deduce, what
> > is in the buffer after the call is the rightmost (low-order) byte of each
> > character's two-byte UTF-16 representation.
> >
> >
> >
> > Incidentally, I get the exact same result when I specify CCSID 1200
> > (UTF-16) as the target. There I would have expected a returned buffer
> > of double the length of the input, since every character now uses 16
> > bits (hence the name, I guess). This is just wrong!
> > It should have been X'00C600D800C500E600F800E5'
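
The expected values can be reproduced with Python's built-in codecs.  The hex
strings quoted above correspond to the characters in the order ÆØÅæøå, so
that order is used here.  The comparison with ISO-8859-1 is an extra check,
not something taken from the original report.

```python
text = "ÆØÅæøå"

utf8 = text.encode("utf-8").hex().upper()
utf16 = text.encode("utf-16-be").hex().upper()   # big-endian, no byte order mark
latin1 = text.encode("latin-1").hex().upper()    # ISO-8859-1, one byte per character

print("UTF-8     :", utf8)
print("UTF-16BE  :", utf16)
print("ISO-8859-1:", latin1)
```

The UTF-8 and UTF-16 results match the expected values given above.  The
observed X'C6D8C5E6F8E5' is exactly the ISO-8859-1 encoding of the same
string, so one possible reading is that six bytes of a single-byte Latin-1
style result ended up in the buffer.  That is a guess from the byte values
alone and says nothing about where in the conversion it would happen.
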
> >
> >
> >
> > Is this behaviour normal for iconv conversions?
> >
> >
> >
> > I'm on an old system, V5R3M0, is this at the root of the problem?
> >
> >
> >
> >
> >
> > Okay, second problem:
> >
> >
> >
> > Looking at the code in procedure CCSIDxlate() I notice that the code
> > doesn't allow for the output buffer to be of a different length than
> > the input buffer (which can be the case in conversions between single-byte
> > CCSIDs and variable-length UTF-8, and definitely will be the case in
> > conversions between single-byte CCSIDs and UTF-16 or other double-byte
> > CCSIDs). The same buffer is used for both input and output -- and the
> > length of the converted data isn't communicated back to the caller.
> >
> >
> >
> > Assuming that the above-mentioned problem with the iconv conversion
> > isn't the norm (i.e. is a problem on my system), shouldn't the
> > CCSIDxlate() procedure have used separate input and output buffers and
> > have returned the length of the converted data in the buffer?
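
A minimal sketch of the interface shape being asked for, in Python rather
than RPG.  The function xlate() and its parameters are hypothetical and are
not HTTPAPI code.  ISO-8859-1 stands in for CCSID 277 as the single-byte
source, purely to keep the example self-contained.

```python
def xlate(src: bytes, from_codec: str, to_codec: str, out: bytearray) -> int:
    """Convert src into the separate buffer out; return the bytes written."""
    converted = src.decode(from_codec).encode(to_codec)
    if len(converted) > len(out):
        # iconv() signals this situation by setting errno to E2BIG.
        raise ValueError("output buffer too small")
    out[:len(converted)] = converted
    return len(converted)

source = bytes.fromhex("C6D8C5E6F8E5")    # "ÆØÅæøå" in ISO-8859-1
out = bytearray(4 * len(source))          # generous upper bound for UTF-8
written = xlate(source, "latin-1", "utf-8", out)

print(written, out[:written].hex().upper())
```

The caller sizes the output buffer independently of the input and uses the
returned count to know how much of it is valid.
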
> >
> >
> >
> > I'm using HTTPAPI 1.24beta11 from 2010-09-09
> >
> >
> >
> > I look forward to your input in eager anticipation!
> >
> >
> >
> >
> >
> > TIA
> >
> >
> >
> > Kaj
> >
> >
> 
> 
> -----------------------------------------------------------------------
> This is the FTPAPI mailing list.  To unsubscribe, please go to:
> http://www.scottklement.com/mailman/listinfo/ftpapi
> -----------------------------------------------------------------------