
RE: Character conversion



Hi Dennis,

Thanks for answering.

I don't think the first issue is related to the second one. In fact, I find the first issue much more of a problem than the second, which I can easily work around by changing the code of a few procedures (it's open source, after all). But the first one looks like an error in the IBM-supplied iconv function! Can anybody corroborate this?
Until I can make the iconv function actually handle conversion to UTF-8 / UTF-16, however, I don't see why I should spend time changing code.

Regards,
Kaj


-----Original Message-----
From: ftpapi-bounces@xxxxxxxxxxxxxxxxxxxxxx [mailto:ftpapi-bounces@xxxxxxxxxxxxxxxxxxxxxx] On behalf of Dennis Lovelady
Sent: 15 November 2010 15:06
To: 'HTTPAPI and FTPAPI Projects'
Subject: RE: Character conversion

Well, before you sign in to that insane asylum, I concur that your second
point looks like a problem, and may (?) be the source of the first issue.
While I haven't used HTTPAPI, and haven't seen the issue you mention, I did
see this as a potential issue in FTPAPI, and am working to correct it.  Very
likely the routines look very similar to one another.  (Or did.)

Dennis Lovelady
http://www.linkedin.com/in/dennislovelady
--
"I don't want to achieve immortality through my work, I want to achieve it
through not dying."
        -- Woody Allen 


> I think I'm going crazy, so please help if you can. Somewhere I must
> have taken a wrong turn. When I started out it seemed so easy, but
> apparently not!
> 
> 
> 
> I'm trying to use http_SetCCSIDs(1208:277) and http_xlate() to
> translate the special Danish characters æøåÆØÅ into their UTF-8
> counterparts. The UTF-8 character set is used a lot in web development,
> so I'm baffled at my findings.
> 
> 
> 
> CCSID 277 --> 1208
> 
> 
> 
> In CCSID 277 (Danish/Norwegian) the string 'ÆØÅæøå' is represented by
> X'7B7C5BC06AD0'
> 
> 
> 
> In CCSID 1208 (UTF-8) the same string is represented by
> X'C386C398C385C3A6C3B8C3A5'
> 
> (notice that each character needs two bytes -- UTF-8 characters will be
> using anywhere between one and four bytes)
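
The one-to-four-byte range mentioned above can be checked quickly. This is only a minimal sketch in Python (not RPG, and nothing to do with HTTPAPI itself); the four sample characters are my own picks for illustration.

```python
# Four sample characters chosen for illustration, one per UTF-8 length.
samples = ["A", "\u00e6", "\u20ac", "\U0001F600"]  # A, ae ligature, euro sign, emoji

for ch in samples:
    encoded = ch.encode("utf-8")
    # Print only code points and hex so the output stays plain ASCII.
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex().upper()}")

lengths = [len(ch.encode("utf-8")) for ch in samples]
```

The Danish letters all fall in the two-byte range, which is why six characters come out as twelve bytes.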
> 
> 
> 
> http://www.utf8-chartable.de/
> 
> http://czyborra.com/utf/#UTF-8
> 
> 
> 
> What I get after running http_xlate (iconv translation), however, is
> X'C6D8C5E6F8E5'
> 
> 
> 
> The procedures execute without reporting any errors. However, when I
> look at the converted data, it's all wrong. As far as I can deduce,
> what is in the buffer after the call equals the rightmost byte of each
> two-byte UTF-16 character.
> 
> 
> 
> Incidentally, I get the exact same result when I specify CCSID 1200
> (UTF-16) as the target. There I would have expected a returned buffer
> of double the length of the input, since each of these characters now
> uses 16 bits (hence the name, I guess). This is just wrong!
> It should have been X'00C600D800C500E600F800E5'
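
The expected byte strings quoted above can be cross-checked without iconv. The following is a rough sketch in Python, not a reproduction of what http_xlate() does. The six-entry table is an assumption read off the CCSID 277 hex dump in this message (taking the dump to be in the order Æ Ø Å æ ø å); it is not a full code page.

```python
# Byte values for the six characters in CCSID 277, read off the hex dump
# quoted above (assumption: the dump is in the order Æ Ø Å æ ø å).
CCSID_277 = {0x7B: "Æ", 0x7C: "Ø", 0x5B: "Å", 0xC0: "æ", 0x6A: "ø", 0xD0: "å"}

source = bytes.fromhex("7B7C5BC06AD0")
text = "".join(CCSID_277[b] for b in source)

utf8 = text.encode("utf-8")        # what CCSID 1208 should contain
utf16 = text.encode("utf-16-be")   # big-endian UTF-16, no byte order mark
latin1 = text.encode("latin-1")    # ISO 8859-1, one byte per character

print(utf8.hex().upper())    # C386C398C385C3A6C3B8C3A5
print(utf16.hex().upper())   # 00C600D800C500E600F800E5
print(latin1.hex().upper())  # C6D8C5E6F8E5
```

The third line matches the buffer actually returned. For these six characters the ISO 8859-1 byte is the same as the low byte of the UTF-16 code unit, so the output would also be consistent with a conversion to a single-byte ISO 8859-1 style CCSID rather than to 1208 or 1200. That is only a guess from the byte values, not something verified on V5R3.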
> 
> 
> 
> Is this behaviour normal for iconv conversions?
> 
> 
> 
> I'm on an old system, V5R3M0, is this at the root of the problem?
> 
> 
> 
> 
> 
> Okay, second problem:
> 
> 
> 
> Looking at the code in procedure CCSIDxlate() I notice that the code
> doesn't allow for the output buffer to be of a different length than
> the input buffer (which can be the case in conversions between
> single-byte CCSIDs and variable-length UTF-8, and definitely will be
> the case in conversions between single-byte CCSIDs and UTF-16 or other
> double-byte CCSIDs). The same buffer is used for both input and output
> -- and the length of the converted data isn't communicated back to the
> caller.
> 
> 
> 
> Assuming that the above-mentioned problem with the iconv conversion
> isn't the norm (i.e. is a problem on my system), shouldn't the
> CCSIDxlate() procedure have used separate input and output buffers and
> have returned the length of the converted data in the buffer?
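
The interface shape being asked for here (separate input and output buffers, converted length returned to the caller) might look roughly like the sketch below. It is written in Python purely for illustration: `xlate` is a hypothetical helper, not HTTPAPI's CCSIDxlate() and not the iconv API. ISO 8859-1 stands in for the source encoding because Python's standard library does not appear to include a CCSID 277 codec.

```python
def xlate(src: bytes, from_codec: str, to_codec: str, out: bytearray) -> int:
    """Convert src into the separate buffer out; return the bytes written.

    Hypothetical helper for illustration only.
    """
    converted = src.decode(from_codec).encode(to_codec)
    if len(converted) > len(out):
        raise ValueError(
            f"output buffer too small: need {len(converted)}, have {len(out)}"
        )
    # Overwrite only the front of the buffer; its total size is unchanged.
    out[:len(converted)] = converted
    return len(converted)


src = bytes.fromhex("C6D8C5E6F8E5")   # ÆØÅæøå in ISO 8859-1 (stand-in for 277)
out = bytearray(4 * len(src))         # generous upper bound for UTF-8 output

n = xlate(src, "latin-1", "utf-8", out)
print(len(src), n)                    # input length vs. converted length
print(out[:n].hex().upper())
```

The caller needs the returned length to know how much of the output buffer is valid; with a shared buffer of the input's size, six input bytes leave no room for the twelve bytes of UTF-8.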
> 
> 
> 
> I'm using HTTPAPI 1.24beta11 from 2010-09-09
> 
> 
> 
> I look forward to your input in eager anticipation!
> 
> 
> 
> 
> 
> TIA
> 
> 
> 
> Kaj
> 
> 


-----------------------------------------------------------------------
This is the FTPAPI mailing list.  To unsubscribe, please go to:
http://www.scottklement.com/mailman/listinfo/ftpapi
-----------------------------------------------------------------------