
Re: Character conversion



Hello,

HTTPAPI uses several different translation routines: HTTP_xlate(), 
HTTP_xlatep() and HTTP_xlatedyn().  There are pros and cons to each 
routine.

I'm not sure exactly which part of HTTPAPI you're having trouble with, or 
whether you're calling http_xlate() directly.

Can you provide more information?  Preferably some sample code that I 
can use to reproduce the problem?


On 11/15/2010 3:32 AM, Kaj Julius wrote:
>
>     Hi all,
>
>
>     I think I'm going crazy, so please help if you can. Somewhere I must
>     have taken a wrong turn. When I started out it seemed so easy, but
>     apparently not!
>
>
>     I'm trying to use http_SetCCSIDs(1208:277) and http_xlate() to
>     translate the special Danish characters æøåÆØÅ into their UTF-8
>     counterparts. The UTF-8 character set is used a lot in web
>     development, so I'm baffled at my findings.
>
>
>     CCSID 277 -->  1208
>
>
>     In CCSID 277 (Danish/Norwegian) the string 'ÆØÅæøå' is represented by
>     X'7B7C5BC06AD0'
>
>
>     In CCSID 1208 (UTF-8) the same string is represented by
>     X'C386C398C385C3A6C3B8C3A5'
>
>     (notice that each character needs two bytes -- UTF-8 characters will
>     be using anywhere between one and four bytes)
>
>
>     [1]http://www.utf8-chartable.de/
>
>     [2]http://czyborra.com/utf/#UTF-8
>
>
>     What I get after running http_xlate (iconv translation), however, is
>     X'C6D8C5E6F8E5'
>
>
>     The procedures are executed without any apparent errors issued.
>     However, when I look at the converted data, it's all wrong. As far as
>     I can deduce, what is in the buffer after the call equals the
>     rightmost byte of each two-byte UTF-16 character.
>
>
>     Incidentally, I get the exact same result when I specify CCSID 1200
>     (UTF-16) as the target. There I would have expected a returned buffer
>     of double the length of the input, since every character now uses 16
>     bits (hence the name, I guess). This is just wrong!
>     It should have been X'00C600D800C500E600F800E5'
>
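
The byte strings quoted above can be cross-checked outside of HTTPAPI.
The sketch below is Python rather than RPG, purely for illustration.
Python's standard library has no codec for CCSID 277, so the EBCDIC side
is a hand-built table (an assumption, covering only the six letters and
taken from the hex values in the post). Those hex values decode to the
letters in the order ÆØÅæøå. The check suggests that the bytes seen in
the buffer are what ISO-8859-1 would give for the same text, which is
also the low byte of each UTF-16 code unit. It does not show why the
conversion ended up there.

```python
# Hypothetical lookup table: CCSID 277 code points for six Danish letters,
# taken from the hex values quoted in the post (not a complete code page).
CCSID_277_SAMPLE = {
    0x7B: "Æ", 0x7C: "Ø", 0x5B: "Å",
    0xC0: "æ", 0x6A: "ø", 0xD0: "å",
}


def decode_277_sample(data: bytes) -> str:
    """Decode bytes using only the six letters in the table above."""
    return "".join(CCSID_277_SAMPLE[b] for b in data)


text = decode_277_sample(bytes.fromhex("7B7C5BC06AD0"))

utf8 = text.encode("utf-8")        # expected target, CCSID 1208
latin1 = text.encode("latin-1")    # ISO-8859-1, one byte per character
utf16 = text.encode("utf-16-be")   # big-endian UTF-16, no byte order mark

print(utf8.hex().upper())    # C386C398C385C3A6C3B8C3A5
print(latin1.hex().upper())  # C6D8C5E6F8E5
print(utf16.hex().upper())   # 00C600D800C500E600F800E5
```

The second line printed matches the buffer contents reported after the
http_xlate() call, and the first and third match the expected UTF-8 and
UTF-16 values.
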
>
>     Is this behaviour normal for iconv conversions?
>
>
>     I'm on an old system, V5R3M0, is this at the root of the problem?
>
>
>
>     Okay, second problem:
>
>
>     Looking at the code in procedure CCSIDxlate() I notice that the code
>     doesn't allow for the output buffer to be of a different length than
>     the input buffer (which can be the case in conversions to/from
>     single-byte CCSIDs and mixed-length UTF-8 and definitely will be the
>     case in conversions to/from single-byte CCSIDs and UTF-16 or other
>     double-byte CCSIDs). The same buffer is used for both input and output
>     -- and the length of the converted characters isn't communicated back
>     to the caller.
>
>
>     Assuming that the above mentioned problem with the iconv conversion
>     isn't the norm (i.e. is a problem on my system), shouldn't the
>     CCSIDxlate() procedure have used separate input and output buffers and
>     have returned the length of the converted characters in the buffer?
>
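
To illustrate the length question, here is a minimal Python sketch of a
conversion that writes to a separate output buffer and reports the
converted length. It is not HTTPAPI code: convert() is a hypothetical
helper, and cp500 (a single-byte EBCDIC code page that Python ships
with) stands in for CCSID 277, which Python does not provide.

```python
from typing import Tuple


def convert(src: bytes, from_codec: str, to_codec: str) -> Tuple[bytes, int]:
    """Convert src into a new buffer and return it with its byte length."""
    out = src.decode(from_codec).encode(to_codec)
    return out, len(out)


# cp500 is an assumption: a stand-in for a single-byte EBCDIC CCSID.
source = "ÆØÅæøå".encode("cp500")            # one byte per character

utf8_buf, utf8_len = convert(source, "cp500", "utf-8")
utf16_buf, utf16_len = convert(source, "cp500", "utf-16-be")

# Plain ASCII letters stay one byte each in UTF-8 but double in UTF-16.
ascii_src = "abc".encode("cp500")
_, ascii_utf8_len = convert(ascii_src, "cp500", "utf-8")
_, ascii_utf16_len = convert(ascii_src, "cp500", "utf-16-be")

print(len(source), utf8_len, utf16_len)
print(len(ascii_src), ascii_utf8_len, ascii_utf16_len)
```

The output length depends on both the target encoding and the data, so
a caller cannot derive it from the input length alone.
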
>
>     I'm using HTTPAPI 1.24beta11 from 2010-09-09
>
>
>     I look forward to your input in eager anticipation!
>
>
>
>     TIA
>
>
>     Kaj
>
> References
>
>     1. http://www.utf8-chartable.de/
>     2. http://czyborra.com/utf/#UTF-8
>
>
>
>
> -----------------------------------------------------------------------
> This is the FTPAPI mailing list.  To unsubscribe, please go to:
> http://www.scottklement.com/mailman/listinfo/ftpapi
> -----------------------------------------------------------------------
