Re: Character conversion
Hello,
HTTPAPI uses several different translation routines: HTTP_xlate(),
HTTP_xlatep() and HTTP_xlatedyn(). There are pros and cons to each
routine.
I'm not sure exactly which part of HTTPAPI you're having trouble with.
Are you calling http_xlate() directly?
Can you provide more information? Preferably some sample code that I
can use to reproduce the problem?
On 11/15/2010 3:32 AM, Kaj Julius wrote:
>
> Hi all,
>
>
> I think I'm going crazy, so please help if you can. Somewhere I must
> have taken a wrong turn. When I started out it seemed so easy, but
> apparently not!
>
>
> I'm trying to use http_SetCCSIDs(1208:277) and http_xlate() to
> translate the special Danish characters æøåÆØÅ into their UTF-8
> counterparts. UTF-8 is used a lot in web development, so I'm baffled
> by my findings.
>
>
> CCSID 277 --> 1208
>
>
> In CCSID 277 (Danish/Norwegian) the string 'ÆØÅæøå' is represented by
> X'7B7C5BC06AD0'
>
>
> In CCSID 1208 (UTF-8) the same string is represented by
> X'C386C398C385C3A6C3B8C3A5'
>
> (notice that each of these characters needs two bytes; a UTF-8
> character can use anywhere from one to four bytes)
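For reference, the expansion from 6 to 12 bytes follows directly from the UTF-8 encoding rules: every code point from U+0080 to U+07FF takes two bytes, and Æ, Ø, Å, æ, ø and å are U+00C6, U+00D8, U+00C5, U+00E6, U+00F8 and U+00E5. The C sketch below is illustrative only and not part of HTTPAPI; the six-entry CCSID 277 excerpt and both function names are hypothetical, and a real conversion would use a complete table (for example through iconv).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical six-entry excerpt of CCSID 277 (EBCDIC Danish/Norwegian),
   covering only the characters discussed in this thread. */
static const unsigned char ccsid277_bytes[6] = { 0x7B, 0x7C, 0x5B, 0xC0, 0x6A, 0xD0 };
static const unsigned int  unicode_points[6] = { 0x00C6, 0x00D8, 0x00C5,   /* Æ Ø Å */
                                                 0x00E6, 0x00F8, 0x00E5 }; /* æ ø å */

/* Map one CCSID 277 byte to a Unicode code point; 0 means "not in the excerpt". */
unsigned int ccsid277_excerpt_to_unicode(unsigned char b) {
    for (size_t i = 0; i < 6; i++)
        if (ccsid277_bytes[i] == b)
            return unicode_points[i];
    return 0;
}

/* Encode one Unicode code point as UTF-8 (surrogates are not checked).
   Returns the number of bytes written to out: 1 to 4. */
size_t utf8_encode(unsigned int cp, unsigned char out[4]) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {                       /* 1 byte: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: covers all of U+0080..U+07FF */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: rest of the BMP */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));   /* 4 bytes: supplementary planes */
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

Feeding the six CCSID 277 bytes through both functions in turn yields C3 86, C3 98, C3 85, C3 A6, C3 B8 and C3 A5, which concatenate to the X'C386C398C385C3A6C3B8C3A5' quoted above.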
>
>
> [1]http://www.utf8-chartable.de/
>
> [2]http://czyborra.com/utf/#UTF-8
>
>
> What I get after running http_xlate() (iconv translation), however, is
> X'C6D8C5E6F8E5'
>
>
> The procedures run without any apparent errors. However, when I look
> at the converted data, it's all wrong. As far as I can deduce, what is
> in the buffer after the call is the rightmost (low-order) byte of each
> character's two-byte UTF-16 representation.
>
>
> Incidentally, I get the exact same result when I specify CCSID 1200
> (UTF-16) as the target. There I would have expected a returned buffer
> twice the length of the input, since every character now uses 16
> bits (hence the name, I guess). This is just wrong!
> It should have been X'00C600D800C500E600F800E5'.
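One observation that may help narrow this down: the bytes returned, X'C6D8C5E6F8E5', are exactly the ISO 8859-1 (Latin-1) encoding of ÆØÅæøå, because for U+0000 to U+00FF the Latin-1 byte equals the low-order byte of the UTF-16 code unit. That may mean the data ended up in a single-byte Latin-1 CCSID such as 819 rather than in 1208 or 1200, but that is only an inference from the bytes. The hypothetical C functions below (not part of HTTPAPI) produce both byte sequences side by side; the big-endian byte order is taken from the expected value above.

```c
#include <assert.h>
#include <stddef.h>

/* Encode BMP code points (U+0000..U+FFFF, no surrogate pairs) as UTF-16,
   high-order byte first. Returns the number of bytes written (2 per character). */
size_t utf16be_encode_bmp(const unsigned int *cps, size_t n, unsigned char *out) {
    for (size_t i = 0; i < n; i++) {
        assert(cps[i] <= 0xFFFF);
        out[2 * i]     = (unsigned char)(cps[i] >> 8);   /* high-order byte */
        out[2 * i + 1] = (unsigned char)(cps[i] & 0xFF); /* low-order byte */
    }
    return 2 * n;
}

/* Keep only the low-order byte of each code point. For U+0000..U+00FF
   the result is exactly the ISO 8859-1 (Latin-1) encoding. */
size_t low_bytes_only(const unsigned int *cps, size_t n, unsigned char *out) {
    for (size_t i = 0; i < n; i++) {
        assert(cps[i] <= 0xFF);   /* outside Latin-1 this would silently lose data */
        out[i] = (unsigned char)(cps[i] & 0xFF);
    }
    return n;
}
```

For the code points of ÆØÅæøå, utf16be_encode_bmp() returns 12 bytes (X'00C600D800C500E600F800E5'), while low_bytes_only() returns the 6 bytes X'C6D8C5E6F8E5' that the conversion actually produced.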
>
>
> Is this behaviour normal for iconv conversions?
>
>
> I'm on an old system (V5R3M0). Is this at the root of the problem?
>
>
>
> Okay, second problem:
>
>
> Looking at the code in procedure CCSIDxlate(), I notice that it doesn't
> allow the output buffer to be a different length than the input
> buffer. That can be the case in conversions between single-byte CCSIDs
> and variable-length UTF-8, and will definitely be the case in
> conversions between single-byte CCSIDs and UTF-16 or other double-byte
> CCSIDs. The same buffer is used for both input and output, and the
> length of the converted data isn't communicated back to the caller.
>
>
> Assuming that the above-mentioned problem with the iconv conversion
> isn't the norm (i.e., it is a problem specific to my system), shouldn't
> the CCSIDxlate() procedure use separate input and output buffers and
> return the length of the converted data?
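A minimal sketch of the design suggested here, assuming a hypothetical routine that is neither HTTPAPI's CCSIDxlate() nor its actual interface: the input buffer is read-only, the output goes to a separate caller-supplied buffer, and the routine returns the number of bytes written, or -1 if the output buffer is too small. ISO 8859-1 stands in for the single-byte CCSID because its byte values equal the Unicode code points, which keeps the example self-contained. The standard iconv() call follows the same pattern, taking separate input and output pointers and updating the count of bytes left in each.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical converter: ISO 8859-1 -> UTF-8 with separate buffers.
   Returns the number of bytes written to out, or -1 if out_size is too
   small (out may then hold a partial result). The input is never modified. */
long xlate_latin1_to_utf8(const unsigned char *in, size_t in_len,
                          unsigned char *out, size_t out_size) {
    assert(in != NULL && out != NULL);
    size_t used = 0;
    for (size_t i = 0; i < in_len; i++) {
        unsigned char c = in[i];
        size_t need = (c < 0x80) ? 1 : 2;   /* Latin-1 never needs more than 2 UTF-8 bytes */
        if (used + need > out_size)
            return -1;                       /* caller can retry with a larger buffer */
        if (need == 1) {
            out[used++] = c;
        } else {
            out[used++] = (unsigned char)(0xC0 | (c >> 6));
            out[used++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return (long)used;                       /* output length, reported back to the caller */
}
```

Converting the six Latin-1 bytes X'C6D8C5E6F8E5' into a 16-byte buffer returns 12; with a 5-byte buffer the call returns -1, so the caller knows to supply a larger buffer instead of receiving silently truncated data.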
>
>
> I'm using HTTPAPI 1.24beta11 from 2010-09-09.
>
>
> I eagerly look forward to your input!
>
>
>
> TIA
>
>
> Kaj
>
> References
>
> 1. http://www.utf8-chartable.de/
> 2. http://czyborra.com/utf/#UTF-8
>
>
>
>
> -----------------------------------------------------------------------
> This is the FTPAPI mailing list. To unsubscribe, please go to:
> http://www.scottklement.com/mailman/listinfo/ftpapi
> -----------------------------------------------------------------------