
Character conversion



   Hi all,


   I think I'm going crazy, so please help if you can. Somewhere I must
   have taken a wrong turn. When I started out it seemed so easy, but
   apparently not!


   I'm trying to use http_SetCCSIDs(1208:277) and http_xlate() to
   translate the special Danish characters æøåÆØÅ into their UTF-8
   counterparts. The UTF-8 character set is used a lot in web
   development, so I'm baffled at my findings.


   CCSID 277 --> 1208


   In CCSID 277 (Danish/Norwegian) the string 'ÆØÅæøå' is represented by
   X'7B7C5BC06AD0'


   In CCSID 1208 (UTF-8) the same string is represented by
   X'C386C398C385C3A6C3B8C3A5'

   (Note that each of these characters needs two bytes; UTF-8 uses
   anywhere between one and four bytes per character.)


   [1]http://www.utf8-chartable.de/

   [2]http://czyborra.com/utf/#UTF-8
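
   As a neutral cross-check of the expected UTF-8 bytes (Python is used
   here purely for illustration, it is not part of HTTPAPI), the
   following sketch encodes the six letters in the order the hex values
   above imply, upper case first, and also shows sample characters that
   need one, two, three and four bytes:

```python
# The six Danish letters, upper case first, matching the hex dumps in this post.
text = "ÆØÅæøå"

utf8 = text.encode("utf-8")
print(utf8.hex().upper())        # C386C398C385C3A6C3B8C3A5
print(len(text), len(utf8))      # 6 characters become 12 bytes

# UTF-8 byte counts for a few sample characters (ASCII, Latin, euro sign, emoji).
samples = ["A", "Æ", "€", "😀"]
lengths = [len(ch.encode("utf-8")) for ch in samples]
print(lengths)                   # [1, 2, 3, 4]
```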


   What I get after running http_xlate (iconv translation), however, is
   X'C6D8C5E6F8E5'


   The procedures run without reporting any errors. However, when I look
   at the converted data, it's all wrong. As far as I can deduce, the
   buffer after the call holds only the low-order (rightmost) byte of
   each two-byte UTF-16 code unit.
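
   The observation can be checked with a short Python sketch. It
   compares the reported buffer with the low-order bytes of the UTF-16
   encoding and, as one possible reading, with the single-byte
   ISO-8859-1 (Latin-1) encoding of the same letters. That the two
   coincide is a property of these code points; whether the conversion
   on the system actually targeted a Latin-1 code page is only an
   assumption.

```python
text = "ÆØÅæøå"
observed = bytes.fromhex("C6D8C5E6F8E5")   # buffer contents reported in this post

utf16 = text.encode("utf-16-be")           # big-endian, no byte order mark
high_bytes = utf16[0::2]                   # first byte of each 16-bit unit
low_bytes = utf16[1::2]                    # second (rightmost) byte of each unit
latin1 = text.encode("latin-1")            # one byte per character

print(observed == low_bytes)               # True
print(observed == latin1)                  # True
print(high_bytes.hex())                    # 000000000000
```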


   Incidentally, I get the exact same result when I specify CCSID 1200
   (UTF-16) as the target. There I would have expected a returned buffer
   of double the length of the input, since each of these characters now
   uses 16 bits (hence the name, I guess). This is just wrong!
   It should have been X'00C600D800C500E600F800E5'
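
   To make the expected growth concrete, here is a small Python sketch.
   Python's standard codecs do not, as far as I know, include CCSID 277,
   so the table below is a hypothetical partial mapping built only from
   the six hex values quoted in this post; it is not a complete code
   page.

```python
# Partial CCSID 277 table: only the six code points quoted in this post.
CCSID277_PARTIAL = {
    0x7B: "Æ", 0x7C: "Ø", 0x5B: "Å",
    0xC0: "æ", 0x6A: "ø", 0xD0: "å",
}

source = bytes.fromhex("7B7C5BC06AD0")               # 6 EBCDIC bytes
text = "".join(CCSID277_PARTIAL[b] for b in source)  # decode to Unicode text

utf16 = text.encode("utf-16-be")                     # 2 bytes per character here
utf8 = text.encode("utf-8")                          # also 2 bytes per character here

print(len(source), len(utf16), len(utf8))            # 6 12 12
print(utf16.hex().upper())                           # 00C600D800C500E600F800E5
```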


   Is this behaviour normal for iconv conversions?


   I'm on an old system, V5R3M0. Could that be the root of the problem?



   Okay, second problem:


   Looking at the code in procedure CCSIDxlate(), I notice that it
   doesn't allow for the output buffer to be of a different length than
   the input buffer. That can be the case in conversions between
   single-byte CCSIDs and variable-length UTF-8, and definitely will be
   the case in conversions between single-byte CCSIDs and UTF-16 or
   other double-byte CCSIDs. The same buffer is used for both input and
   output -- and the length of the converted data isn't communicated
   back to the caller.


   Assuming that the above-mentioned problem with the iconv conversion
   isn't the norm (i.e. is a problem on my system), shouldn't the
   CCSIDxlate() procedure have used separate input and output buffers and
   have returned the length of the converted data in the output buffer?
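
   The interface I have in mind could look roughly like the Python
   sketch below. The function name xlate, its parameters and the error
   handling are all hypothetical and are not HTTPAPI's actual API;
   Latin-1 stands in for CCSID 277 only because Python ships a codec for
   it. The point is the shape: a separate output buffer, and the
   converted length returned to the caller.

```python
def xlate(src: bytes, from_codec: str, to_codec: str, dst: bytearray) -> int:
    """Convert src into dst and return the number of bytes written.

    Hypothetical helper for illustration. Raises ValueError if dst is too small.
    """
    converted = src.decode(from_codec).encode(to_codec)
    if len(converted) > len(dst):
        raise ValueError(
            f"output buffer too small: need {len(converted)}, have {len(dst)}"
        )
    dst[:len(converted)] = converted   # same-length slice, dst keeps its size
    return len(converted)


src = "ÆØÅæøå".encode("latin-1")       # 6 bytes of single-byte input
dst = bytearray(4 * len(src))          # generous upper bound for UTF-8 output

n = xlate(src, "latin-1", "utf-8", dst)
print(n)                               # 12
print(bytes(dst[:n]).hex().upper())    # C386C398C385C3A6C3B8C3A5
```

   Only the first n bytes of dst are meaningful after the call, which is
   exactly the information the caller cannot get when input and output
   share one buffer of fixed length.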


   I'm using HTTPAPI 1.24beta11 from 2010-09-09


   I look forward to your input in eager anticipation!



   TIA


   Kaj

References

   1. http://www.utf8-chartable.de/
   2. http://czyborra.com/utf/#UTF-8
-----------------------------------------------------------------------
This is the FTPAPI mailing list.  To unsubscribe, please go to:
http://www.scottklement.com/mailman/listinfo/ftpapi
-----------------------------------------------------------------------