
RE: Character conversion



Hi Scott,

Cut down to the bare basics, you can try this and see if you come to the same conclusions about the iconv function as I have:

H DFTACTGRP(*NO) ACTGRP(*NEW) BNDDIR('HTTPAPI') DEBUG(*YES)     
                                                                
D/copy libhttp/qrpglesrc,httpapi_h                              
D Buffer          S             40A                             
D size            S             10I 0                           
D rc              S             10I 0                           
                                                                
 /Free                                                          
                                                                
  http_debug(*ON); // Use default debug file...                 
                                                                
  // Use UTF-8 character set - set up conversion from EBCDIC... 
                                                                
  rc = http_SetCCSIDs(1208 : 277);                              
  buffer = 'XYZÆØÅxyzæøå';                                      
  size = 20;                                                    
  dump;                                                         
  rc = http_xlate(size : buffer : TO_ASCII);                    
  if rc = 0;                                                    
    dump;                                                       
  endif;                                                        
  *inlr = *on;                                                  
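
As a rough cross-check of the buffer sizes involved, here is a minimal sketch in Python (not RPG, and not calling HTTPAPI or iconv) that counts how many bytes the test string needs in each target encoding. Python ships no CCSID 277 codec, so Latin-1 stands in for the single-byte side; that is an assumption, and it affects only the byte values, not the counts.

```python
# Byte lengths of the test string from the program above in different encodings.
# Latin-1 is a stand-in for any single-byte CCSID (assumption: Python has no
# CCSID 277 codec; only the byte count matters here, not the byte values).
TEXT = "XYZÆØÅxyzæøå"


def encoded_lengths(text):
    """Return the number of bytes the text occupies in each encoding."""
    return {
        "single-byte": len(text.encode("latin-1")),
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),  # big-endian, no BOM
    }


lengths = encoded_lengths(TEXT)
for name, count in lengths.items():
    print(f"{name:12s} {count:3d} bytes")
```

The 12 input characters need 18 bytes in UTF-8 and 24 in UTF-16, so a translation that reuses the input buffer has to leave room for that growth and needs some way to report the new length.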


I enclose the compiler list, as well as the before and after dumps produced by the program.

Best regards,
Kaj

-----Original Message-----
From: ftpapi-bounces@xxxxxxxxxxxxxxxxxxxxxx [mailto:ftpapi-bounces@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Scott Klement
Sent: Monday, November 15, 2010 6:35 PM
To: HTTPAPI and FTPAPI Projects
Subject: Re: Character conversion

Hello,

HTTPAPI uses several different translation routines.  HTTP_xlate(), 
HTTP_xlatep() and HTTP_xlatedyn().  There are pros and cons to each 
routine...

Not sure exactly which part of HTTPAPI you're having trouble with, or 
whether you're calling http_xlate() directly?

Can you provide more information?  Preferably some sample code that I 
can use to reproduce the problem?


On 11/15/2010 3:32 AM, Kaj Julius wrote:
>
>     Hi all,
>
>
>     I think I'm going crazy, so please help if you can. Somewhere I must
>     have taken a wrong turn. When I started out it seemed so easy, but
>     apparently not!
>
>
>     I'm trying to use http_SetCCSIDs(1208:277) and http_xlate() to
>     translate the special Danish characters æøåÆØÅ into their UTF-8
>     counterparts. The UTF-8 character set is used a lot in web
>     development, so I'm baffled at my findings.
>
>
>     CCSID 277 -->  1208
>
>
>     In CCSID 277 (Danish/Norwegian) the string 'ÆØÅæøå' is represented by
>     X'7B7C5BC06AD0'
>
>
>     In CCSID 1208 (UTF-8) the same string is represented by
>     X'C386C398C385C3A6C3B8C3A5'
>
>     (notice that each of these characters needs two bytes -- a UTF-8
>     character takes anywhere between one and four bytes)
>
>
>     [1]http://www.utf8-chartable.de/
>
>     [2]http://czyborra.com/utf/#UTF-8
>
>
>     What I get after running http_xlate (iconv translation), however, is
>     X'C6D8C5E6F8E5'
>
>
>     The procedures execute without reporting any errors.
>     However, when I look at the converted data, it's all wrong. As far as
>     I can deduce, each byte in the buffer after the call equals the rightmost
>     byte of the corresponding two-byte UTF-16 character.
>
>
>     Incidentally, I get the exact same result when I specify CCSID 1200
>     (UTF-16) as the target. There I would have expected a returned buffer
>     of double the length of the input, since every character now uses 16
>     bits (hence the name, I guess). This is just wrong!
>     It should have been X'00C600D800C500E600F800E5'
>
>
>     Is this behaviour normal for iconv conversions?
>
>
>     I'm on an old system, V5R3M0, is this at the root of the problem?
>
>
>
>     Okay, second problem:
>
>
>     Looking at the code in procedure CCSIDxlate() I notice that the code
>     doesn't allow for the output buffer to be of a different length than
>     the input buffer (which can be the case in conversions between
>     single-byte CCSIDs and variable-length UTF-8, and definitely will be the
>     case in conversions between single-byte CCSIDs and UTF-16 or other
>     double-byte CCSIDs). The same buffer is used for both input and output
>     -- and the length of the converted characters isn't communicated back
>     to the caller.
>
>
>     Assuming that the above mentioned problem with the iconv conversion
>     isn't the norm (i.e. is a problem on my system), shouldn't the
>     CCSIDxlate() procedure have used separate input and output buffers and
>     have returned the length of the converted characters in the buffer?
>
>
>     I'm using HTTPAPI 1.24beta11 from 2010-09-09
>
>
>     I look forward to your input in eager anticipation!
>
>
>
>     TIA
>
>
>     Kaj
>
> References
>
>     1. http://www.utf8-chartable.de/
>     2. http://czyborra.com/utf/#UTF-8
>
>
>
>
> -----------------------------------------------------------------------
> This is the FTPAPI mailing list.  To unsubscribe, please go to:
> http://www.scottklement.com/mailman/listinfo/ftpapi
> -----------------------------------------------------------------------


Attachment: ICONVTEST.pdf
Description: ICONVTEST.pdf

Attachment: AFTER.pdf
Description: AFTER.pdf

Attachment: BEFORE.pdf
Description: BEFORE.pdf

HTTPAPI Ver 1.24beta11 released 2010-09-09
OS/400 Ver V5R3M0

New iconv() objects set, PostRem=1208. PostLoc=277. ProtRem=819. ProtLoc=0
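
The debug line above lists ProtRem=819, and CCSID 819 is ISO 8859-1 (Latin-1). To compare the hex values quoted in this thread, the following sketch (Python, chosen only because it runs anywhere; it does not call HTTPAPI or iconv) encodes the six Danish letters several ways. CCSID277 is a hand-written table covering only these six characters, and the letters are taken in the order ÆØÅæøå, which is the order the quoted hex strings correspond to. Whether http_xlate() actually used the 819 conversion is an assumption that the sketch does not confirm.

```python
# Compare the byte sequences quoted in the thread for the six Danish letters.
TEXT = "ÆØÅæøå"

# Hand-written subset of CCSID 277 (Danish/Norwegian EBCDIC).
# Only the six letters under discussion are mapped; this is not a full codec.
CCSID277 = {"Æ": 0x7B, "Ø": 0x7C, "Å": 0x5B, "æ": 0xC0, "ø": 0x6A, "å": 0xD0}


def hex_of(data):
    """Format bytes as an uppercase hex string, like the X'..' literals."""
    return data.hex().upper()


ebcdic = bytes(CCSID277[ch] for ch in TEXT)
utf8 = TEXT.encode("utf-8")
latin1 = TEXT.encode("iso-8859-1")
utf16 = TEXT.encode("utf-16-be")  # big-endian, no BOM

print("CCSID 277  :", hex_of(ebcdic))
print("UTF-8      :", hex_of(utf8))
print("ISO 8859-1 :", hex_of(latin1))
print("UTF-16     :", hex_of(utf16))
```

The ISO 8859-1 bytes are the same as the X'C6D8C5E6F8E5' reported after the http_xlate() call, and for these characters they also equal the low-order byte of each UTF-16 code unit.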