Wireshark-dev: Re: [Wireshark-dev] Replacing g_iconv and different codesets
From: Jakub Zawadzki <darkjames-ws@xxxxxxxxxxxx>
Date: Fri, 20 Dec 2013 20:24:04 +0100
On Fri, Dec 20, 2013 at 10:46:29AM -0800, Michael Lum wrote:
> Is there a goal to remove g_iconv calls from Wireshark?

Nope, it's not a goal (at least not for me).

There are two goals:
 1/ To support more encodings in epan, which will make them easier for people to use.
 2/ Thanks to 1/, more calls can be changed to proto_tree_add_item(), which will not convert text
    unless it's required (which should save some CPU cycles).

> The other is EUC-KR (Korean).  I tried to find a code page that looks like the ISO ones but I'm not sure how these
> conversions are supposed to work.

glibc iconv is licensed under LGPL-2, so you could use its source.

In euc-kr.c [1] you can see that it uses ksc5601_to_ucs4(), which can be found in ksc5601.h [2].
ksc5601_to_ucs4() uses the conversion tables __ksc5601_hangul_to_ucs, __ksc5601_hanja_to_ucs and __ksc5601_sym_to_ucs
from ksc5601.c [3].
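
To make the table lookup more concrete, here is a rough sketch of the first step: turning an EUC-KR byte pair into a flat index and deciding which kind of table it would fall into. It is not the glibc code; the names euckr_index(), euckr_classify() and enum ksc_block are made up for illustration, and the row ranges (1-12 symbols, 16-40 hangul, 42-93 hanja) are taken from the KS X 1001 layout rather than from ksc5601.h.

```c
#include <assert.h>

/* Hypothetical names; this only illustrates the indexing, not the lookup. */
enum ksc_block { KSC_INVALID, KSC_SYMBOL, KSC_HANGUL, KSC_HANJA, KSC_UNASSIGNED };

/* EUC-KR encodes a KS X 1001 character as two bytes, each in 0xA1..0xFE.
 * Returns a flat index (row * 94 + column, both 0-based), or -1 if either
 * byte is outside that range. */
int euckr_index(unsigned char b1, unsigned char b2)
{
    if (b1 < 0xA1 || b1 > 0xFE || b2 < 0xA1 || b2 > 0xFE)
        return -1;

    int row = b1 - 0xA1;
    int col = b2 - 0xA1;
    assert(row >= 0 && row < 94 && col >= 0 && col < 94);
    return row * 94 + col;
}

/* Classify a flat index by its 1-based KS X 1001 row. */
enum ksc_block euckr_classify(int idx)
{
    if (idx < 0 || idx >= 94 * 94)
        return KSC_INVALID;

    int row = idx / 94 + 1;
    if (row <= 12)
        return KSC_SYMBOL;
    if (row >= 16 && row <= 40)
        return KSC_HANGUL;
    if (row >= 42 && row <= 93)
        return KSC_HANJA;
    return KSC_UNASSIGNED;
}
```

A real decoder would then use the index (minus the start of the block) to look up the code point in the matching table.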

These tables are quite big (about 1K lines), and __ksc5601_sym_to_ucs uses C99 array index initialization
(which you would need to expand (convert to a switch?) before committing).
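
As an illustration of what that expansion could look like, here is a tiny sparse table written once with C99 array index initialization and once as a switch. The table name, the function names and the handful of entries are placeholders chosen for the example, not copied from ksc5601.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C99 form: only the listed indexes are set, the gaps are zero-filled. */
static const uint16_t sym_to_ucs_c99[] = {
    [0] = 0x3000,
    [1] = 0x3001,
    [2] = 0x3002,
    [5] = 0x2026,
};

#define SYM_TABLE_LEN (sizeof sym_to_ucs_c99 / sizeof sym_to_ucs_c99[0])

/* Lookup through the C99 table; 0 means "no mapping". */
uint16_t sym_lookup_table(size_t idx)
{
    return idx < SYM_TABLE_LEN ? sym_to_ucs_c99[idx] : 0;
}

/* The same mapping expanded into a switch, which needs no C99 syntax. */
uint16_t sym_lookup_switch(size_t idx)
{
    switch (idx) {
    case 0: return 0x3000;
    case 1: return 0x3001;
    case 2: return 0x3002;
    case 5: return 0x2026;
    default: return 0;
    }
}

/* Sanity check that both forms agree over a small index range. */
void sym_check_equivalent(void)
{
    for (size_t i = 0; i < 16; i++)
        assert(sym_lookup_table(i) == sym_lookup_switch(i));
}
```

For a table with about a thousand lines you would want to generate the switch (or a fully expanded array) with a script rather than by hand.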


So I'd rather suggest still using g_iconv() for EUC-KR, but feel free to introduce a new ENC_EUC_KR and move it to core.
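
For reference, a minimal sketch of the iconv-based conversion. To keep it self-contained it uses the plain POSIX iconv_open()/iconv()/iconv_close() calls; GLib's g_iconv_open()/g_iconv()/g_iconv_close() are thin wrappers with the same argument order. The helper name euckr_to_utf8() is made up for this example, and it assumes the platform's iconv knows "EUC-KR".

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Convert in_len bytes of EUC-KR text into a NUL-terminated UTF-8 string.
 * Returns the number of UTF-8 bytes written (not counting the NUL),
 * or -1 if the converter is unavailable, the input is invalid,
 * or the output buffer is too small. */
long euckr_to_utf8(const char *in, size_t in_len, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return -1;

    /* Argument order is (to, from), as with g_iconv_open(). */
    iconv_t cd = iconv_open("UTF-8", "EUC-KR");
    if (cd == (iconv_t)-1)
        return -1;

    char *src = (char *)in;          /* iconv() wants char **, input is not modified */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_size - 1;  /* reserve room for the terminating NUL */

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *dst = '\0';
    return (long)(dst - out);
}
```

In a dissector the input bytes would come from the tvbuff and the result would be handed to the proto tree as a UTF-8 string.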


Cheers,
Kuba.

[1] https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=iconvdata/euc-kr.c;hb=HEAD
[2] https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=iconvdata/ksc5601.h;hb=HEAD
[3] https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=iconvdata/ksc5601.c;hb=HEAD