Ethereal-dev: Re: [Ethereal-dev] Unicode strings ...

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Richard Sharpe <sharpe@xxxxxxxxxx>
Date: Mon, 13 Aug 2001 10:40:34 +0930
Guy Harris wrote:
On Sun, Aug 12, 2001 at 05:12:34PM -0700, Guy Harris wrote:

My inclination would be to go with UTF-8 internally,


UTF-8 has a number of advantages, and we can use iconv for conversion, but that requires another library :-(
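
For illustration, the iconv route mentioned above might look roughly like the following. This is only a sketch: the wrapper name `ucs2le_to_utf8_iconv` is hypothetical, the input is assumed to be little-endian UCS-2, and the encoding names "UCS-2LE" and "UTF-8" are assumed to be supported by the local iconv implementation (on some platforms this also means linking against a separate libiconv).

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper, not an Ethereal function: convert in_len bytes of
 * little-endian UCS-2 to a newly allocated, NUL-terminated UTF-8 string.
 * Returns NULL if iconv does not know the encodings, if allocation fails,
 * or if the input cannot be converted (for example an odd byte count).
 * The caller frees the result.
 */
static char *ucs2le_to_utf8_iconv(const char *in, size_t in_len)
{
    iconv_t cd;
    /* One 16-bit code unit becomes at most 3 UTF-8 bytes, plus the NUL. */
    size_t out_size = (in_len / 2) * 3 + 1;
    size_t in_left = in_len;
    size_t out_left = out_size - 1;
    char *inp = (char *)in;   /* iconv() takes char **, input is only read */
    char *out, *outp;

    assert(in != NULL);

    cd = iconv_open("UTF-8", "UCS-2LE");   /* (tocode, fromcode) */
    if (cd == (iconv_t)-1)
        return NULL;

    out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }
    outp = out;

    if (iconv(cd, &inp, &in_left, &outp, &out_left) == (size_t)-1) {
        free(out);
        iconv_close(cd);
        return NULL;
    }

    *outp = '\0';
    iconv_close(cd);
    return out;
}
```

Opening a conversion descriptor per call keeps the sketch short; code that converts many strings would more likely open it once and reuse it.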

...although that would require either that "proto_tree_add_string()" and so on take an argument specifying the character set to be used or that some indication of the character set be associated with the field.

The latter has the disadvantage that not all occurrences of the field would necessarily have the same character set, so I'd be inclined to add a character-set argument.

"proto_tree_add_item()" might also need that argument added (I don't think the byte-order argument could be used for this, because if the character set in the packet is UCS-2, you'd still have to specify whether it's big-endian or little-endian UCS-2).

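One way to picture a character-set argument that also covers the UCS-2 byte-order question is an enumeration in which the byte order is part of the encoding value. The sketch below is only that: `str_encoding_t`, the `STR_ENC_*` values and both helper functions are hypothetical names invented for illustration and are not part of the Ethereal API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical character-set tag; byte order is folded into the value. */
typedef enum {
    STR_ENC_ASCII,
    STR_ENC_UCS2_BE,
    STR_ENC_UCS2_LE
} str_encoding_t;

/* Number of bytes occupied by one code unit in the given encoding. */
static size_t str_encoding_unit_size(str_encoding_t enc)
{
    return (enc == STR_ENC_ASCII) ? 1 : 2;
}

/* Read one code unit from p, honouring the byte order implied by enc. */
static uint16_t str_encoding_read_unit(const uint8_t *p, str_encoding_t enc)
{
    assert(p != NULL);

    switch (enc) {
    case STR_ENC_UCS2_BE:
        return (uint16_t)((p[0] << 8) | p[1]);
    case STR_ENC_UCS2_LE:
        return (uint16_t)(p[0] | (p[1] << 8));
    case STR_ENC_ASCII:
    default:
        return p[0];
    }
}
```

With a tag like this, the bytes 0x00 0x41 read as U+0041 under `STR_ENC_UCS2_BE` but as U+4100 under `STR_ENC_UCS2_LE`, which is why knowing only "UCS-2" would not be enough.
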
A problem I have just encountered is that proto_tree_add_string() etc. all expect the string you are going to add to be passed as the "value" argument. What I want is a routine, something like proto_tree_add_ucs2_xx, that will take a string of length X from the TVB at a given offset and mark its extent in the byte view, but then convert that string into the internal format (UTF-8) for display. I would also want to do formatting, which means more work ...
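
The conversion step of such a routine can be done without an extra library. The following is a minimal sketch, assuming the packet data is available as a plain byte buffer rather than a TVB; `ucs2_to_utf8` and its parameters are hypothetical names, and the proto-tree and byte-view side is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not an Ethereal function: convert `length` bytes of
 * UCS-2 starting at `offset` in `buf` into a newly allocated, NUL-terminated
 * UTF-8 string. `little_endian` selects the byte order of the input.
 * A trailing odd byte is ignored. Surrogate code units are not combined
 * (this treats the input as UCS-2, not UTF-16), and an embedded U+0000
 * will terminate the C string early. Returns NULL on allocation failure;
 * the caller frees the result.
 */
static char *ucs2_to_utf8(const uint8_t *buf, size_t offset, size_t length,
                          int little_endian)
{
    size_t units = length / 2;
    size_t i;
    char *out, *q;

    assert(buf != NULL);

    /* Each 16-bit code unit needs at most 3 UTF-8 bytes, plus the NUL. */
    out = malloc(units * 3 + 1);
    if (out == NULL)
        return NULL;
    q = out;

    for (i = 0; i < units; i++) {
        const uint8_t *p = buf + offset + 2 * i;
        uint16_t c = little_endian ? (uint16_t)(p[0] | (p[1] << 8))
                                   : (uint16_t)((p[0] << 8) | p[1]);

        if (c < 0x80) {                 /* 1 byte:  0xxxxxxx */
            *q++ = (char)c;
        } else if (c < 0x800) {         /* 2 bytes: 110xxxxx 10xxxxxx */
            *q++ = (char)(0xC0 | (c >> 6));
            *q++ = (char)(0x80 | (c & 0x3F));
        } else {                        /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
            *q++ = (char)(0xE0 | (c >> 12));
            *q++ = (char)(0x80 | ((c >> 6) & 0x3F));
            *q++ = (char)(0x80 | (c & 0x3F));
        }
    }
    *q = '\0';
    return out;
}
```

The `offset` and `length` arguments are the same two values that would describe the field's extent in the byte view, so the displayed string and the highlighted bytes come from one description of the field.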

--
Richard Sharpe, rsharpe@xxxxxxxxxx, LPIC-1
www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba
in 24 Hours, Special Edition, Using Samba