Pierre Goyette said:
> Regarding your question about code pages, the issue also applies to
> ASCII. I'm sure you are aware that ASCII actually only represents 0 ->
> 127.
Yes.
> So, for 128-255, what code page determines the glyph to be
> displayed? PC 437? Windows Latin-1 1252? ISO Latin-1 8859-1?
The glyph that should be displayed there should probably be ".". There
isn't even a guarantee that non-ASCII bytes are single characters - there
might be UTF-8 text in the packet, for example. An option to choose
"ASCII" (meaning "if the 8th bit is set, display it as '.'") and various
SBCS's might be useful, but dealing with DBCS's or UTF-8 is probably not
possible, at least with a model that each byte gets displayed as a single
character.
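
To make the one-byte-per-character model concrete, here is a minimal sketch in Python rather than anything taken from Ethereal's source. The function name render_bytes and its encoding parameter are made up for illustration. The "ascii" setting shows any byte with the 8th bit set as "."; any other setting is assumed to name a single-byte Python codec, and bytes that codec leaves undefined or maps to a non-printable character also come out as ".".

```python
def render_bytes(data: bytes, encoding: str = "ascii") -> str:
    """Return exactly one display character per input byte.

    Hypothetical helper, not Ethereal code. "ascii" means: if the 8th
    bit is set, display "."; otherwise `encoding` is assumed to name a
    single-byte codec known to Python (e.g. "cp437", "cp1252", "latin-1").
    """
    out = []
    for b in data:
        if encoding == "ascii":
            ch = chr(b) if b < 0x80 else "."
        else:
            try:
                ch = bytes([b]).decode(encoding)
            except UnicodeDecodeError:
                # Byte is undefined in this code page, or the codec is
                # multi-byte and cannot decode a lone byte.
                ch = "."
            if len(ch) != 1:
                ch = "."
        # Control characters and other non-printables also become ".".
        out.append(ch if ch.isprintable() else ".")
    return "".join(out)
```

Note that UTF-8 text in a packet degrades to one "." per byte under the "ascii" setting, which is exactly the limitation of this model: it cannot show a multi-byte sequence as a single character.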
> The same applies to EBCDIC. What does 0x5B mean? With Open Systems
> Latin-1 1047, it means the dollar sign '$'. With the UK CECP 1080, it
> means the pound sign.
If EBCDIC is strictly SBCS, an option to choose the particular code page
would be useful; otherwise, an option to choose among the SBCS EBCDIC
encodings would be useful.
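
As a rough sketch of what such an option might look like, again in Python and not from Ethereal: the table below covers only the one byte discussed above, 0x5B, and takes its "$" versus pound-sign values from that discussion. The variant names "us" and "uk", the EBCDIC_OVERRIDES table, and decode_ebcdic are all hypothetical. Python's cp037 codec is used as the base table purely as an assumption for illustration; a real UK code page may differ from it at other positions too.

```python
# Per-variant overrides on top of a base EBCDIC table (illustrative only).
EBCDIC_OVERRIDES = {
    "us": {},               # 0x5B stays "$", as in the base cp037 table
    "uk": {0x5B: "\u00a3"}, # 0x5B shown as the pound sign
}

def decode_ebcdic(data: bytes, variant: str = "us") -> str:
    """Decode single-byte EBCDIC, one character per byte.

    Hypothetical helper: applies the selected variant's overrides and
    falls back to Python's cp037 codec for every other byte.
    """
    overrides = EBCDIC_OVERRIDES[variant]
    return "".join(
        overrides.get(b, bytes([b]).decode("cp037")) for b in data
    )
```

With this, the same three bytes 0x5B 0xF1 0xF0 come out as "$10" for the "us" variant and as a pound sign followed by "10" for the "uk" variant.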
> I don't know if Ethereal is single-byte or Unicode. If it is SBCS, then
> you need tables to convert from a selected host code page to the target
> display code page (1252 for Windows). If Ethereal is Unicode, then you
> need tables to convert from the host SBCS to Unicode.
Ethereal is currently best described as "ASCII". In the future, this will
probably be fixed, but different parts of Ethereal will use different
encodings. The
low-level GUI code will do whatever the GUI toolkit being used expects -
that's Unicode on Windows (which would require Microsoft's Unicode
Services for Windows or whatever their partial implementation of the
Unicode Win32 APIs for Windows 95/98/Me is called), UTF-8 in GTK+ 2.x, the
encoding for the font being used in GTK+ 1.2[.x], UTF-8 in Aqua, and
QStrings in Qt. The stuff that generates the columns for the packet list,
the text for the fields in the packet detail, and the hex dump information
will probably be neither SBCS nor Unicode; it'll probably be UTF-8.