Wireshark-bugs: [Wireshark-bugs] [Bug 8382] MS-MMS dissector crash

Date: Sun, 03 Mar 2013 04:15:11 +0000

Comment # 34 on bug 8382 from Guy Harris

(In reply to comment #31)
> (In reply to comment #30)
> I suspect for now the current method is satisfactory, being that whatever
> the UI toolkit does is what we get unless the dissector manually calls
> format_str.

What GTK+ does, and what I think Qt does, is assume strings are UTF-8.  That
works fine for UTF-8 strings displayed on-screen in Wireshark; it doesn't work
so well for, for example, strings in some other flavor of extended ASCII (ISO
8859/n, various double-byte EUCs, various DOS/Windows code pages, various
pre-OS X Mac encodings, etc.), and doesn't necessarily work so well when
writing to a text file or to the terminal (on UN*Xes, it works fine if the user
has UTF-8 as their character encoding, but would require a little iconv
assistance if it's not; on Windows, we'd probably want *some* Unicode encoding,
but would it be UTF-8, for which Windows *does* have a code page, or UTF-16?).
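To make the "assume UTF-8" problem concrete: in ISO 8859-1 the byte 0xE9 is "é", but on its own it is not valid UTF-8, so a string in that encoding has to be converted before a toolkit that expects UTF-8 can display it. Below is a minimal C sketch for the ISO 8859-1 case only. The function name latin1_to_utf8 is hypothetical and not a Wireshark API. Real code covering the many other encodings listed above would more likely use iconv(3) or GLib's g_convert() than hand-written tables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert ISO 8859-1 (Latin-1) bytes to UTF-8.
 * Each Latin-1 byte maps to the Unicode code point with the same value,
 * so bytes below 0x80 are copied unchanged and bytes 0x80-0xFF become a
 * two-byte UTF-8 sequence.
 * The caller must supply at least 2 * in_len + 1 bytes of output.
 * Returns the number of bytes written, not counting the terminating NUL. */
static size_t latin1_to_utf8(const unsigned char *in, size_t in_len,
                             char *out, size_t out_size)
{
    size_t o = 0;

    assert(out_size >= 2 * in_len + 1);
    for (size_t i = 0; i < in_len; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[o++] = (char)b;                   /* ASCII: same byte */
        } else {
            out[o++] = (char)(0xC0 | (b >> 6));   /* lead byte */
            out[o++] = (char)(0x80 | (b & 0x3F)); /* continuation byte */
        }
    }
    out[o] = '\0';
    return o;
}
```

For example, the Latin-1 bytes for "café" (63 61 66 E9) come out as the UTF-8 bytes 63 61 66 C3 A9.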

> I agree we'll want a flag for string fields at some point,
> though I'm not sure if it should be on the hf field or in the encoding arg
> of the tree_add call.

The encoding arg says what character encoding is used for the string in the
packet.

The formatting arg says how it should be presented, and the same string might
be presented in different ways in different contexts:

    for XML, we'd probably want to encode all non-printable characters as
    character references, except that the XML 1.1 spec says, in section 4.1
    "Character and Entity References":

        http://www.w3.org/TR/xml11/#sec-references

    that "Characters referred to using character references must match the
    production for Char.", and the production for Char is

        Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

    which excludes NUL, the surrogate blocks, U+FFFE, and U+FFFF, so I'm not
    sure how you say, in XML, that a string has the value "hi\0mom\uFFFE";

    for JSON, we can encode anything (RFC 4627 says "All Unicode characters
    may be placed within the quotation marks except for the characters that
    must be escaped: quotation mark, reverse solidus, and the control
    characters (U+0000 through U+001F)", which sure sounds as if a JSON
    string can have an embedded NUL, escaped as \u0000);

    for humans, we'd sometimes want something C-string-like (e.g., when
    showing stuff encoded as text, such as HTTP requests and responses) and
    sometimes want something using REPLACEMENT CHARACTER etc. (e.g., when
    showing strings encoded in binary representations, such as what you see
    in XDR, SMB messages, etc.).

For that last case, the choice between the two human-readable styles *might*
be something we could make a property of the field.
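The contrast between the XML and JSON cases can be checked directly. This C sketch tests a code point against the XML 1.1 Char production quoted above and escapes a length-counted buffer the way RFC 4627 requires. The names xml11_is_char and json_escape are hypothetical helpers, not Wireshark APIs. The JSON function assumes its input has already been converted to UTF-8.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* True if code point cp matches the XML 1.1 Char production
 *   Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 * i.e. if it may appear in a character reference such as &#x41;. */
static bool xml11_is_char(uint32_t cp)
{
    return (cp >= 0x1 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

/* Writes buf[0..len-1], assumed to be UTF-8 already, as a JSON string
 * literal including the surrounding quotes. Only what RFC 4627 requires
 * is escaped: quotation mark, reverse solidus, and U+0000-U+001F. The
 * length is explicit, so an embedded NUL becomes \u0000 instead of
 * ending the string. Returns the length written, or 0 if out_size is
 * smaller than the worst case (6 bytes per input byte, 2 quotes, a NUL). */
static size_t json_escape(const unsigned char *buf, size_t len,
                          char *out, size_t out_size)
{
    size_t o = 0;

    if (out_size < 6 * len + 3)
        return 0;
    out[o++] = '"';
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        if (c == '"' || c == '\\') {
            out[o++] = '\\';
            out[o++] = (char)c;
        } else if (c < 0x20) {
            /* \u00XX: snprintf writes 6 characters plus a NUL */
            snprintf(out + o, out_size - o, "\\u%04x", (unsigned)c);
            o += 6;
        } else {
            out[o++] = (char)c;
        }
    }
    out[o++] = '"';
    out[o] = '\0';
    return o;
}
```

xml11_is_char(0), xml11_is_char(0xD800), and xml11_is_char(0xFFFE) are all false, so XML 1.1 offers no character reference for those values. By contrast, json_escape turns the bytes of "hi\0mom" into the literal "hi\u0000mom".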
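For the human-readable case, the two display styles and the idea of making the choice a per-field property might look roughly like this. The enum str_display_t, its values, and format_for_humans are all hypothetical and do not exist in Wireshark. For simplicity the sketch treats the input as ASCII. A real implementation would first decode the string using the packet's character encoding and then apply the chosen style only to unprintable or undecodable characters.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-field display property. */
typedef enum {
    STR_DISPLAY_C_ESCAPES,   /* text protocols (e.g. HTTP): \0, \n, \xff */
    STR_DISPLAY_REPLACEMENT  /* binary encodings (e.g. XDR, SMB): U+FFFD */
} str_display_t;

/* Renders buf[0..len-1] for humans in the given style, treating the
 * input as ASCII. Worst case is 4 output bytes per input byte (\xNN),
 * plus a NUL. Returns bytes written, or 0 if out is too small. */
static size_t format_for_humans(const unsigned char *buf, size_t len,
                                str_display_t style,
                                char *out, size_t out_size)
{
    size_t o = 0;

    if (out_size < 4 * len + 1)
        return 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        int printable = (c >= 0x20 && c < 0x7F);

        /* In C-escape style a backslash must itself be escaped. */
        if (printable && !(style == STR_DISPLAY_C_ESCAPES && c == '\\')) {
            out[o++] = (char)c;
        } else if (style == STR_DISPLAY_REPLACEMENT) {
            /* U+FFFD REPLACEMENT CHARACTER, UTF-8 encoded */
            out[o++] = (char)0xEF;
            out[o++] = (char)0xBF;
            out[o++] = (char)0xBD;
        } else {
            switch (c) {
            case '\\': out[o++] = '\\'; out[o++] = '\\'; break;
            case '\0': out[o++] = '\\'; out[o++] = '0';  break;
            case '\n': out[o++] = '\\'; out[o++] = 'n';  break;
            case '\r': out[o++] = '\\'; out[o++] = 'r';  break;
            case '\t': out[o++] = '\\'; out[o++] = 't';  break;
            default:
                /* \xNN: snprintf writes 4 characters plus a NUL */
                snprintf(out + o, out_size - o, "\\x%02x", (unsigned)c);
                o += 4;
            }
        }
    }
    out[o] = '\0';
    return o;
}
```

With the bytes h i NUL m o m 0xFF, the C-escape style yields hi\0mom\xff and the replacement style yields hi\uFFFD\mom\uFFFD (shown here as escapes; the output contains the actual U+FFFD character). A dissector author would pick one style per field rather than per call site, which matches the suggestion that this could be a property of the field.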
