Wireshark-bugs: [Wireshark-bugs] [Bug 8382] MS-MMS dissector crash
Date: Sun, 03 Mar 2013 04:15:11 +0000
Comment # 34
on bug 8382
from Guy Harris
(In reply to comment #31)
> (In reply to comment #30)
> I suspect for now the current method is satisfactory, being that whatever
> the UI toolkit does is what we get unless the dissector manually calls
> format_str.

What GTK+ does, and what I think Qt does, is assume strings are UTF-8. That works fine for UTF-8 strings displayed on-screen in Wireshark; it doesn't work so well for, for example, strings in some other flavor of extended ASCII (ISO 8859/n, various double-byte EUCs, various DOS/Windows code pages, various pre-OS X Mac encodings, etc.), and doesn't necessarily work so well when writing to a text file or to the terminal (on UN*Xes, it works fine if the user has UTF-8 as their character encoding, but would require a little iconv assistance if it's not; on Windows, we'd probably want *some* Unicode encoding, but would it be UTF-8, for which Windows *does* have a code page, or UTF-16?).

> I agree we'll want a flag for string fields at some point,
> though I'm not sure if it should be on the hf field or in the encoding arg
> of the tree_add call.

The encoding arg says what character encoding is used for the string in the packet.
The formatting arg says how it should be presented, and the same string might be presented in different ways in different contexts:

for XML, we'd probably want to encode all non-printable characters as entities, except that the 1.1 spec says in section 4.1 "Character and Entity References":

    http://www.w3.org/TR/xml11/#sec-references

that "Characters referred to using character references must match the production for Char.", and the production for Char is

    Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

which excludes NUL, the surrogate blocks, U+FFFE, and U+FFFF, so I'm not sure how you say, in XML, "this string has the value "hi\0mom\xFFFE"";

for JSON, we can encode anything (RFC 4627 says "All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F)", which sure sounds as if a JSON string can have an embedded NUL);

for humans, we'd sometimes want something C-string like (e.g., when showing stuff encoded as text, such as HTML requests and responses) and sometimes want something using REPLACEMENT CHARACTER etc. (e.g., when showing strings encoded in binary representations, such as what you see in XDR, SMB messages, etc.).

For the latter, that *might* be something we could make a property of the field.
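
To make the XML and JSON cases above concrete, here is a minimal sketch in C. It is not Wireshark code: xml11_is_char() and json_escape() are hypothetical helper names invented for illustration. The first checks a code point against the XML 1.1 Char production quoted above; the second escapes a counted byte buffer using the RFC 4627 rule quoted above, so an embedded NUL survives. It assumes the input is already UTF-8 and passes bytes at or above 0x20 through unchanged.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not a Wireshark API): true if the code point
 * matches the XML 1.1 production
 *   Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 * and so may be written as a character reference. */
bool xml11_is_char(uint32_t cp)
{
    return (cp >= 0x1 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

/* Hypothetical helper (not a Wireshark API): escape in_len bytes for use
 * inside a JSON string. Quotation mark and reverse solidus get a
 * backslash; control characters U+0000..U+001F become \u00XX. The length
 * is explicit, so a NUL in the middle of the buffer is escaped rather
 * than treated as a terminator. Returns the number of characters written
 * (excluding the trailing NUL), or -1 if out is too small. */
int json_escape(const unsigned char *in, size_t in_len,
                char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < in_len; i++) {
        unsigned char c = in[i];
        size_t room = out_size - used;
        int n;

        if (c == '"' || c == '\\')
            n = snprintf(out + used, room, "\\%c", c);
        else if (c < 0x20)
            n = snprintf(out + used, room, "\\u%04x", (unsigned)c);
        else
            n = snprintf(out + used, room, "%c", c);

        /* snprintf reports the length it wanted; treat truncation as failure. */
        if (n < 0 || (size_t)n >= room)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

With these, xml11_is_char() rejects 0x0, 0xD800, 0xFFFE, and 0xFFFF, which is the XML problem described above, while json_escape() on the six bytes 'h' 'i' NUL 'm' 'o' 'm' produces hi\u0000mom.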