Wireshark-dev: Re: [Wireshark-dev] No tvb_get for string-encoded numbers?
From: Guy Harris <guy@xxxxxxxxxxxx>
Date: Fri, 4 Apr 2014 13:04:40 -0700
On Apr 4, 2014, at 7:30 AM, Hadriel Kaplan <hadriel.kaplan@xxxxxxxxxx> wrote:

> I might be overlooking something, but I don’t see a tvb_get_* function to get a uint8/16/32/64 that was encoded as an ASCII or UTF-8 string in the packet. Is there such a thing?

No.

I've occasionally also thought there should be such a routine.

Note, though, that, whilst tvb_get_guint8() and tvb_get_{n,le}tohXXX() can never fail, because every possible sequence of octets is a valid 2's complement integral value, routines to get a number encoded as a string *can* fail, e.g. 0123xyzw is not a valid number in bases 8, 10, or 16.
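
As a rough illustration of the failure modes, here is a minimal sketch of a parser for a number encoded as ASCII digits. The name parse_ascii_uint32() and its signature are hypothetical, not an existing Wireshark routine. It works on a raw octet buffer rather than a tvbuff, and it assumes an ASCII-superset encoding and no sign or "0x" prefix handling. The point is only that the caller gets a success/failure result instead of a value that is silently wrong.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a Wireshark API: parse `len` octets of ASCII
 * digits in the given base (8, 10, or 16) into *out.
 * Returns false for empty input, an unsupported base, a character that is
 * not a digit in that base, or a value that does not fit in 32 bits.
 */
static bool parse_ascii_uint32(const uint8_t *buf, size_t len, int base,
                               uint32_t *out)
{
    uint32_t value = 0;

    if (len == 0 || (base != 8 && base != 10 && base != 16))
        return false;

    for (size_t i = 0; i < len; i++) {
        uint8_t c = buf[i];
        uint32_t digit;

        if (c >= '0' && c <= '9')
            digit = (uint32_t)(c - '0');
        else if (c >= 'a' && c <= 'f')
            digit = (uint32_t)(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F')
            digit = (uint32_t)(c - 'A') + 10;
        else
            return false;               /* not a digit in any supported base */

        if (digit >= (uint32_t)base)
            return false;               /* e.g. '8' in base 8 */

        /* Reject the digit if value * base + digit would overflow. */
        if (value > (UINT32_MAX - digit) / (uint32_t)base)
            return false;

        value = value * (uint32_t)base + digit;
    }

    *out = value;
    return true;
}
```

With this sketch, "0123" parses in all three bases, while "0123xyzw" is rejected in all of them, so a caller would have to decide what to do in the failure case.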

There are other cases where a tvb_get_ routine can return "you lose", e.g. tvb_get_string_enc() can fail if there are invalid octet sequences (about the only encodings I know of where *every* octet sequence is a valid string are some of the ISO 8859-n encodings), and at least some floating-point formats probably have invalid values (I guess an IEEE NaN is "valid", at least to the extent that if we try to format it it'll show up as "NaN", but if we try to do calculations with it we might get a floating-point exception).

> Instead, it seems the dissectors that deal with string messages do a tvb_get_string_enc() or tvb_format_text(), and then a strtol() or atoi(). But in my way of thinking, the fact that it’s in a string-encoded form in the tvb isn’t that much different from it being encoded as little-endian vs. network-order.
> 
> Likewise, it’s not clear if there’s a way to define a protocol field that is encoded as a string in the packet but is internally a uint8/16/32/64 (e.g., for filtering purposes, val_string lookup, etc.). For example such that proto_tree_add_item() would work. Instead, it seems some dissectors use the returned strtol/atoi to then add the field to the tree as a true uint type, or add it as a FT_STRING field type.

One advantage of that is that, if the routine to fetch the value also adds an item to the protocol tree, it could, in the cases where the value is invalid, also add an expert item indicating that the value isn't valid.

And I'd like to see proto_tree_add_XXX_item() routines that add an item with a particular type *and* take a pointer argument and return the value for the item through that pointer; that could replace

	xxx = tvb_get_XXX();
	proto_tree_add_XXX(..., xxx);

combinations and

	xxx = tvb_get_XXX();
	proto_tree_add_item(...);	/* re-fetches the item value */

with

	proto_tree_add_XXX_item(..., &xxx);

> And if we had common functions handle ascii and utf-8 string-encoded numbers, they could avoid creating temporary strings as well.

The only real encoding issues are "ASCII superset" (so that "0123456789", for example, is encoded the same as in ASCII) vs. "2 or more bytes per ASCII character" (e.g., UCS-2, UTF-16, and UCS-4) vs. "one of those 7-bit GSM character encodings" vs. "EBCDIC".
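
To make the distinction concrete, here is a small sketch of fetching one decimal digit under three of those encoding classes. The enum and function names are hypothetical, not Wireshark's ENC_ constants or any existing routine. UCS-4 and the 7-bit GSM encodings are left out to keep it short, and it assumes the usual EBCDIC code pages, where the digits '0' through '9' are 0xF0 through 0xF9.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding classes for this sketch only. */
typedef enum {
    DIGIT_ENC_ASCII_SUPERSET,   /* ASCII, UTF-8, ISO 8859-n: 0x30..0x39 */
    DIGIT_ENC_UTF16_BE,         /* two octets per digit: 0x00 0x30..0x39 */
    DIGIT_ENC_UTF16_LE,         /* two octets per digit: 0x30..0x39 0x00 */
    DIGIT_ENC_EBCDIC            /* 0xF0..0xF9 */
} digit_enc_t;

/*
 * Decode the decimal digit starting at buf[*pos] and advance *pos past it.
 * Returns 0..9 on success, or -1 (leaving *pos unchanged) if the buffer is
 * too short or the octets are not a decimal digit in the given encoding.
 */
static int next_decimal_digit(const uint8_t *buf, size_t len, size_t *pos,
                              digit_enc_t enc)
{
    size_t width = (enc == DIGIT_ENC_UTF16_BE || enc == DIGIT_ENC_UTF16_LE)
                       ? 2 : 1;
    uint8_t zero = (enc == DIGIT_ENC_EBCDIC) ? 0xF0 : 0x30;
    uint8_t lo;
    uint8_t hi = 0;

    if (*pos > len || len - *pos < width)
        return -1;

    switch (enc) {
    case DIGIT_ENC_UTF16_BE:
        hi = buf[*pos];
        lo = buf[*pos + 1];
        break;
    case DIGIT_ENC_UTF16_LE:
        lo = buf[*pos];
        hi = buf[*pos + 1];
        break;
    default:
        lo = buf[*pos];
        break;
    }

    /* For the two-octet encodings the high octet of a digit must be zero. */
    if (hi != 0 || lo < zero || lo > zero + 9)
        return -1;

    *pos += width;
    return lo - zero;
}
```

Only the digit-fetching step differs between encodings here; the accumulation and overflow checking for the number itself would be the same in every case.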