Wireshark-dev: Re: [Wireshark-dev] 3GPP 23.038 encoding and string length
From: Guy Harris <guy@xxxxxxxxxxxx>
Date: Sat, 28 Dec 2013 14:50:39 -0800
On Dec 24, 2013, at 2:43 AM, Pascal Quantin <pascal.quantin@xxxxxxxxx> wrote:

> r54428 introduced an ENC_3GPP_TS_23_038 encoding type so as to be able to use proto_tree_add_item directly instead of manually decoding the string with the gsm_sms_char_7bit_unpack() / gsm_sms_chars_to_utf8() functions.
> While it is a very good idea (much easier to use), it raises an interesting issue. With this 7-bit encoding, a payload of 7 bytes will hold either 7 or 8 characters. This is handled by the gsm_sms_char_7bit_unpack() function thanks to an extra parameter specifying the number of characters.

Presumably that's the out_length parameter (which doesn't appear to be checked before every character is written to the output string); the in_length parameter counts input octets, not output characters.  However, out_length appears primarily to be used when extracting into a fixed-length buffer, with the buffer length passed as the out_length argument.
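
To make the octet/character distinction concrete, here is a minimal sketch of a septet unpacker that checks both limits before each character is written. It is not the Wireshark gsm_sms_char_7bit_unpack() code; the name unpack_septets() and its signature are hypothetical, and it assumes the usual 3GPP TS 23.038 packing with the first septet in the low-order bits of the first octet.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a Wireshark API.
 * Unpacks up to out_len septets from in_len packed octets.
 * Both limits are checked before each output character is written.
 * Returns the number of septets actually written to out.
 * The output values are GSM default-alphabet code points, not UTF-8.
 */
static size_t
unpack_septets(const uint8_t *in, size_t in_len, uint8_t *out, size_t out_len)
{
    size_t n;

    for (n = 0; n < out_len; n++) {
        size_t bit = n * 7;          /* position of this septet in the bit stream */
        size_t octet = bit / 8;      /* octet holding its low-order bits */
        unsigned shift = (unsigned)(bit % 8);
        unsigned value;

        if (octet >= in_len)
            break;                   /* ran out of input octets */
        value = (unsigned)in[octet] >> shift;

        if (shift > 1) {
            /* The septet spills into the next octet. */
            if (octet + 1 >= in_len)
                break;               /* incomplete septet: treat as padding */
            value |= (unsigned)in[octet + 1] << (8 - shift);
        }
        out[n] = (uint8_t)(value & 0x7f);
    }
    return n;
}
```

With 7 input octets, this returns 7 if out_len is 7 and 8 if out_len is 8, which is exactly the ambiguity being discussed: the octet count alone can't tell you whether the last 7 bits are a character or padding.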

GSM MAP is encoded using ASN.1 BER, and USSD-String is an OCTET STRING, so BER gives its length in octets, not characters, and it's preceded by lengthInCharacters, giving its length in characters.

In that case, we need to make sure we process no more than the specified number of bytes and no more than the specified number of characters.  If ({number of characters}*7 + 7)/8 > {number of bytes}, the bytes can't hold that many characters, so there should probably be an expert info reporting an error; we might want to dissect all the characters we can extract from the specified number of bytes, at least.  If {number of bytes} > ({number of characters}*7 + 7)/8, we might also want to warn that there are too many padding bytes, and dissect {number of characters} characters.  In both those cases, a "number of characters" count is all that needs to be passed to the string-extractor or item-adder routine; if ({number of characters}*7 + 7)/8 > {number of bytes}, the "number of characters" count should be ({number of bytes}*8)/7 rather than {number of characters}.
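
As a rough sketch of that reconciliation, assuming integer (truncating) division throughout: the enum, the function names, and the idea of returning a status for the caller to turn into an expert info are all hypothetical, not existing Wireshark code.

```c
#include <stddef.h>

/* Hypothetical status codes; a dissector would map these to expert infos. */
enum septet_len_status {
    SEPTET_LEN_OK,              /* byte count matches the character count */
    SEPTET_LEN_TOO_FEW_BYTES,   /* claimed characters don't fit in the bytes */
    SEPTET_LEN_EXTRA_PADDING    /* more bytes than the characters need */
};

/* Octets needed to hold n_chars packed septets, rounded up. */
static size_t
septets_to_octets(size_t n_chars)
{
    return (n_chars * 7 + 7) / 8;
}

/*
 * Given the length in bytes (e.g. from the BER OCTET STRING length) and
 * the claimed length in characters, return the number of characters that
 * should be passed to the string extractor, and report any mismatch.
 */
static size_t
reconcile_septet_count(size_t n_bytes, size_t n_chars,
                       enum septet_len_status *status)
{
    size_t needed = septets_to_octets(n_chars);

    if (needed > n_bytes) {
        /* Not enough bytes: extract as many whole septets as fit. */
        *status = SEPTET_LEN_TOO_FEW_BYTES;
        return (n_bytes * 8) / 7;
    }
    if (n_bytes > needed) {
        /* Surplus bytes are padding; still honor the character count. */
        *status = SEPTET_LEN_EXTRA_PADDING;
        return n_chars;
    }
    *status = SEPTET_LEN_OK;
    return n_chars;
}
```

For 7 bytes, a claimed count of either 7 or 8 characters comes back unchanged with SEPTET_LEN_OK; a claimed count of 9 is clamped to 8 with SEPTET_LEN_TOO_FEW_BYTES.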

For the ETSI TS 102 223 v10.0.0/3GPP TS 11.14 v8.17.0/3GPP TS 31.111 v9.7.0 smart card stuff, however, the text string appears to just be a TLV, so you only get a length in bytes; presumably padding should be ignored in that case, and we can just use proto_tree_add_item() or tvb_get_string_enc().

Are there cases where only the length in characters is given?