Wireshark-dev: Re: [Wireshark-dev] 3GPP 23.038 encoding and string length
From: Pascal Quantin <pascal.quantin@xxxxxxxxx>
Date: Sun, 29 Dec 2013 11:44:31 +0100
Hi,

2013/12/28 Guy Harris <guy@xxxxxxxxxxxx>

On Dec 24, 2013, at 2:43 AM, Pascal Quantin <pascal.quantin@xxxxxxxxx> wrote:

> r54428 introduced an ENC_3GPP_TS_23_038 encoding type so that proto_tree_add_item() can be used directly instead of manually decoding the string with the gsm_sms_char_7bit_unpack() / gsm_sms_chars_to_utf8() functions.
> While it is a very good idea (much easier to use), it raises an interesting issue. With this 7-bit encoding, a payload of 7 bytes will hold either 7 or 8 characters. This is handled by the gsm_sms_char_7bit_unpack() function thanks to an extra parameter specifying the number of characters.
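
To make the 7-or-8 ambiguity concrete, here is a minimal sketch of septet unpacking with an explicit character count. It is not the actual gsm_sms_char_7bit_unpack() code; the names max_septets() and unpack_septets() are hypothetical, and it only assumes the usual 23.038 packing (first septet in the low 7 bits of the first octet).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many whole septets fit in n_octets octets. */
static size_t max_septets(size_t n_octets)
{
    return (n_octets * 8) / 7;
}

/*
 * Hypothetical helper: unpack at most n_chars septets from in[0..in_len)
 * into out[], one septet per output byte. The caller must provide room
 * for n_chars bytes in out[]. Returns the number of septets written.
 */
static size_t unpack_septets(const uint8_t *in, size_t in_len,
                             uint8_t *out, size_t n_chars)
{
    size_t i;
    size_t limit = max_septets(in_len);

    if (n_chars > limit)
        n_chars = limit;           /* never read past the input octets */

    for (i = 0; i < n_chars; i++) {
        size_t bit = i * 7;        /* position of the septet's bit 0 */
        size_t byte = bit / 8;
        unsigned shift = (unsigned)(bit % 8);
        unsigned v = (unsigned)in[byte] >> shift;

        /* A septet starting at bit 2 or higher spills into the next octet. */
        if (shift > 1 && byte + 1 < in_len)
            v |= (unsigned)in[byte + 1] << (8 - shift);

        out[i] = (uint8_t)(v & 0x7F);
    }
    return i;
}
```

With a 7-octet input, asking for 7 characters yields 7 and asking for 8 yields 8; the octet count alone cannot tell the two apart, which is why the character count has to come from somewhere.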

Presumably that's the out_length parameter (which doesn't appear to be checked before every character is written to the output string); the in_length parameter counts input octets, not output characters.  However, out_length appears primarily to be used when extracting into a fixed-length buffer, with the buffer length passed as the out_length argument.
As you said, the purpose of out_length is to give the maximum number of characters to be unpacked. In packet-gsm_sms.c, this parameter is set to the udl value (with a protection in case udl is bigger than the output buffer). In packet-ansi_637.c, num_fields represents the number of characters to be decoded.

GSM MAP is encoded using ASN.1 BER, and USSD-String is an OCTET STRING, so BER gives its length in octets, not characters, and it's preceded by lengthInCharacters, giving its length in characters.
Yes.

In that case, we need to make sure we don't process more than the specified number of bytes and don't process more than the specified number of characters.  If ({number of characters}*7 + 7)/8 > {number of bytes}, there should probably be an expert info reporting an error; we might want to dissect all the characters we can extract from the specified number of bytes, at least.  If {number of bytes} > ({number of characters}*7 + 7)/8, we might also want to warn that there are too many padding bytes, and dissect {number of characters} characters.  In both those cases, a "number of characters" count is all that needs to be passed to the string-extractor or item-adder routine; if ({number of characters}*7 + 7)/8 > {number of bytes}, the "number of characters" count should be ({number of bytes}*8)/7 rather than {number of characters}.
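
The arithmetic above could be sketched roughly as follows. This is only an illustration of the length check, not existing Wireshark code; the enum and function names are hypothetical, and the expert info reporting is left to the caller. "Truncated" means the octet count is too small for the declared character count, "extra octets" means there are more octets than the declared characters need.

```c
#include <stddef.h>

/* Hypothetical status values a dissector could map to expert infos. */
enum septet_len_status {
    SEPTET_LEN_OK,            /* octet count matches the character count */
    SEPTET_LEN_TRUNCATED,     /* too few octets for the declared characters */
    SEPTET_LEN_EXTRA_OCTETS   /* more octets than the characters need */
};

/* Octets needed to hold n_chars packed septets: (n_chars*7 + 7)/8. */
static size_t septets_to_octets(size_t n_chars)
{
    return (n_chars * 7 + 7) / 8;
}

/*
 * Given both lengths, return the number of characters that can safely be
 * decoded and report through *status which case was hit.
 */
static size_t clamp_septet_count(size_t n_octets, size_t n_chars,
                                 enum septet_len_status *status)
{
    size_t needed = septets_to_octets(n_chars);

    if (needed > n_octets) {
        /* Decode only what the available octets can hold. */
        *status = SEPTET_LEN_TRUNCATED;
        return (n_octets * 8) / 7;
    }

    *status = (needed < n_octets) ? SEPTET_LEN_EXTRA_OCTETS : SEPTET_LEN_OK;
    return n_chars;
}
```

For example, 7 octets with a declared length of 9 characters would be reported as truncated and clamped to 8 characters, while 8 octets with a declared length of 7 characters would decode 7 characters and flag the extra octet.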

For the ETSI TS 102 223 v10.0.0/3GPP TS 11.14 v8.17.0/3GPP TS 31.111 v9.7.0 smart card stuff, however, the text string appears to just be a TLV, so you only get a length in bytes; presumably padding should be ignored in that case, and we can just use proto_tree_add_item() or tvb_get_string_enc().
The specification defines a rule where the originator must explicitly add a <CR> if needed to avoid the padding bits:
"If the total number of characters in the text string equals (8n-1) where n = 1, 2, 3, etc. then there are 7 spare bits at the end of the message. To avoid the situation where the receiving entity confuses 7 binary zero pad bits as the @ character, the carriage return (i.e. <CR>) character shall be used for padding in this situation, as defined in TS 123 038". So proto_tree_add_item() is fine here (probably the only such case).

Are there cases where only the length in characters is given?
3GPP/3GPP2 SMS (packet-gsm_sms.c and packet-ansi_637.c).

The Network Name information element in packet-gsm_a_dtap.c gives the number of spare (padding) bits in the last octet, so the number of characters can easily be computed.
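
For the Network Name case, the computation would look something like the sketch below. The function name is hypothetical and this is not the code in packet-gsm_a_dtap.c; returning 0 for an empty string or an out-of-range spare bit count is just an assumption made for the example.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of 7-bit characters in a packed string of
 * n_octets octets whose last octet carries spare_bits unused bits (0..7).
 */
static size_t septets_from_spare_bits(size_t n_octets, unsigned spare_bits)
{
    if (n_octets == 0 || spare_bits > 7)
        return 0;   /* assumption: treat invalid input as an empty string */

    return (n_octets * 8 - spare_bits) / 7;
}
```

With 7 octets, 0 spare bits gives 8 characters and 7 spare bits gives 7 characters, so the ambiguity disappears.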
I did not check GMR1 and SMS Cell Broadcast specs yet.

Pascal.