Wireshark-dev: Re: [Wireshark-dev] tvb_get_string_enc() doesn't always return valid UTF-8
From: Evan Huus <eapache@xxxxxxxxx>
Date: Sun, 26 Jan 2014 17:32:39 -0500
On Sun, Jan 26, 2014 at 4:53 PM, Guy Harris <guy@xxxxxxxxxxxx> wrote:
>
> On Jan 21, 2014, at 5:01 AM, Evan Huus <eapache@xxxxxxxxx> wrote:
>
>> On Tue, Jan 21, 2014 at 2:40 AM, Guy Harris <guy@xxxxxxxxxxxx> wrote:
>>>
>>> On Jan 20, 2014, at 5:59 PM, Evan Huus <eapache@xxxxxxxxx> wrote:
>>>
>>>> In which case is dumb search-and-replace of tvb_get_string with
>>>> tvb_get_string_enc and ENC_ASCII an easy way to make (part of) the API
>>>> transition?
>>>
>>> Did somebody say that had been done or suggest that it be done?
>>
>> I thought it was kind of implied when you wrote "We should also
>> probably audit all calls to tvb_get_string() and tvb_get_stringz() in
>> dissectors and change them to tvb_get_string_enc() with the
>> appropriate encoding."
>
> No, it isn't implied; the last four words of what you quote are "with the appropriate encoding".  A dumb search-and-replace won't do that, as ENC_ASCII isn't the appropriate encoding for all those calls; that's why I said "audit", as in "read the code and the spec and see what the right encoding is".

OK. I just meant that since tvb_get_string() is currently ASCII, a
dumb search and replace will let us make the API change now without
any regressions. We can then audit calls that could be incorrect.

Admittedly, it's easier to track which calls have been audited if we
do it gradually, so that's probably a better choice anyways.

>> If tvb_get_string() behaves identically to tvb_get_string_enc(...
>> ENC_ASCII) then there doesn't seem much point in having both.
>
> There *isn't* much point in having both.  If we "probably audit all calls to tvb_get_string() and tvb_get_stringz() in dissectors and change them to tvb_get_string_enc() with the appropriate encoding.", the next step would be to eliminate tvb_get_string() and tvb_get_stringz() and then rename tvb_get_string_enc() and tvb_get_stringz_enc() to just tvb_get_string() and tvb_get_stringz() (as "_enc" is then redundant).
>
> Given that we already added a wmem scope argument to tvb_get_string() and tvb_get_stringz(), there's no particular reason not to make further API changes to those routines.

+1