Wireshark-dev: Re: [Wireshark-dev] tvb_get_string_enc() doesn't always return valid UTF-8
From: Evan Huus <eapache@xxxxxxxxx>
Date: Sun, 26 Jan 2014 18:32:18 -0500
On Sun, Jan 26, 2014 at 5:43 PM, Guy Harris <guy@xxxxxxxxxxxx> wrote:
>
> On Jan 26, 2014, at 2:32 PM, Evan Huus <eapache@xxxxxxxxx> wrote:
>
>> OK. I just meant that since tvb_get_string() is currently ASCII, a
>> dumb search and replace will let us make the API change now without
>> any regressions. We can then audit calls that could be incorrect.
>
> I apologize - I misparsed your question as "why would dumb search-and-replace of tvb_get_string with tvb_get_string_enc and ENC_ASCII be an easy way to make (part of) the API transition?", i.e. that you were saying that dumb search-and-replace didn't sound like a good idea to you, rather than as "so does that mean that we should start by doing a dumb search-and-replace of tvb_get_string with tvb_get_string_enc and ENC_ASCII, as an easy way to make (part of) the API transition?"

Darn, you're right; I never even thought of that interpretation. I
apologize also; the inflection in my head made it unambiguous :P

> (It might've been clearer as "in which case, is dumb search and replace", so that dummies like me read "in which case" as meaning "therefore" rather than "to which case are you referring where...")

And note that this is what happens between two native English
speakers. I don't even want to think about the problems a non-native
speaker might have with some of what I've written. Sigh.

>> Admittedly, it's easier to track which calls have been audited if we
>> do it gradually, so that's probably a better choice anyways.
>
> Yes.  In some cases, ENC_ASCII may well be appropriate, if the protocol spec says that the string must be ASCII (i.e., ASCII, and not ISO 8859-n, and not MacWhatever, and not DOS or Windows code page whatever, and not PickYourEUCMultiByteCodeSet, and not UTF-8...), and ENC_ASCII as the result of a dumb search-and-replace is, absent a "this really means ASCII" comment, indistinguishable from ENC_ASCII as the result of looking in the protocol specification and seeing that they really mean ASCII.
>
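To make "really means ASCII" concrete, here is a minimal sketch of what a
strict ASCII decode could look like. ascii_to_utf8() is a hypothetical
helper I made up for illustration, not a Wireshark function, and replacing
non-ASCII bytes with U+FFFD is an assumption about the desired behaviour,
not a description of what tvb_get_string_enc() does today:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the Wireshark API: decode `len` bytes
 * as strict ASCII into a newly malloc'ed, NUL-terminated UTF-8 string.
 * Any byte with the high bit set is not ASCII, so it is replaced with
 * U+FFFD REPLACEMENT CHARACTER (EF BF BD in UTF-8). Embedded NUL bytes
 * are copied unchanged. The caller frees the result. */
static char *ascii_to_utf8(const unsigned char *buf, size_t len)
{
    static const char replacement[] = "\xEF\xBF\xBD";
    /* Worst case: every input byte expands to 3 output bytes. */
    char *out = malloc(len * 3 + 1);
    size_t i, o = 0;

    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        if (buf[i] < 0x80) {
            out[o++] = (char)buf[i];          /* plain ASCII: copy as-is */
        } else {
            memcpy(out + o, replacement, 3);  /* not ASCII: substitute */
            o += 3;
        }
    }
    out[o] = '\0';
    return out;
}
```

With something like this, a byte such as 0xE9 (e-acute in ISO 8859-1)
never leaks through as an invalid UTF-8 sequence; it just shows up as a
replacement character, which is a decent hint that the call site wants a
different encoding than ASCII.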