Wireshark-dev: Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)
On Jun 27, 2011, at 11:54 AM, Stig Bjørlykke wrote:
> When looking at bug 5715 I found that we use both UTF8 (from file
> names) and locale (from strerror()) in the error messages presented
> from simple_dialog(). In vsimple_dialog() we convert all messages
> with g_locale_to_utf8(), which will wrongly convert the file name
> (like in the bug report). When using Norwegian characters in the file
> name the text in the dialog is empty.
I suspect this wouldn't be an issue on my machine: if g_locale_to_utf8() behaves any differently from strcpy() there, then there's either a misconfiguration or a bug in g_locale_to_utf8():
$ echo $LANG
en_US.UTF-8
I.e., this issue should, modulo bugs, only show up in locales where the character encoding isn't UTF-8, meaning:
1) UN*Xes where LANG etc. aren't set to a locale with UTF-8 as the encoding (are you seeing the issue with Norwegian characters on your system? If so, what's the setting of LANG?);
2) Windows, where "Unicode" generally means "UTF-16", and APIs that return strings encoded as sequences of octets rather than 16-bit units probably return strings in the local code page.
> Any ideas how we should fix this? Convert all messages from
> strerror() when putting the text into the error string and remove the
> conversion in vsimple_dialog()?
I would say "yes", given that GTK+ uses UTF-8 as the string encoding for all GUI functions, and I think any other toolkit we might use as an alternative would also use some encoding of Unicode (UTF-8 or UTF-16, most likely).
> We have about 240 calls to strerror().
...and, unfortunately, a variant that converts to UTF-8 and is API-compatible is non-trivial, as any version that allocates a buffer for the result of the conversion would leak memory if we just globally replaced strerror() with ws_strerror().
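One way around the leak is to give up API compatibility and have the caller supply the buffer. This is only a sketch, using POSIX iconv() rather than GLib; to_utf8_buf() is a hypothetical name, and in real use the source codeset would be the locale's encoding rather than a hard-coded name.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a Wireshark or GLib API): convert `src`,
 * encoded in `from_codeset`, to UTF-8 in the caller-supplied buffer
 * `dst`.  Nothing is allocated, so there is nothing to leak.
 * Returns 0 on success, -1 if the conversion fails or `dst` is too
 * small.  On failure the contents of `dst` are unspecified.
 */
int
to_utf8_buf(const char *from_codeset, const char *src,
            char *dst, size_t dstlen)
{
    assert(src != NULL && dst != NULL && dstlen > 0);

    iconv_t cd = iconv_open("UTF-8", from_codeset);
    if (cd == (iconv_t)-1)
        return -1;

    /* iconv() takes char ** for its input but does not modify the bytes. */
    char *in = (char *)src;
    size_t inleft = strlen(src);
    char *out = dst;
    size_t outleft = dstlen - 1;    /* reserve room for the NUL */

    size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *out = '\0';
    return 0;
}
```

A caller would then do something like to_utf8_buf(codeset, strerror(errno), buf, sizeof buf), which means every one of those call sites needs a buffer, and that is exactly the non-API-compatible part.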
(Of course, strerror() is also not thread-safe, so there might be other reasons to avoid routines with such an API; the latest shiniest Single UNIX Specification has strerror_r(), which takes a buffer that it fills in, which has its own issues (as in "how big a buffer do you need"?), and I don't know how many platforms have it.
But if you're doing enough calls to strerror() that throwing a mutex around strerror() in your wrapper causes performance problems, those performance problems are probably the least of your problems....)
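For what it's worth, the "mutex around strerror()" wrapper could look roughly like the following. It's a sketch only, assuming POSIX threads; locked_strerror() is a hypothetical name, and the lock only protects callers that go through the wrapper, not code that still calls strerror() directly.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t strerror_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical wrapper: copy the message for `errnum` into the
 * caller's buffer while holding a lock, so that two threads using
 * this wrapper never read strerror()'s static result buffer while
 * the other is overwriting it.  The message is truncated if `buf`
 * is too small.  Returns `buf`.
 */
const char *
locked_strerror(int errnum, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    pthread_mutex_lock(&strerror_lock);
    snprintf(buf, buflen, "%s", strerror(errnum));
    pthread_mutex_unlock(&strerror_lock);
    return buf;
}
```

This sidesteps the question of which strerror_r() variant a platform has, at the cost of serializing all callers, and it still leaves the "how big a buffer do you need" question with the caller.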