Wireshark-bugs: [Wireshark-bugs] [Bug 5405] Unescaped accent in interface name
Date: Tue, 21 Dec 2010 10:04:27 -0800 (PST)
https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=5405

Bill Meier <wmeier@xxxxxxxxxxx> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #4 from Bill Meier <wmeier@xxxxxxxxxxx> 2010-12-21 13:04:24 EST ---
(In reply to comment #3)
> The culprit is likely in the get_interface_list() function within
> capture-wpcap.c.  As the comments in that function state, the returned
> information from WinPcap are double-byte unicode strings (which would be
> UTF-16) and for some reason, we go through the extra effort of taking apart the
> UTF-16 string and make an ASCII string out of it.  Using a UTF-16 to UTF-8
> conversion routine would probably be best there, such as GLib's
> g_utf16_to_utf8() function.

Digging into the winpcap (and the Wireshark) code a bit:

A. I believe that in capture-wpcap.c the pcap_lookupdev() and related code to
do the unicode to ascii stuff is obsolete and is no longer ever used. (See
below).

B. I also believe that the actual issue is as follows:
   It appears that on Windows pcap_findalldevs returns interface names
   encoded using the "system default ANSI code page",
   even though the interface names are stored in the Windows registry
   in "wide char" (UTF-16).

   (WinPcap ultimately calls RegQueryValueExA to get the interface name
    strings from the registry).

   So: I think the correct solution is that on Windows g_locale_to_utf8()
   must be called before using the strings (at least in GTK).

   I'll go through the Wireshark code and see where this might need to be done.
   (Is the right answer to convert the strings to UTF8 right in
   get_interface_list() in capture-wpcap.c ? Is there then an issue about
   needing to convert back to the current locale before printing the strings ?)

   I'm unsure if g_locale_to_utf8() must also be used in *nix systems
   to handle the case where the character encoding being used by the system is 
   other than UTF8. I'll do some further research on this (unless someone
   can provide the answer).
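
To illustrate why the code-page strings described in B cannot be handed to
GTK unchanged, here is a minimal sketch in plain C. It is not
g_locale_to_utf8() and is not taken from the Wireshark or WinPcap sources:
latin1_to_utf8() is a hypothetical helper that assumes the ANSI code page is
ISO-8859-1 (Windows code page 1252 differs in the 0x80-0x9F range). A real
fix would call g_locale_to_utf8() so that the actual system code page is
used.

```c
#include <stddef.h>

/*
 * Hypothetical helper (illustration only): convert a NUL-terminated
 * ISO-8859-1 string to UTF-8.
 * Returns the number of bytes written, not counting the terminating NUL,
 * or (size_t)-1 if the output buffer is too small.
 */
size_t latin1_to_utf8(const char *in, char *out, size_t out_size)
{
    const unsigned char *p;
    size_t n = 0;

    if (out_size == 0)
        return (size_t)-1;

    for (p = (const unsigned char *)in; *p != '\0'; p++) {
        if (*p < 0x80) {
            /* ASCII is encoded identically in UTF-8. */
            if (n + 1 >= out_size)
                return (size_t)-1;
            out[n++] = (char)*p;
        } else {
            /* 0x80..0xFF map to U+0080..U+00FF: two UTF-8 bytes. */
            if (n + 2 >= out_size)
                return (size_t)-1;
            out[n++] = (char)(0xC0 | (*p >> 6));
            out[n++] = (char)(0x80 | (*p & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

For example, the single byte 0xF3 (o with acute accent in ISO-8859-1) is not
a valid UTF-8 sequence on its own; after conversion it becomes the two bytes
0xC3 0xB3.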


Info re pcap_findalldevs() and pcap_lookupdev() usage.

1. In capture-wpcap.c, get_interface_list() calls pcap_findalldevs if the 
   conditional symbol HAVE_PCAP_FINDALLDEVS is defined.
   I believe that this will almost always be true since
   a. that's the way we build and distribute Wireshark;
   b. pcap_findalldevs was implemented many years ago.

   so: anyone needing to build without pcap_findalldevs would be using 
   a very old obsolete version of winpcap.

2. get_interface_list() calls pcap_lookupdev() only if pcap_findalldevs is not
available.

The unicode to ASCII stuff happens only if pcap_lookupdev is called.

I'll change the code so that the pcap_lookupdev() code is compiled only if
pcap_findalldevs isn't available.
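
A rough sketch of that compile-time split, assuming HAVE_PCAP_FINDALLDEVS is
supplied by the build system as described in point 1.
interface_lookup_path() is a hypothetical stand-in that only reports which
branch was compiled; it is not the actual get_interface_list() code.

```c
/* Hypothetical illustration: reports which lookup path was compiled in. */
const char *interface_lookup_path(void)
{
#ifdef HAVE_PCAP_FINDALLDEVS
    /* Normal case: enumerate devices with pcap_findalldevs(). */
    return "pcap_findalldevs";
#else
    /* Legacy case: pcap_lookupdev() plus the unicode-to-ASCII handling. */
    return "pcap_lookupdev";
#endif
}
```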
