Wireshark-dev: Re: [Wireshark-dev] guint8* and gchar* ... and Vim ?! :)
      
      
Sebastien Tandel wrote:
> Is there any reason to use guint8* instead of gchar*?
For what purpose?
If you're dealing with an array of 8-bit bytes, or a pointer to a 
sequence of those, guint8 is the right type; it makes it clear that 
they're bytes, not characters (it might be binary data, it might be pairs 
of bytes forming the 16-bit code units of a UTF-16-encoded string, it 
might be a UTF-8 string, etc.).
So, for example, tvb_get_ptr() should return a "guint8 *", as should 
tvb_memdup(), and the raw packet data you get from Wiretap should be 
pointed to by a "guint8 *".
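To make the "bytes, not characters" point concrete, here is a minimal sketch of reading a big-endian 16-bit field from raw packet bytes. It uses C99's uint8_t as a stand-in for GLib's guint8 so it builds without GLib, and read_be16() is a hypothetical helper, similar in spirit to Wireshark's tvb_get_ntohs() but not that API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Wireshark API): read a big-endian
 * (network byte order) 16-bit value starting at data[offset].
 * uint8_t stands in for GLib's guint8: every element is in 0..255
 * and promotes to a non-negative int, so the shift and OR are safe.
 *
 * If data were a plain char array on a platform where char is signed,
 * a low byte of 0xFF would promote to -1, and OR-ing in -1 would set
 * every bit, giving 0xFFFF instead of 0x12FF for the bytes 12 FF. */
uint16_t read_be16(const uint8_t *data, size_t offset)
{
    return (uint16_t)((data[offset] << 8) | data[offset + 1]);
}
```

With the bytes 12 FF this yields 0x12FF regardless of whether the compiler's plain char is signed, because the buffer never goes through char at all.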
Note also that you can safely pass a guint8 or guchar to one of the 
<ctype.h> routines, but you can't safely pass a gchar to them: if its 
8th bit is set, it might be sign-extended into a negative value, and 
those routines are only defined for EOF and for values representable as 
an unsigned char.  (I think none of the popular C compilers for Windows 
and modern UN*Xes make "char" an unsigned type, so in practice "might" 
can be replaced by "will".)
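A minimal sketch of the usual idiom when the data really is typed as gchar (plain char here, which is what gchar is a typedef for): convert each element to unsigned char before handing it to a <ctype.h> routine. count_printable() is a hypothetical function for illustration only.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical example: count how many of the len chars in data are
 * printable in the current locale.  Converting through unsigned char
 * keeps the argument to isprint() in 0..255 even where char is signed,
 * so a byte such as 0xE9 is never passed as a negative int. */
size_t count_printable(const char *data, size_t len)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        if (isprint((unsigned char)data[i]))
            n++;
    }
    return n;
}
```

In the default "C" locale only 0x20..0x7E count as printable, so for the three bytes 'A', 0xE9, 'B' this returns 2; without the cast, the 0xE9 byte would reach isprint() as a negative value on a signed-char platform.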
> With gcc 4.0, there is a new warning telling you that "pointer target
> differs in signedness" (which is not such a bad thing).
I suspect most of those warnings are for cases where you're treating 
byte sequences as character strings.
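One way to avoid those warnings without sprinkling casts is to keep the data as bytes and use functions that take const void *, such as memchr(). byte_string_len() below is a hypothetical helper (uint8_t again stands in for guint8) that finds the length of a NUL-terminated string inside a byte buffer without reading past the end of the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: length of a NUL-terminated string stored in a
 * byte buffer, looking at no more than max bytes.  Passing a
 * const uint8_t * to strlen() would draw gcc's -Wpointer-sign warning
 * (strlen() expects const char *) and could run off the end of an
 * unterminated field; memchr() takes const void *, so no cast is
 * needed and the search is bounded. */
size_t byte_string_len(const uint8_t *data, size_t max)
{
    const uint8_t *nul = memchr(data, '\0', max);

    /* No terminator within max bytes: treat the whole span as the string. */
    return nul != NULL ? (size_t)(nul - data) : max;
}
```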
What I think we *really* need to do, for those cases, is have a 
different way of handling strings.  The current way we handle strings 
doesn't take into account the fact that there are a number of different 
character encodings for strings: "ASCII" (which would imply that a byte 
with the 8th bit set is an error), ISO 8859/n, various EUC encodings, 
Shift-JIS, KOI8, UTF-8, UTF-16, etc.
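Of those encodings, "ASCII" is the only one simple enough to check in a few lines: any byte with the 8th bit set is invalid. A minimal sketch follows; first_non_ascii() is hypothetical and uint8_t stands in for guint8. The multi-byte encodings in the list need real decoders.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: for a field specified as ASCII, return the offset
 * of the first byte with the 8th bit set (an encoding error), or len if
 * every byte is valid ASCII. */
size_t first_non_ascii(const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (data[i] & 0x80)
            return i;
    }
    return len;
}
```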
See the first item under "Dissector infrastructure" on the
	http://wiki.wireshark.org/Development/Wishlist
page.  (That discusses two items - the dissector APIs for handling 
strings, and the UI aspects of this.  The former doesn't require the 
latter - we can continue to display non-ASCII characters as escape 
sequences - but the latter, which is something we should ultimately do, 
requires some way of getting all strings from packets translated into 
Unicode.)
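As a small illustration of "translated into Unicode", ISO 8859-1 is the easy case: each byte value is the Unicode code point itself, so conversion to UTF-8 needs no tables. latin1_to_utf8() is a hypothetical helper, not a Wireshark or GLib API. Encodings such as Shift-JIS or EUC need mapping tables or a conversion library (GLib's g_convert(), for instance).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: convert len bytes of ISO 8859-1 text to a
 * NUL-terminated UTF-8 string allocated with malloc(); the caller must
 * free() it.  Returns NULL if allocation fails. */
char *latin1_to_utf8(const uint8_t *data, size_t len)
{
    /* Worst case: every byte >= 0x80 becomes two UTF-8 bytes. */
    uint8_t *out = malloc(2 * len + 1);
    size_t j = 0;

    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        uint8_t b = data[i];

        if (b < 0x80) {
            out[j++] = b;                            /* U+0000..U+007F: 1 byte */
        } else {
            out[j++] = (uint8_t)(0xC0 | (b >> 6));   /* U+0080..U+00FF: 2 bytes */
            out[j++] = (uint8_t)(0x80 | (b & 0x3F));
        }
    }
    out[j] = '\0';
    return (char *)out;
}
```

For example, the ISO 8859-1 bytes 63 61 66 E9 ("café") become the UTF-8 bytes 63 61 66 C3 A9.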
> May we change these guint8* to gchar*? I mean, may we change the type of
> the variables concerned, rather than casting at every function call?
Which ones are you thinking of?  We shouldn't globally replace guint8 
with gchar, as per my comments at the beginning.