Wireshark-dev: [Wireshark-dev] String Handling API
From: Sebastien Tandel <sebastien@xxxxxxxxx>
Date: Sun, 24 Dec 2006 23:39:23 +0100
Hi all,



   Guy *really* thinks the only solution is to get an accurate string
handling library and I *really* want to get rid of useless warnings and
confusing code (think to the one(s) who will change the code when having
a string handling API and who will have to know which field is a
string(character) or not) ... I am proposing to work on this string
handling API (at least on the first part of it).
As described on the website iconv can't be the only way to determine
character encoding. Glib2.0 is working with iconv. I thought however it
was a good starting point anyway to work with glib2.0 API as wireshark
is already using it. I then thought in a first draw to get an
intermediate interface to the glib2.0 API for the manipulation of
unicode strings as it lets the door open to differentiate the platform
and the possibility to add others solutions for manipulation strings and
character encoding determination.
Do you think it's a good starting point? Have you more ideas than the
ones described in the wishlist page?



I also read again Guy's mail about (guint8*) ... here are some others
thoughts on ctype functions and friends ...

> Note also that you can safely pass a guint8 or guchar to one of the 
> <ctype.h> routines, but you can't safely pass a gchar to them, as they 
> might get sign-extended into negative values if the 8th bit is set (I 
> think that none of the popular platforms for Windows and modern UN*Xes 
> have C compilers with "char" an unsigned type, so I think "might" can be 
> replaced by "will" in practice).
>   

Two questions :
1) How many of these functions are used in wireshark?
2) Are they correctly used?

number of ctype functions    : 89 (dissectors part)

- So few ctype functions, we might hope that none is using a (g)char.
However here are the files using gchar (or char) : (patch provided
except for packet-tds)
packet-catapult-dct2000.c : 449
packet-exec.c : 436, 454
packet-sip : 1178, 1236, 1298
packet-slimp3 : 372, 439
packet-tds : 958 (#if 0 but nevertheless)

10% of misuse and ctype functions are not heavily used ... sic :-/

We can then conclude that the use of guint8* in tvb_get_* don't avoid
misuses of ctype functions with gchar. It does not avoid them and no one
noticed them. I think you'd better put a comment on ctype and
guchar(guint8) in the README.developer.

packet-ftp do not use (g)char for ctypes functions but will use gchar
for tvb_get_nstringz0. In this case, we nevertheless get useful warning!
There are more than 100 warnings of this style in wireshark!(no patch
for these ones :() There is even more confusion when seeing the
signature of proto_tree_add_string using char ... :-/

Just to mention furthermore the number of g_str.* (or g_???printf)
functions     : more than 1000 (in dissector part)
It's really a pity to get all these useless warnings. All these useless
warnings are hiding all the useful ones ... (see here above)


Best Regards,
Sebastien Tandel
Index: gtk/isis_analysis.c
===================================================================
Index: gtk/isis_analysis.h
===================================================================
Index: epan/dissectors/packet-sip.c
===================================================================
--- epan/dissectors/packet-sip.c	(révision 20211)
+++ epan/dissectors/packet-sip.c	(copie de travail)
@@ -1151,7 +1151,7 @@
 	guint transport_slash_count = 0;
 	gboolean transport_name_started = FALSE;
 	gboolean colon_seen = FALSE;
-	gchar c;
+	guchar c;
 	gchar *param_name = NULL;
 
 	/* skip Spaces and Tabs */
Index: epan/dissectors/packet-exec.c
===================================================================
--- epan/dissectors/packet-exec.c	(révision 20211)
+++ epan/dissectors/packet-exec.c	(copie de travail)
@@ -46,8 +46,8 @@
 
 /* Forward declaration we need below */
 void proto_reg_handoff_exec(void);
-gboolean exec_isprint_string(gchar *string);
-gboolean exec_isdigit_string(gchar *string);
+gboolean exec_isprint_string(guchar *string);
+gboolean exec_isdigit_string(guchar *string);
 
 /* Variables for our preferences */
 static gboolean preference_info_show_username = TRUE;
@@ -113,7 +113,7 @@
 	proto_tree *exec_tree=NULL;
 
 	/* Variables for extracting and displaying data from the packet */
-	gchar *field_stringz; /* Temporary storage for each field we extract */
+	guchar *field_stringz; /* Temporary storage for each field we extract */
 
 	gint length;
 	guint offset = 0;
@@ -250,7 +250,7 @@
 		 */
 		if(length == 1 || (exec_isdigit_string(field_stringz)
 		&& length <= EXEC_STDERR_PORT_LEN)){
-			proto_tree_add_string(exec_tree, hf_exec_stderr_port, tvb, offset, length, field_stringz);
+			proto_tree_add_string(exec_tree, hf_exec_stderr_port, tvb, offset, length, (gchar*)field_stringz);
 			 /* Next field we need */
 			hash_info->state = WAIT_FOR_USERNAME;
 		} else { 
@@ -270,13 +270,13 @@
 		/* Check if this looks like the username field */
 		if(length != 1 && length <= EXEC_USERNAME_LEN
 		&& exec_isprint_string(field_stringz)){
-			proto_tree_add_string(exec_tree, hf_exec_username, tvb, offset, length, field_stringz);
+			proto_tree_add_string(exec_tree, hf_exec_username, tvb, offset, length, (gchar*)field_stringz);
 
 			/* Store the username so we can display it in the 
 			 * info column of the entire conversation
 			 */
 			if(!hash_info->username){
-				hash_info->username=se_strdup(field_stringz);
+				hash_info->username=se_strdup((gchar*)field_stringz);
 			}
 
 			 /* Next field we need */
@@ -298,7 +298,7 @@
 		/* Check if this looks like the password field */
 		if(length != 1 && length <= EXEC_PASSWORD_LEN
 		&& exec_isprint_string(field_stringz)){
-			proto_tree_add_string(exec_tree, hf_exec_password, tvb, offset, length, field_stringz);
+			proto_tree_add_string(exec_tree, hf_exec_password, tvb, offset, length, (gchar*)field_stringz);
 
 			/* Next field we need */
 			hash_info->state = WAIT_FOR_COMMAND;
@@ -321,13 +321,13 @@
 		/* Check if this looks like the command field */
 		if(length != 1 && length <= EXEC_COMMAND_LEN
 		&& exec_isprint_string(field_stringz)){
-			proto_tree_add_string(exec_tree, hf_exec_command, tvb, offset, length, field_stringz);
+			proto_tree_add_string(exec_tree, hf_exec_command, tvb, offset, length, (gchar*)field_stringz);
 	      
 			/* Store the username so we can display it in the 
 			 * info column of the entire conversation
 			 */
 			if(!hash_info->command){
-				hash_info->command=se_strdup(field_stringz);
+				hash_info->command=se_strdup((gchar*)field_stringz);
 			}
 
 		} else {
@@ -427,7 +427,7 @@
 
 /* Custom function to check if an entire string is printable. */
 gboolean
-exec_isprint_string(gchar *string)
+exec_isprint_string(guchar *string)
 {
 	guint position;
 
@@ -445,7 +445,7 @@
 
 /* Custom function to check if an entire string is digits. */
 gboolean
-exec_isdigit_string(gchar *string)
+exec_isdigit_string(guchar *string)
 {
 	guint position;
 
Index: epan/dissectors/packet-slimp3.c
===================================================================
--- epan/dissectors/packet-slimp3.c	(révision 20211)
+++ epan/dissectors/packet-slimp3.c	(copie de travail)
@@ -231,7 +231,7 @@
     gint		i1;
     gint		offset = 0;
     guint16		opcode;
-    char		lcd_char;
+    guchar		lcd_char;
     char		lcd_str[MAX_LCD_STR_LEN + 1];
     int			to_server = FALSE;
     int			old_protocol = FALSE;
Index: epan/dissectors/packet-catapult-dct2000.c
===================================================================
--- epan/dissectors/packet-catapult-dct2000.c	(révision 20211)
+++ epan/dissectors/packet-catapult-dct2000.c	(copie de travail)
@@ -26,6 +26,8 @@
 # include "config.h"
 #endif
 
+#include <glib.h>
+
 #include <string.h>
 #include <ctype.h>
 #include <epan/packet.h>
@@ -105,7 +107,7 @@
 void proto_register_catapult_dct2000(void);
 
 static dissector_handle_t look_for_dissector(char *protocol_name);
-static void parse_outhdr_string(char *outhdr_string);
+static void parse_outhdr_string(guchar *outhdr_string);
 static void attach_fp_info(packet_info *pinfo, gboolean received,
                            const char *protocol_name, int variant);
 
@@ -433,7 +435,7 @@
 
 
 /* Populate outhdr_values array with numbers found in outhdr_string */
-void parse_outhdr_string(char *outhdr_string)
+void parse_outhdr_string(guchar *outhdr_string)
 {
     int n = 0;
 
@@ -444,7 +446,7 @@
         guint digits;
 
         /* Find digits */
-        for (digits = 0; digits < strlen(outhdr_string); digits++, n++)
+        for (digits = 0; digits < strlen((gchar*)outhdr_string); digits++, n++)
         {
             if (!isdigit(outhdr_string[n]))
             {
@@ -702,7 +704,7 @@
         (strcmp(protocol_name, "fp_r5") == 0) ||
         (strcmp(protocol_name, "fp_r6") == 0))
     {
-        parse_outhdr_string((char*)tvb_get_ephemeral_string(tvb, outhdr_start, outhdr_length));
+        parse_outhdr_string(tvb_get_ephemeral_string(tvb, outhdr_start, outhdr_length));
         attach_fp_info(pinfo, direction, protocol_name,
                        atoi((char*)tvb_get_ephemeral_string(tvb, variant_start, variant_length)));
     }