Ethereal-dev: [Ethereal-dev] [PATCH][HTTP]Desegmentation/Reassembly of HTTP headers/bodies

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Loïc Minier <lool+ethereal@xxxxxxxxxx>
Date: Thu, 16 Oct 2003 12:17:11 +0200
 [ The attached patch includes tested reassembly of HTTP headers/bodies
 based on Content-Length. ]

Loïc Minier <lool+ethereal@xxxxxxxxxx> - Thu, Oct 09, 2003:

> > One problem with "tvb_find_guint8()", at least as you're using it, is 
> > that it assumes that lines end with CR-LF.  Perhaps they *should*, but 
> > that doesn't mean that they necessarily *will*.
> > "tvb_find_line_end()" doesn't care whether the line ends with CR, LF, 
> > CR-LF, or LF-CR.
> > How would "tvb_find_line_end()" have more problems with malformed 
> > headers than "tvb_find_guint8()"?
>  Yes, I had spotted that point too, and that's why I switched to
>  "tvb_find_guint8()": to be sure to match the end of the headers byte
>  for byte. Now I see how it can be useful; I had not understood the real
>  meaning of the "next_offset" it returns. I should rework my code to use
>  "tvb_find_line_end()" again, sorry.

 My assumption of CRLF line endings is no longer needed: as discussed
 above, I switched back to tvb_find_line_end() in the attached patch.
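 For readers unfamiliar with the tvbuff API, the line-ending behaviour
 discussed above can be illustrated outside Ethereal. The following is a
 minimal stand-alone sketch over a plain byte buffer, not the actual
 tvb_find_line_end() implementation; the name find_line_end_sketch() is
 hypothetical. It accepts CR, LF, CR-LF and LF-CR terminators, reports the
 offset just past the terminator (the role of tvb_find_line_end()'s
 next_offset argument), and returns -1 when no complete line end is
 present, which is the case where a desegmenting dissector asks for more
 data.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of Ethereal): find the end of the line
 * starting at 'offset' in buf[0..len).  A line may end in CR, LF, CR-LF
 * or LF-CR.  On success, returns the line length (terminator excluded)
 * and stores the offset of the first byte after the terminator in
 * *next_offset.  Returns -1 if no complete line end is present, so the
 * caller should wait for more data.
 */
static long find_line_end_sketch(const unsigned char *buf, size_t len,
                                 size_t offset, size_t *next_offset)
{
    size_t i;

    assert(buf != NULL && next_offset != NULL);

    for (i = offset; i < len; i++) {
        unsigned char c = buf[i];

        if (c == '\r' || c == '\n') {
            unsigned char partner = (c == '\r') ? '\n' : '\r';

            /*
             * A CR or LF in the last byte is ambiguous: the other half
             * of a two-byte terminator may be in the next segment.
             */
            if (i + 1 >= len)
                return -1;
            *next_offset = (buf[i + 1] == partner) ? i + 2 : i + 1;
            return (long)(i - offset);
        }
    }
    return -1; /* no line end yet */
}
```

 Calling it repeatedly from offset 0 and feeding each *next_offset back
 in walks the header lines; a return value of 0 means an empty line, i.e.
 the end of the headers.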

>  I did some additional captures, and it seems "chunked" is quite
>  common, whereas gzip/deflate/compress/whatever never occurs (even
>  though I send Accept-Encoding: gzip,deflate).

 I checked the Content-Length detection/reassembly against captures
 using "gzip" and "chunked" encodings, and saw no apparent problem.
   The only potential problem I can spot is when the end of the HTTP
 response is not in the capture. If I understand correctly,
 pinfo->can_desegment will be set to FALSE when there are no more bytes
 to desegment; is this correct?
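 The Content-Length handling in the patch boils down to two steps: parse
 the header value, then compare it with the number of body bytes already
 available after the end of the headers; the shortfall is what the patch
 stores in pinfo->desegment_len. The sketch below shows that logic in
 stand-alone C over plain buffers. It is an illustration under stated
 assumptions, not Ethereal code: both function names are hypothetical, the
 header name is matched case-insensitively (RFC 2616 field names are
 case-insensitive), and only a decimal value is accepted, whereas the
 patch compares the name with tvb_strneql(), which is case-sensitive, and
 parses the value with sscanf().

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of Ethereal): if line[0..len) is a
 * "Content-Length:" header line, compared case-insensitively, parse its
 * decimal value into *value and return 1; otherwise return 0.
 */
static int parse_content_length_sketch(const char *line, size_t len,
                                       long *value)
{
    static const char name[] = "content-length:";
    const size_t name_len = sizeof name - 1;
    size_t i;
    long v = 0;
    int digits = 0;

    assert(line != NULL && value != NULL);
    if (len < name_len)
        return 0;
    for (i = 0; i < name_len; i++) {
        if (tolower((unsigned char)line[i]) != name[i])
            return 0;
    }
    /* Skip optional whitespace before the value. */
    while (i < len && (line[i] == ' ' || line[i] == '\t'))
        i++;
    /* Decimal digits only: RFC 2616 defines the value as 1*DIGIT. */
    while (i < len && isdigit((unsigned char)line[i])) {
        int d = line[i] - '0';
        if (v > (LONG_MAX - d) / 10)
            return 0; /* value does not fit in a long */
        v = v * 10 + d;
        digits++;
        i++;
    }
    /* Allow trailing whitespace, but nothing else. */
    while (i < len && (line[i] == ' ' || line[i] == '\t'))
        i++;
    if (digits == 0 || i != len)
        return 0;
    *value = v;
    return 1;
}

/*
 * Hypothetical helper mirroring the patch's arithmetic: given the
 * announced body length and the number of body bytes available after
 * the end of the headers, return how many more bytes are needed, or 0
 * if the whole body is already present.  A negative 'available' is
 * treated as zero, like the patch's "length == -1" check.
 */
static long body_bytes_missing_sketch(long content_length, long available)
{
    assert(content_length >= 0);
    if (available < 0)
        available = 0;
    return (available >= content_length) ? 0 : content_length - available;
}
```

 For example, a response announcing "Content-Length: 1234" with 1000 body
 bytes in the current segment gives body_bytes_missing_sketch(1234, 1000)
 == 234, the value the patch's arithmetic would place in
 pinfo->desegment_len.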


    Kind regards,

-- 
Loïc Minier <loic.minier@xxxxxxxxxxx>
Index: packet-http.c
===================================================================
RCS file: /cvsroot/ethereal/packet-http.c,v
retrieving revision 1.67
diff -u -b -r1.67 packet-http.c
--- packet-http.c	2 Sep 2003 23:09:10 -0000	1.67
+++ packet-http.c	16 Oct 2003 10:09:04 -0000
@@ -40,6 +40,7 @@
 
 #include "util.h"
 #include "packet-http.h"
+#include "prefs.h"
 
 typedef enum _http_type {
 	HTTP_REQUEST,
@@ -67,6 +68,19 @@
 static dissector_handle_t data_handle;
 static dissector_handle_t http_handle;
 
+/*
+ * desegmentation of HTTP headers
+ * (when we are over TCP or another protocol providing the desegmentation API)
+ */
+static gboolean http_desegment_headers = FALSE;
+
+/*
+ * desegmentation of HTTP bodies
+ * (when we are over TCP or another protocol providing the desegmentation API)
+ * TODO: let the user choose, by Content-Type, which bodies to desegment
+ */
+static gboolean http_desegment_body = FALSE;
+
 #define TCP_PORT_HTTP			80
 #define TCP_PORT_PROXY_HTTP		3128
 #define TCP_PORT_PROXY_ADMIN_HTTP	3132
@@ -207,6 +221,7 @@
 	gint		offset = 0;
 	const guchar	*line;
 	gint		next_offset;
+	gint		next_offset_sav;
 	const guchar	*linep, *lineend;
 	int		linelen;
 	guchar		c;
@@ -217,8 +232,109 @@
 	RequestDissector	req_dissector;
 	int			req_strlen;
 	proto_tree		*req_tree;
+        long int        content_length;
+        gboolean        content_length_found = FALSE;
+
+        /*
+         * RFC 2616 defines HTTP messages as being either of the Request or
+         * the Response type (HTTP-message = Request | Response).
+         * Request and Response are defined as:
+         *     Request = Request-Line
+         *         *(( general-header
+         *         | request-header
+         *         | entity-header ) CRLF)
+         *         CRLF
+         *         [ message-body ]
+         *     Response = Status-Line
+         *         *(( general-header
+         *         | response-header
+         *         | entity-header ) CRLF)
+         *         CRLF
+         *         [ message-body ]
+         * that's why we can always assume that an empty line (CRLF CRLF)
+         * marks the end of the headers; the worst that can happen
+         * otherwise is that the packet is not desegmented, or is
+         * interpreted as headers only
+         */
+        /*
+         * if headers desegmentation is activated, check that all headers are
+         * in this tvbuff (search for an empty line marking end of headers) or
+         * request one more byte
+         */
+        if (http_desegment_headers && pinfo->can_desegment) {
+                next_offset = offset;
+                for (;;) {
+                        next_offset_sav = next_offset;
+                        /*
+                         * request one more byte if there's no byte left
+                         */
+                        if (tvb_offset_exists(tvb, next_offset) == FALSE) {
+                                pinfo->desegment_offset = offset;
+                                pinfo->desegment_len = 1;
+                                return;
+                        }
+                        /*
+                         * request one more byte if we cannot find a
+                         * complete header line (i.e. a line end)
+                         */
+                        linelen = tvb_find_line_end(tvb,
+                                      next_offset,
+                                      -1,
+                                      &next_offset,
+                                      TRUE);
+                        /* not enough data, ask for one more byte */
+                        if (linelen == -1) {
+                                pinfo->desegment_offset = offset;
+                                pinfo->desegment_len = 1;
+                                return;
+                        } else if (linelen == 0) {
+                                break;  /* we found the end of the headers */
+                        }
+                        /*
+                         * look for Content-Length; if it is not on this
+                         * line, we are either on a different header line,
+                         * at the end of the headers, or short of data; the
+                         * latter two cases have already been handled
+                         * above
+                         */
+                        if (http_desegment_body) {
+                                /* check if we've found Content-Length */
+                                if (tvb_strneql(tvb,
+                                            next_offset_sav,
+                                            "Content-Length:",
+                                            15) == 0) {
+                                        /* tvb_get_string() returns a g_malloc()ed copy */
+                                        guint8 *value = tvb_get_string(tvb,
+                                                next_offset_sav + 15,
+                                                linelen - 15);
+                                        if (sscanf((const char *)value, "%ld",
+                                                   &content_length) == 1)
+                                                content_length_found = TRUE;
+                                        g_free(value);
+                                }
+                        }
+                }
+        }
+        /*
+         * the above loop ends when we reach the end of the headers, so
+         * next_offset points just past the empty line terminating them,
+         * and content_length body bytes should follow from there
+         */
+        if (http_desegment_body && content_length_found) {
+                /* next_offset has been set because content-length was found */
+                if (FALSE == tvb_bytes_exist(
+                                 tvb, next_offset, content_length)) {
+                        gint length = tvb_length_remaining(tvb, next_offset);
+                        if (length == -1) {
+                                length = 0;
+                        }
+                        pinfo->desegment_offset = offset;
+                        pinfo->desegment_len = content_length - length;
+                        return;
+                }
+        }
 
-	stat_info	=g_malloc( sizeof(http_info_value_t));
+	stat_info	= g_malloc( sizeof(http_info_value_t));
 	stat_info->response_code = 0;
 	stat_info->request_method = NULL;
 
@@ -658,11 +774,25 @@
 		&ett_http_ntlmssp,
 		&ett_http_request,
 	};
+        module_t *http_module;
 
 	proto_http = proto_register_protocol("Hypertext Transfer Protocol",
 	    "HTTP", "http");
 	proto_register_field_array(proto_http, hf, array_length(hf));
 	proto_register_subtree_array(ett, array_length(ett));
+        http_module = prefs_register_protocol(proto_http, NULL);
+        prefs_register_bool_preference(http_module, "desegment_http_headers",
+                "Desegment all HTTP headers spanning multiple TCP segments",
+                "Whether the HTTP dissector should desegment all headers "
+                "of a request or response spanning multiple TCP segments",
+                &http_desegment_headers);
+        prefs_register_bool_preference(http_module, "desegment_http_body",
+                "Trust the \"Content-Length:\" header and desegment HTTP "
+                "bodies spanning multiple TCP segments",
+                "Whether the HTTP dissector should use the "
+                "\"Content-Length:\" value to desegment the body "
+                "of a request or response spanning multiple TCP segments",
+                &http_desegment_body);
 
 	register_dissector("http", dissect_http, proto_http);
 	http_handle = find_dissector("http");