Ethereal-dev: Re: [Ethereal-dev] extracting http

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: F Lace <flace9@xxxxxxxxx>
Date: Thu, 19 May 2005 16:53:23 +0530

On 5/19/05, Guy Harris <guy@xxxxxxxxxxxx> wrote:
> This is really more of a tcpdump-workers or winpcap-users question than
> an Ethereal developer's question - Ethereal already has code to extract
> HTTP content.

I didn't realize this. I actually reached WinPcap through Ethereal, so
ethereal-dev was the first thing that came to mind. I'm sending this
reply to the list again because I wanted to comment on what you said.

> >  u_short urgptr; // Urgent pointer...still don't know what this is...
> 
> See RFC 793 to find out what it is.
> 

Thank you. I had picked up the struct from some Google search result, so
I don't need it myself.

> 
> that *http_data* doesn't seem to contain anything?
> 
> If so, note that
> 
>        http_data = (char *) (th + (int)th->hlen + 8*2*5);
> 
> is wrong - the units of the "header length" field are 4-byte words, and
> "th" is presumably a pointer to a "tcp_header" data structure (so that
> adding N to it advances it by N such data structures, i.e. N*20 bytes),
> so you want
> 
>        http_data = (char *) ((u_char *)th + (int)th->hlen*4);
> 
> which will take you past the TCP header to the first byte of the HTTP
> header.  If you want only the HTTP *payload*, you'll have to scan
> through the HTTP header until you either run out of data or find the
> blank line separating the HTTP header from the payload.
> 
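
The scan for the blank line described above could look roughly like
this. It is only a sketch: http_body_offset is a hypothetical helper
name, and it assumes the header lines end in CRLF and that the whole
header sits in the one captured segment.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the offset of the first payload byte,
 * i.e. the byte just past the blank line (CRLF CRLF) that ends the
 * HTTP header, or -1 if no blank line is found within len bytes.
 */
static long http_body_offset(const char *data, size_t len)
{
    size_t i;

    /* Stop early enough that the 4-byte compare never reads past len. */
    for (i = 0; i + 4 <= len; i++) {
        if (memcmp(data + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    }
    return -1;
}
```

A return value of -1 means the header continues in a later segment (or
the data is not HTTP at all), so the caller should not treat anything
in this segment as payload.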


Thanks again for pointing out the silly mistake; my C basics are
really bad. Going by what Ethereal shows, the TCP header length (hlen)
as my struct reads it seems to be in 2-bit units, so the expression
that works for me is:

        http_data = (char *) ((u_char *)th + (int)th->hlen/4);

The IP header length, by contrast, is in units of 4 bytes.
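
A likely reason hlen/4 gives the right answer: if hlen in the struct is
the whole 13th byte of the TCP header, the 4-bit data offset sits in its
upper four bits, so the byte's value is offset*16 and dividing by 4
yields offset*4 bytes. That is an assumption about the struct, which is
not shown here. The sketch below reads the raw header bytes instead
(tcp_header_len and ip_header_len are hypothetical names):

```c
#include <stddef.h>

/*
 * TCP header length in bytes: the data offset is the upper 4 bits of
 * byte 12 of the TCP header, counted in 4-byte words (RFC 793).
 */
static size_t tcp_header_len(const unsigned char *tcp)
{
    return (size_t)(tcp[12] >> 4) * 4;
}

/*
 * IPv4 header length in bytes: the IHL is the lower 4 bits of byte 0
 * of the IP header, also counted in 4-byte words (RFC 791).
 */
static size_t ip_header_len(const unsigned char *ip)
{
    return (size_t)(ip[0] & 0x0f) * 4;
}
```

Shifting and masking is the safer form: dividing the whole byte by 4
can go wrong if any of the lower four bits of that byte are set.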

> By the way, there's no guarantee that there's a null byte at the end of
> the HTTP data, so printing it with "%s" could start trying to print
> random data past the end of the HTTP data - you're using "snprintf()",
> so it'll eventually stop, but you should probably calculate the length
> of the HTTP data (based on the total length field from the IP header and
> the lengths of the IP and TCP headers) and use "%.*s" to print the HTTP
> data, with the length of the HTTP data given as the argument for the
> ".*" part of the format.  (Of course, there's no guarantee that the HTTP
> payload is text - what if it's fetching a GIF or JPEG, for example? - so
> you'd probably either want to print only the headers, meaning you'll
> have to scan the headers yourself, or print each character of the HTTP
> data separately, checking whether the characters are printable.)
> 
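
The length calculation and the bounded print suggested above might be
sketched as follows. All three function names are hypothetical, and the
lengths are assumed to have been converted to bytes (and host byte
order) already.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* HTTP data length: IP total length minus the IP and TCP header lengths. */
static size_t http_data_len(size_t ip_total_len, size_t ip_hdr_len,
                            size_t tcp_hdr_len)
{
    if (ip_hdr_len + tcp_hdr_len > ip_total_len)
        return 0;               /* malformed or truncated packet */
    return ip_total_len - ip_hdr_len - tcp_hdr_len;
}

/* Format at most len bytes of data; "%.*s" takes the limit as an int. */
static int format_http(char *out, size_t out_size,
                       const char *data, size_t len)
{
    return snprintf(out, out_size, "%.*s", (int)len, data);
}

/*
 * Copy data into out one character at a time, replacing anything that
 * is not printable with '.', and NUL-terminate. Returns bytes copied.
 */
static size_t printable_copy(char *out, size_t out_size,
                             const char *data, size_t len)
{
    size_t i, n;

    if (out_size == 0)
        return 0;
    n = len < out_size - 1 ? len : out_size - 1;
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)data[i];
        out[i] = isprint(c) ? (char)c : '.';
    }
    out[n] = '\0';
    return n;
}
```

format_http suits text that is known to be printable, such as the
header lines; printable_copy is the character-by-character variant for
payloads that may be binary.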


Right, I was just trying to get started.

Thanks again.