Wireshark-users: Re: [Wireshark-users] filter for ONLY initial get request
From: Jeffs <jeffs@xxxxxxxxxxxxx>
Date: Thu, 12 Aug 2010 09:42:55 -0400
On 8/12/2010 5:08 AM, Sake Blok wrote:
On 12 aug 2010, at 10:09, Thierry Emmanuel wrote:

Your suggestion of parsing the data is indeed unique and interesting.
Are you suggesting that dumpcap or ethereal would somehow interrogate the
link, follow it, and then make a determination? This sounds like a very
interesting prospect, but I'm not fully sure I understand how it would work.
That solution would be a bit "violent". ;) This information could be extracted
by a MIME-type analyzer processing the content of the page.
But I had forgotten a simpler solution. You can read the MIME type
announced by the web server, located in the "Content-Type" field. As
RFC 2616 says, this field isn't required in the response, but in practice
it is almost always there. So when this field is present in the HTTP response,
you can parse it and check whether the response is HTML, plain text, or XML.
If it isn't, you can discard it. This way you'll be able to ignore images,
videos, applets, JavaScript files, and CSS files (the last two are less
important because they are commonly hosted on the same domain), so a large
part of the noise.
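
For instance, a rough sketch of that check with tshark (assuming a capture
file named http.cap, as in the example further down; http.content_type is
Wireshark's field for the Content-Type header):

# Keep only responses announcing HTML, plain text, or XML and count how
# often each announced type occurs; images, video, JavaScript, and CSS
# drop out as noise.
tshark -nlr http.cap \
  -R 'http.response and (http.content_type contains "html" or http.content_type contains "plain" or http.content_type contains "xml")' \
  -T fields -e http.content_type |
sort | uniq -c | sort -rn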

The difficulty is that if you extract the requested URL from the request, you
have to correlate the request with its response, which might require
scripting. You could bypass this limitation by working only on the response,
but I haven't studied that point in particular.
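
For example, a rough sketch of such a correlation, assuming Wireshark's
http.request_in field (the frame number of the matching request, which the
dissector adds to responses) and hypothetical intermediate files
requests.txt and responses.txt:

# Pass 1: map each request's frame number to the Host header it carried.
tshark -nlr http.cap -R 'http.request' \
  -T fields -e frame.number -e http.host > requests.txt
# Pass 2: for each response announcing an html page, print the frame
# number of the request it answers.
tshark -nlr http.cap \
  -R 'http.response and http.content_type contains "text/html"' \
  -T fields -e http.request_in > responses.txt
# Join the two on frame number: only hosts whose responses were html survive.
awk 'NR==FNR {host[$1]=$2; next} $1 in host {print host[$1]}' \
  requests.txt responses.txt | sort | uniq -c | sort -rn
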
The whole problem is that each object requested by a browser can be either an object the user asked for or an object needed to display the web page the user asked for. Only by parsing the HTML (and JS and CSS) can we deduce whether an object was requested automatically by the browser or intentionally by the user clicking a link. There are specialized products that do just that and can give you great reports... but they are expensive...
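
To get a feel for what that parsing involves, a crude sketch (assuming one
returned page has been saved as page.html, e.g. via Wireshark's
File -> Export -> Objects -> HTTP; a real product would use a proper HTML
parser rather than grep):

# List every URL the page itself pulls in (images, scripts, stylesheets);
# requests for these are browser-initiated, not user clicks.
grep -Eo '(src|href)="[^"]+"' page.html |
sed -e 's/^[^"]*"//' -e 's/"$//' |
sort -u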

The best I have come up with so far is to look only at requested objects of type "text/html" and then look at the Referer header instead of the Host header (falling back to the Host header when the Referer is empty). But this too is far from perfect: it lets through false positives and might have some false negatives as well. Still, you can give it a shot to see how it compares to what you already have...

# For each request that accepts html, print the Host and Referer headers;
# awk keeps the Referer, falling back to Host when the Referer is empty;
# the first sed strips the scheme and path from the URL, the second keeps
# only the last two domain labels; the rest counts and ranks the top 100
# domains.
tshark -nlr http.cap -R 'http.request and http.accept contains "text/html"' \
  -T fields -e http.host -e http.referer |
awk '$2=="" {print $1; next} {print $2}' |
sed -e 's#^http://\([^\/]*\).*$#\1#' |
sed -e 's/^.*\.\([^\.]*\.[^\.]*\)$/\1/' |
sort | uniq -c | sort -rn | head -100

As said before, it all depends on the goal you are trying to achieve and the means you have to achieve it.

Cheers,


Sake

The ultimate goal is not necessarily to count the number of requests to any specific URL, although that number is interesting; it's to see what domains a user is going to. For this study the links within the domains are irrelevant; just the top level is sufficient.

So far Sake's tshark query (cited above) works like a charm. Thank you, Sake and Emmanuel, for helping me with this. The idea of searching for "text/html" was ingenious, Emmanuel! I never would have thought of that, guys. It's just plain ingenious!

Sake's snippet of code works fine, and I believe that is sufficient for now. I know it would be best to plug into the browser, but in this study the company does not have access to (or is unwilling to access) the browser directly.

Thanks guys!