Wireshark-users: Re: [Wireshark-users] filter for ONLY initial get request
From: Sake Blok <sake@xxxxxxxxxx>
Date: Wed, 11 Aug 2010 12:12:38 +0200
On 10 aug 2010, at 16:48, Jeffs wrote:
> I have come up with the following tshark formula which seems to address my needs.  Since I am not interested in the URLs from advertising agencies, videos and other embedded links in web pages, but only the top-level domain, I use this.  Please let me know if anyone sees any gotchas or potential problems with this formula. I'm very new to regular expressions and could use advice.  This formula will return only the top-level domains and strips out links such as admin.brightcove.com, advertisingserver.amazon.com, tubemogel.videos.com:
> 
> tshark -r test.cap -R http.request -T fields -e http.host | sed -e 's/?.*$//' | sed -e 's#^\(.*\)\t\(.*\)$#http://\1\2#' | sort | uniq -c | sort -rn | head -n 300 | sed -n -e '/www/p'

If you're only interested in an overview of the visited top-level domains, without caring which specific hosts and/or URIs were visited, you could use something like:

tshark -r test.cap -R http.request -T fields -e http.host | sed -e 's/^.*\.\([^.]*\.[^.]*\)$/\1/' | sort | uniq -c | sort -rn | head -n 100

for the top-100 top-level domains (based on individual hits, not user sessions).
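
To see what the sed expression does without needing a capture file, here is a minimal sketch that runs the same substitution over a handful of made-up host names. The function names last_two_labels and sample_hosts, and the example.* hosts, are placeholders of mine for illustration and do not come from a real capture:

```shell
#!/bin/sh
# Reduce each host name on stdin to its last two dot-separated labels.
# Hosts with fewer than three labels do not match and pass through unchanged.
last_two_labels() {
    sed -e 's/^.*\.\([^.]*\.[^.]*\)$/\1/'
}

# Hypothetical http.host values, standing in for the tshark output.
sample_hosts() {
    printf '%s\n' \
        www.example.com \
        admin.example.com \
        example.com \
        cdn.images.example.org \
        www.example.co.uk
}

# Same counting pipeline as above: hits per domain, most frequent first.
sample_hosts | last_two_labels | sort | uniq -c | sort -rn
```

The three example.com variants are counted together as example.com. Note that www.example.co.uk is reduced to co.uk, so hosts under two-part country suffixes all end up in one bucket. If that matters for your traffic, the expression would need to be extended.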

Cheers,


Sake