Wireshark · Wireshark-users: Re: [Wireshark-users] filter for ONLY initial get request


Date: Wed, 11 Aug 2010 09:06:53 -0400

On 8/11/2010 6:12 AM, Sake Blok wrote:

On 10 aug 2010, at 16:48, Jeffs wrote:

I have come up with the following tshark formula, which seems to address my needs. Since I am not interested in the URLs from advertising agencies, videos and other embedded links in web pages, but only in the top-level domain, I use this. Please let me know if anyone sees any gotchas or potential problems with this formula; I'm very new to regular expressions and could use advice. This formula returns only the top-level domains and strips out links such as admin.brightcove.com, advertisingserver.amazon.com, tubemogel.videos.com:

tshark -r test.cap -R http.request -T fields -e http.host | sed -e 's/?.*$//' | sed -e 's#^\(.*\)\t\(.*\)$#http://\1\2#' | sort | uniq -c | sort -rn | head -n 300 | sed -n -e '/www/p'

If you're only interested in an overview of the visited top-level domains, without caring which specific hosts and/or URIs were visited, you could use something like:

tshark -r test.cap -R http.request -T fields -e http.host | sed -e 's/^.*\.\([^.]*\.[^.]*\)$/\1/' | sort | uniq -c | sort -rn | head -n 100

for the top-100 top-level domains (based on individual hits, not user sessions).
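To see what the sed expression in that pipeline does, here is a small sketch that can be run without a capture file. The host names are typed in by hand as stand-ins for the `-e http.host` output; `www.example.co.uk` and the function name `last_two_labels` are made up for illustration.

```shell
# Reduce each host name to its last two dot-separated labels,
# using the same sed expression as the tshark pipeline above.
last_two_labels() {
    sed -e 's/^.*\.\([^.]*\.[^.]*\)$/\1/'
}

# Hand-written sample hosts standing in for `tshark ... -e http.host` output.
# www.example.co.uk is a hypothetical host, not taken from any capture.
printf '%s\n' \
    www.nytimes.com \
    admin.brightcove.com \
    advertisingserver.amazon.com \
    nytimes.com \
    www.example.co.uk |
    last_two_labels | sort | uniq -c | sort -rn
```

A host with fewer than three labels (nytimes.com) does not match the pattern and passes through unchanged. Note that "top-level domain" here really means "last two labels", so a host under a two-part suffix such as www.example.co.uk is counted as co.uk.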

Cheers,


Sake



Thank you for your reply. The issue I am having, and which also happens with the formula you provided above, is that domains are being reported that are links (mostly advertising and graphic-image links) embedded in the web page, which I do not want because they pollute my results. I only want either the domain for the link clicked, or the domain for the link typed in the browser box. For example, the formula you provided above returns:


     71 nytimes.com
     15 propertyshark.com
     13 fbcdn.net
      5 voicefive.com
      5 2mdn.net
      4 brightcove.com
      2 google-analytics.com
      2 doubleclick.net
      1 yahoo.com
      1 imrworldwide.com
      1 facebook.com

The doubleclick.net, brightcove.com, 2mdn.net, and fbcdn.net domains reported above are for things like advertising links and embedded links in the landing page of the domain typed or clicked. This is polluting my results.

My formula, however, returns only the results without the links and images embedded in the web page:


     15 www.propertyshark.com
      8 www.nytimes.com
      2 www.google-analytics.com
      1 www.facebook.com

However, I am new to regex, so I may well be missing something or losing some links.
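One possible gotcha with the final `sed -n -e '/www/p'` step is that it keeps any line containing the string www anywhere, not only hosts that start with `www.`. The sketch below anchors the match instead. It assumes the input is the "count host" lines printed by `uniq -c`; the function name `only_www_hosts` and the host `cdn.notwww.example` are hypothetical and used only for illustration.

```shell
# Keep only "count host" lines whose host starts with "www."
# Input format is what `uniq -c` prints: optional leading spaces,
# a count, one space, then the host name.
only_www_hosts() {
    sed -n -e '/^ *[0-9][0-9]* www\./p'
}

# Hand-written sample lines; cdn.notwww.example is a hypothetical host
# that the unanchored /www/ pattern would also let through.
printf '%s\n' \
    '     15 www.propertyshark.com' \
    '      4 admin.brightcove.com' \
    '      2 cdn.notwww.example' \
    '      1 nytimes.com' |
    only_www_hosts
```

Either way, filtering on www is only a heuristic: a site visited without a www prefix (nytimes.com in the sample) is dropped, and an embedded resource served from a www host, such as www.google-analytics.com in the results above, still gets through.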


Thank you.
