Wireshark-users: Re: [Wireshark-users] filter for ONLY initial get request
From: Jeffs <jeffs@xxxxxxxxxxxxx>
Date: Fri, 13 Aug 2010 18:12:24 -0400

Any idea on where to start? :-[

I'm not great at bash scripting. Here is what would help me, and maybe you could give me some guidance, since you understand the structure of .cap files better than I do:

Would the script loop through the .cap file, pulling out each GET request and its associated domain, then, still inside that loop, look up the http.content_type value and, if that value is "text/html", write the domain name it just saved to a separate file?

Then move onto the next line?

Do you think that logic would work?

To be honest, I don't really know the structure of a capture file. I mainly use Wireshark through the GUI (and for some other strange things ;) ).

Anyway, what you describe does not seem illogical to me. To be more precise, I would:
- Filter packets with a filter http.request, displaying the tcp.stream field
- Iterate through those lines
- For each line, reload the file using the filter http.content_type == "text/html" && tcp.stream == streamid to link the request to its text/html response
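
The per-stream lookup in the last step could be scripted roughly as below. This is only a sketch, not something from the thread: it assumes tshark is on the PATH and accepts -Y for display filters (older releases used -R instead), and the function names and the use of frame.number as the printed field are illustrative choices. Note that == only matches an exact header value, so a response sent as "text/html; charset=UTF-8" would need a filter such as http.content_type contains "text/html".

```python
import subprocess

def response_filter(stream_id):
    """Display filter for text/html responses on one TCP stream."""
    return f'http.content_type == "text/html" && tcp.stream == {stream_id}'

def tshark_args(capture, display_filter, fields):
    """Build a tshark command line that prints the given fields, tab-separated."""
    args = ["tshark", "-r", capture, "-Y", display_filter, "-T", "fields"]
    for field in fields:
        args += ["-e", field]
    return args

def stream_has_html(capture, stream_id):
    """Re-read the capture; True if the stream carries a text/html response."""
    cmd = tshark_args(capture, response_filter(stream_id), ["frame.number"])
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return bool(out.strip())
```

Calling stream_has_html once per request line re-reads the whole capture each time, which is what makes this variant slow on large files.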

But you can also do it in the other direction:
- Filter using http.content_type == "text/html", displaying the tcp.stream field
- Iterate and save every tcp.stream value
- Filter again using http.request, displaying the tcp.stream field
- Iterate and output every domain whose tcp.stream you saved earlier
This approach is much more efficient: the capture is read only twice instead of once per request.
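
The two-pass variant might look like the sketch below. It assumes the two listings were produced beforehand with something like tshark -r file.cap -Y 'http.content_type == "text/html"' -T fields -e tcp.stream and tshark -r file.cap -Y http.request -T fields -e tcp.stream -e http.host, so the second listing has tab-separated "stream, host" lines. The function names and the sample data are made up for illustration.

```python
def html_streams(response_output):
    """Collect tcp.stream ids from the first pass (one id per line)."""
    return {line.strip() for line in response_output.splitlines() if line.strip()}

def domains_for_streams(request_output, streams):
    """Return hosts, in first-seen order, whose request ran on a saved stream."""
    domains = []
    for line in request_output.splitlines():
        stream_id, _, host = line.partition("\t")
        host = host.strip()
        if stream_id.strip() in streams and host and host not in domains:
            domains.append(host)
    return domains

# Hypothetical tshark output, for illustration only.
responses = "0\n2\n2\n"
requests = "0\texample.com\n1\tcdn.example.net\n2\texample.org\n2\texample.org\n"
print(domains_for_streams(requests, html_streams(responses)))
# prints ['example.com', 'example.org']
```

Matching on tcp.stream alone is approximate: a keep-alive connection can carry several requests, so a host may be reported because some other response on the same stream was text/html.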

I suggest you ask your question on the list rather than writing to me directly.

Best regards.
I cannot find tcp.stream in the Wireshark filter expression list.