Re: [Ethereal-users] HTTP Dissector & reassembler, tethereal, and mirroring a we
Jon Passki wrote:
While doing off-line analysis of some HTTP traffic, I would like to
reconstruct the results back into a webpage. I understand the GUI
has the TCP reassembly [1,2,3], plus the HTTP dissector understands
data such as JPEGs.
"Understands" in the sense that it can dissect the structure of a JPEG
file; it doesn't "understand" it in the sense of being able to display
the image. (Also, the HTTP dissector only "understands" that
"image/jpeg" means that the entity body should be handed to the JPEG
dissector - which it knows because the JPEG dissector has registered
itself with a media type of "image/jpeg".)
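To make the "registered with a media type" point concrete, here is a minimal Python sketch of that kind of dispatch. It is not Ethereal's code (Ethereal's dissectors are written in C and use Ethereal's own registration tables); the names `register_media_type`, `dissect_entity_body` and `dissect_jpeg` are hypothetical. The JPEG "dissector" here only checks the Start Of Image marker, to show that dissecting a structure is not the same as rendering an image.

```python
from typing import Callable, Dict

# Hypothetical type: a media-type dissector takes an entity body, returns a summary.
MediaDissector = Callable[[bytes], str]

# Hypothetical registry mapping lower-cased media types to dissectors.
media_type_table: Dict[str, MediaDissector] = {}


def register_media_type(media_type: str, dissector: MediaDissector) -> None:
    """Record that `dissector` handles entity bodies of `media_type`."""
    media_type_table[media_type.lower()] = dissector


def dissect_jpeg(body: bytes) -> str:
    # "Dissecting" only walks the structure, here just the SOI marker
    # (0xFF 0xD8); nothing is decoded or displayed.
    if body[:2] == b"\xff\xd8":
        return "JPEG: SOI marker found"
    return "JPEG: malformed (no SOI marker)"


def dissect_entity_body(content_type: str, body: bytes) -> str:
    """Hand the entity body to whichever dissector registered its media type."""
    # Drop parameters such as "; charset=..." before looking the type up.
    media_type = content_type.split(";", 1)[0].strip().lower()
    dissector = media_type_table.get(media_type)
    if dissector is None:
        return f"{len(body)} bytes of undissected data"
    return dissector(body)


# The JPEG dissector "registers itself" for image/jpeg.
register_media_type("image/jpeg", dissect_jpeg)
```

With that registration in place, `dissect_entity_body("image/jpeg", body)` hands the body to `dissect_jpeg`, while a media type nobody registered falls back to being reported as raw data.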
What I'd like to do is feed a pcap session
into tethereal, reconstruct an HTTP session, and have the HTTP
dissector magically spit out a web page.
Doing this seems non-trivial to me, since there might be multiple
TCP sessions for one web page (e.g. a JPEG download).
By "Web page" do you mean "page displayed by a Web browser"? If so,
then that's not really a concept that exists at the HTTP layer, and
thus, it's not really something that the HTTP dissector should be doing.
A tap could perhaps be used to gather together various HTTP entities
that could be considered the components of a Web page, but I'm not sure
what it'd do with them after that. Is there some representation of a
Web page, in that sense, as a single file? If not, what would the tap
in question do with the components of the page to "spit out a Web
page"?
So, I'd
assume a state machine of some sort. Example: the initial page had
some image src, so the state machine would check to see if there
were any HTTP requests for the link. Then this has the added
difficulty that time would be the only thing to separate multiple
downloads of the same file (JPEG Session 1 was 10 seconds later,
JPEG Session 2 was 60 seconds later, JPEG Session 3 was 120 seconds
later - use JPEG Session 1).
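The heuristic described above (use the earliest download of an image that follows the page) could be sketched roughly as follows. This is a hypothetical, stand-alone Python illustration, not anything that exists in Ethereal or tethereal: `HttpRequest` and `match_images` are made-up names, the requests are assumed to have already been pulled out of the capture, and an `<img src>` is assumed to match a request only if the two URI strings are identical (no resolution of relative URLs, no Host header).

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Dict, List, Optional


@dataclass
class HttpRequest:
    time: float  # capture timestamp, in seconds
    uri: str     # request URI as seen on the wire (assumed already extracted)


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


def match_images(page_html: str, page_time: float,
                 requests: List[HttpRequest]) -> Dict[str, Optional[HttpRequest]]:
    """For each <img src>, pick the earliest request for it at or after the page."""
    collector = ImgSrcCollector()
    collector.feed(page_html)
    matches: Dict[str, Optional[HttpRequest]] = {}
    for src in collector.srcs:
        candidates = [r for r in requests if r.uri == src and r.time >= page_time]
        # Earliest candidate wins; None if the image was never fetched.
        matches[src] = min(candidates, key=lambda r: r.time, default=None)
    return matches
```

With requests for the same JPEG at 10, 60 and 120 seconds after the page, this picks the 10-second one. Note that it only pairs references with requests; it still doesn't produce anything a browser could display.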
So, does this functionality exist?
No.
If so, what did I miss in reading up on reassembly?
None of that has anything to do with "reassembly" in Ethereal's sense of
the word. "Reassembly", in Ethereal's sense, refers to assembling the
parts of a higher-level packet that are contained in multiple
lower-level packets, e.g. reassembling fragments of a fragmented IP
datagram, reassembling the parts of an HTTP request or reply split
across multiple TCP segments, etc. There's no notion of a "Web page"
at the HTTP layer or any other protocol layer, so there's no notion of
"reassembly" of a Web page at the protocol layer, so the existing
reassembly code wouldn't help.
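For contrast, here is a rough sketch of reassembly in that sense: putting one HTTP message back together from the TCP segments it was split across. It is a simplified, hypothetical Python illustration, not Ethereal's TCP/HTTP reassembly code. It assumes no retransmitted or overlapping segments and no chunked transfer encoding, takes the body length from Content-Length, and `first_seq` is the sequence number of the message's first byte.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, bytes]  # (TCP sequence number, payload)


def reassemble_http_message(segments: List[Segment], first_seq: int) -> Optional[bytes]:
    """Return one complete HTTP message, or None if more segments are needed.

    Simplifying assumptions: no retransmitted/overlapping segments,
    no chunked encoding, body length given by Content-Length (0 if absent).
    """
    # Concatenate contiguous payloads in sequence-number order.
    stream = b""
    expected = first_seq
    for seq, payload in sorted(segments):
        if seq != expected:
            break  # gap: a segment is still missing
        stream += payload
        expected += len(payload)

    header_end = stream.find(b"\r\n\r\n")
    if header_end < 0:
        return None  # headers not complete yet
    body_start = header_end + 4

    # Skip the start line, then look for Content-Length.
    content_length = 0
    for line in stream[:header_end].split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            content_length = int(value.strip())

    total = body_start + content_length
    return stream[:total] if len(stream) >= total else None
```

Nothing in that process knows or cares which HTML page, if any, referred to the message being reassembled.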