Ethereal-users: Re: [Ethereal-users] HTTP Dissector & reassembler, tethereal, and mirroring a we
--- Guy Harris <gharris@xxxxxxxxx> wrote:
> Jon Passki wrote:
>
> > While doing off-line analysis of some HTTP traffic, I would like to
> > reconstruct the results back into a webpage. I understand the GUI
> > has the TCP reassembly [1,2,3], plus the HTTP dissector understands
> > data such as JPEGs.
>
> "Understands" in the sense that it can dissect the structure of a JPEG
> file; it doesn't "understand" it in the sense of being able to display
> the image. (Also, the HTTP dissector only "understands" that
> "image/jpeg" means that the entity body should be handed to the JPEG
> dissector - which it knows because the JPEG dissector has registered
> itself with a media type of "image/jpeg".)
Is it correct to say that the HTTP dissector might call other
dissectors based upon the media type encountered in an HTTP
session? Is there a listing of available dissectors (outside of
code)?
> > What I'd like to do is feed a pcap session
> > into tethereal, reconstruct an HTTP session, and have the HTTP
> > dissector magically spit out a web page.
> >
> > To do this seems non-trivial to me, since there might be multiple
> > TCP sessions for one web page (e.g. a JPEG download).
>
> By "Web page" do you mean "page displayed by a Web browser"? If so,
> then that's not really a concept that exists at the HTTP layer, and
> thus, it's not really something that the HTTP dissector should be
> doing.
By a web page, I mean a hierarchical representation of the media type
data (e.g. HTML [text/html], JPEG [image/jpeg], etc.) within the
HTTP session. I see now that it probably wouldn't make sense in
the HTTP dissector. Perhaps this could be a feature of exporting
the data? E.g., when a JPEG is exported from an HTTP session,
somewhere (filename, companion file, directory structure, whatever)
there is information that I can use to associate it with a larger
group of sessions. This could be the absolute URI or absolute path
and Host field, time & date, and/or whatever else makes sense.
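
To make the export idea concrete, here is a minimal sketch in Python of
how an exported entity could be named so that the Host field, capture
time, and request path stay attached to it. Nothing like this exists in
Ethereal as far as I know; the function name export_path, the
directory layout, and the "export" root are all my own assumptions.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from urllib.parse import quote


def export_path(host, request_path, timestamp, root="export"):
    """Build a relative path: <root>/<host>/<UTC time>/<escaped request path>.

    Hypothetical layout, chosen only to keep Host, time, and path together.
    """
    stamp = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime(
        "%Y%m%dT%H%M%SZ"
    )
    # Percent-encode everything, including "/", so the whole request path
    # becomes a single file name that is safe on most file systems.
    name = quote(request_path, safe="") or "index"
    return PurePosixPath(root) / host.lower() / stamp / name


print(export_path("www.example.com", "/images/logo.jpg", 0))
# export/www.example.com/19700101T000000Z/%2Fimages%2Flogo.jpg
```

The timestamp here is seconds since the epoch, as found in a pcap
packet header. A companion file holding the full request headers would
work just as well; the directory layout is only one option.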
> A tap could perhaps be used to gather together various HTTP entities
> that could be considered the components of a Web page, but I'm not sure
> what it'd do with them after that. Is there some representation of a
> Web page, in that sense, as a single file? If not, what would the tap
> in question do with the components of the page to "spit out a Web
> page"?
I hopefully answered this above. For me, it does not necessarily
need to create an HTML document that can be easily loaded in a web
browser, but that would be ideal. If there were for example three
HTTP sessions created by a web browser to render some HTML page, I
would like to have those sessions exported and grouped together
somehow so I would know that they're logically connected. An added
bonus is that I could use Firefox to review the data, but that's
not necessary.
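
As a rough illustration of the grouping I have in mind, here is a
Python sketch. It assumes the HTTP exchanges have already been
extracted from the capture into simple records; the record layout (a
dict with "time", "url", and "body" keys) and the function name
group_page are hypothetical, and only img tags are followed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def group_page(page, exchanges):
    """Return the page plus, for each image it references, the earliest
    matching exchange seen at or after the page itself."""
    parser = ImgSrcParser()
    parser.feed(page["body"])
    group = [page]
    for src in parser.sources:
        wanted = urljoin(page["url"], src)  # resolve relative links
        later = [
            e for e in exchanges
            if e["url"] == wanted and e["time"] >= page["time"]
        ]
        if later:
            group.append(min(later, key=lambda e: e["time"]))
    return group


page = {"time": 0, "url": "http://www.example.com/index.html",
        "body": '<html><img src="logo.jpg"></html>'}
jpegs = [{"time": t, "url": "http://www.example.com/logo.jpg", "body": ""}
         for t in (120, 10, 60)]
group = group_page(page, jpegs)
print([e["time"] for e in group])  # [0, 10]
```

Time is the only thing separating the three downloads of logo.jpg, so
the sketch picks the one closest after the page. Matching on the
Referer header, where present, would probably be more reliable.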
> > So, I'd
> > assume a state machine of some sort. Example: the initial page had
> > some image src, so the state machine would check to see if there
> > were any HTTP requests for the link. Then this has the added
> > difficulty that time would be the only thing to separate multiple
> > downloads of the same file (JPEG Session 1 was 10 seconds later,
> > JPEG Session 2 was 60 seconds later, JPEG Session 3 was 120 seconds
> > later - use JPEG Session 1).
> >
> > So, does this functionality exist?
>
> No.
>
> > If so, what did I miss in reading up on reassembly?
>
> None of that has anything to do with "reassembly" in Ethereal's sense of
> the word. "Reassembly", in Ethereal's sense, refers to assembling the
> parts of a higher-level packet that are contained in multiple
> lower-level packets, e.g. reassembling fragments of a fragmented IP
> datagram, reassembling the parts of an HTTP request or reply split
> across multiple TCP segments, etc. There's no notion of a "Web page"
> at the HTTP layer or any other protocol layer, so there's no notion of
> "reassembly" of a Web page at the protocol layer, so the existing
> reassembly code wouldn't help.
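
To check my understanding of reassembly in that sense, here is a small
Python sketch of joining an HTTP reply that was split across several
TCP segments. It is not how Ethereal implements it; the function name
reassemble and the (sequence number, payload) tuples are my own
assumptions, and sequence-number wraparound is ignored.

```python
def reassemble(segments, initial_seq):
    """Join (seq, payload) TCP segments into one byte stream, tolerating
    reordering and retransmitted (overlapping) data."""
    stream = bytearray()
    next_seq = initial_seq
    for seq, payload in sorted(segments):
        if seq > next_seq:
            break  # gap: a segment is missing, so stop at the hole
        skip = next_seq - seq  # bytes of this payload already in the stream
        if skip < len(payload):
            stream += payload[skip:]
            next_seq = seq + len(payload)
    return bytes(stream)


message = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
segments = [
    (1020, message[20:]),    # arrives first, out of order
    (1000, message[:20]),
    (1010, message[10:25]),  # retransmission overlapping both others
]
stream = reassemble(segments, initial_seq=1000)
headers, body = stream.split(b"\r\n\r\n", 1)
print(body)  # b'hello'
```

The output is one complete HTTP reply, which is where reassembly stops.
Anything about which replies belong to the same page would have to be
layered on top of that.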
Gotcha. I hadn't thought things through properly (hopefully I have now).
Here's an example scenario:
In Quality Assurance (QA) testing we automate 100 tests against a
system, with 15 being HTTP related. The logic about HTTP is
primitive, and adding logic or changing the testing tools is currently
not possible (if I could, this is where I would start). The
responses aren't captured, just brief pass/fail messages.
The traffic, though, is captured in pcap format. When doing some
verification of the tests, we need to look at the pcap dump to see
what really came across since the output test data is useless.
Since all I have is pcap, I'm looking at tcpflow, {t}ethereal, and
any other tool that can reassemble the TCP, HTTP, and whatever
basic media types may be included in the HTTP session.
Thanks again for your time on this,
Jon