On Tue, Apr 6, 2010 at 10:45 PM, Andrej van der Zee
<andrejvanderzee@xxxxxxxxx> wrote:
> What I would like to know is how to match packets on both ends of the
> line, provided that I have the IP numbers. Are there any unique packet
> identifiers that appear in the cap-files on both ends? What should I
> use? For example, when I study the cap-file in Wireshark, I see under
> "Internet Protocol" an "Identification" number that seems to be
> incremented for packets over the same connection (or conversation?).
> Is this Identification number generated by Wireshark or is it really
> in the packet headers? Does it appear in both cap files? In that case,
> I could use a tuple <IP, Identification> to match packets on both
> ends.
IP IDs are actually in the packet header. Two ways to know: 1) Click
on the field. Notice how 2 bytes (containing the 16-bit IP ID value)
are highlighted in the bottom pane, which shows the raw bytes of the
packet. 2) Generally, Wireshark-generated fields are enclosed in square
brackets. (Neither check is guaranteed to hold in every case, but both
are generally true and are SUPPOSED to always be true.)
You could use an ID field like IP ID to identify your packets.
However, IP ID is not only not guaranteed to be unique within your
capture, but it's one of the most likely fields to not be unique. For
one thing, it's a relatively small field (16-bit) and even if a host
increments the ID steadily (i.e. it doesn't re-use IDs more often than
it has to), it will re-use an ID after sending only 65536 packets. A
reasonably busy system is going to wrap IDs pretty quickly.
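To put a rough number on "pretty quickly", here is a small sketch of the
wrap arithmetic. It assumes the simplest case, a single global counter
that goes up by one per packet sent; the send rates are made-up examples,
not measurements.

```python
# Rough wrap-time arithmetic for the 16-bit IP ID field.
# Assumption: one global counter, incremented once per packet sent.
IP_ID_SPACE = 2 ** 16  # 65536 distinct values


def seconds_until_wrap(packets_per_second: float) -> float:
    """Time until the first ID value is reused at a steady send rate."""
    return IP_ID_SPACE / packets_per_second


for rate in (100, 1_000, 10_000):  # hypothetical send rates
    print(f"{rate:>6} pkt/s -> IDs repeat after {seconds_until_wrap(rate):.1f} s")
```

At 1,000 packets per second the IDs repeat after roughly 65 seconds, so
any capture longer than that will contain duplicates from that host.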
Incidentally, different hosts increment the IDs differently. Some
increment it globally - once per packet they send. Some have an
incrementing counter for each host they're talking to. One host that
I was looking at yesterday does some weird things - it seems to be
aware at the IP level which packets are still in flight on the network
(based on being unacknowledged at the TCP level). It only generated
unique IDs for each in-flight packet, but once they'd been
acknowledged, it'd reuse that IP ID. So it tended to use IDs 0, 1, and
maybe 2 ALL the time. Weird.
And really, that's the tricky part. There aren't really any fields in
TCP/IP packets that are guaranteed to be unique. There's always SOME
chance of miscorrelating two packets that share the same properties
that you're checking for.
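One way to see how big that chance is in your own data is to count how
many packets in a single capture share the same candidate key. The
sketch below assumes the packets have already been exported into plain
dicts; the field names (src, ip_id, seq) and the sample records are
hypothetical.

```python
from collections import Counter


def count_ambiguous(packets, key_fields):
    """Count packets whose key is shared with at least one other packet."""
    keys = Counter(tuple(p[f] for f in key_fields) for p in packets)
    return sum(n for n in keys.values() if n > 1)


# Hypothetical records, e.g. as exported from one capture file.
packets = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "ip_id": 0, "seq": 1000},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "ip_id": 1, "seq": 2460},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "ip_id": 0, "seq": 3920},  # ID reused
]

print(count_ambiguous(packets, ("src", "ip_id")))         # 2
print(count_ambiguous(packets, ("src", "ip_id", "seq")))  # 0
```

If the count is zero for a given set of fields, that key is at least
unique within this capture, which is a reasonable sanity check before
using it to match against the other end.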
Anyway, if you were going to go that route, you'd get better results by
comparing the TCP sequence number (tcp.seq), plus the 4-tuple (source
and destination IPs and ports). Although it's still not 100% guaranteed
to be unique, it's much, much more likely. The sequence numbers will
wrap too, but only after about 4 billion bytes (2^32) have been
transferred on that connection.
There's a chance that if connections are being opened and closed
frequently, then port numbers will be reused and that the host will
also start TCP sequence numbers at the same point, but that's also
extremely unlikely (and a big no-no - to avoid attacks, hosts should
pick a random initial sequence number for each connection).
Now compare the IP IDs as well, and it's very, very, VERY unlikely
that two packets match on all of those criteria but aren't actually the
same packet. Theoretically there is a chance of misidentification, but
practically, for your purposes, that's probably plenty accurate.
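As a rough sketch of that combined check: key each packet on the
4-tuple, the TCP sequence number, and the IP ID, then pair up keys that
occur exactly once on each side. The dict layout and field names are
hypothetical. It also assumes seq holds the raw sequence number from the
header, since Wireshark displays relative sequence numbers by default
and those only line up if both captures saw the start of the connection.

```python
from collections import defaultdict

# Hypothetical field names for packets exported from each capture.
KEY_FIELDS = ("src", "dst", "sport", "dport", "seq", "ip_id")


def packet_key(pkt):
    return tuple(pkt[f] for f in KEY_FIELDS)


def match_captures(side_a, side_b):
    """Pair packets seen on both ends; keep ambiguous keys separate."""
    index_a, index_b = defaultdict(list), defaultdict(list)
    for pkt in side_a:
        index_a[packet_key(pkt)].append(pkt)
    for pkt in side_b:
        index_b[packet_key(pkt)].append(pkt)

    matched, unmatched, ambiguous = [], [], []
    for key, pkts_a in index_a.items():
        pkts_b = index_b.get(key, [])
        if not pkts_b:
            unmatched.extend(pkts_a)      # never seen on the other end
        elif len(pkts_a) == 1 and len(pkts_b) == 1:
            matched.append((pkts_a[0], pkts_b[0]))
        else:
            ambiguous.extend(pkts_a)      # key is not unique, don't guess
    return matched, unmatched, ambiguous


# Made-up example: the second packet never shows up at the receiver.
base = {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 40000, "dport": 80}
sender = [dict(base, seq=1000, ip_id=7), dict(base, seq=2460, ip_id=8)]
receiver = [dict(base, seq=1000, ip_id=7)]

matched, unmatched, ambiguous = match_captures(sender, receiver)
print(len(matched), len(unmatched), len(ambiguous))  # 1 1 0
```

Keeping the ambiguous packets in their own list, rather than picking
one candidate, makes it obvious when the key is not unique enough for a
particular pair of captures.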
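Keep in mind also that if the network is modifying your packets
en route, you'll have trouble. A NAT/PAT device rewrites addresses and
ports, a TCP proxy of some kind terminates the connection and uses its
own sequence numbers, and even a router that fragments packets turns
one packet into several, so the fields you're comparing may no longer
be the same on both ends.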