Ethereal-users: RE: [Ethereal-users] duplicate packet removal

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

Date: Mon, 17 Mar 2003 20:31:25 -0000
John,
 
>  I am assuming that frames are unique, and a retransmission will be different to the original packet.
 
I'm not so sure this assumption is correct. Try tracing a web request to a server that isn't listening on port 80. You'll probably see three outbound SYN frames (and three RSTs). These frames will be identical (including IP and TCP checksums). There are also numerous services running on my local network (especially MAC-layer protocols such as ARP) which continually squirt the same or similar frames onto the LAN. This may or may not be important to you, but it goes some way to explaining why this is a non-trivial task.
 
> It might be a simple thing to do to modify the code to keep a circular buffer of src, dst, CRC
> and then scan it for a duplicates before allowing a packet to be merged into the output file.
> The length of the buffer will be the correlation window for the duplicate packet check.
 
> Read from input files
>    Pick most recent packet
>    Check against buffer
>    If in buffer, then ignore
>    If not in buffer
>        put src, dst, CRC into buffer
>        write to output file
> Loop until all input files read
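
For reference, the quoted outline could be sketched in Python roughly as follows. This is only an illustration under assumptions that are not in the original proposal: packets are represented as hypothetical (timestamp, src, dst, frame_bytes) tuples, each input is already sorted by timestamp, "pick most recent packet" is read as "take the next packet in timestamp order", and zlib.crc32 stands in for whatever CRC is actually available.

```python
import heapq
import zlib
from collections import deque

def merge_dedup(streams, window=64):
    """Merge time-sorted packet streams, skipping any packet whose
    (src, dst, crc) key is among the last `window` keys written.

    Each packet is a hypothetical (timestamp, src, dst, frame_bytes) tuple.
    """
    # Circular buffer: once full, appending drops the oldest key.
    recent = deque(maxlen=window)
    out = []
    # heapq.merge yields packets from all inputs in timestamp order.
    for ts, src, dst, frame in heapq.merge(*streams, key=lambda p: p[0]):
        key = (src, dst, zlib.crc32(frame))
        if key in recent:
            continue  # already seen inside the correlation window
        recent.append(key)
        out.append((ts, src, dst, frame))
    return out

if __name__ == "__main__":
    a = [(1.0, "A", "B", b"syn"), (3.0, "A", "B", b"data")]
    b = [(1.1, "A", "B", b"syn"), (2.0, "B", "A", b"ack")]
    for pkt in merge_dedup([a, b]):
        print(pkt)
```

Note that a sketch like this cannot tell a duplicate capture of one frame from a genuine retransmission with identical contents; both are dropped if they fall inside the window.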
 
Even if you are confident the duplicates listed above are not a problem for you, the algorithm will need a little more work. What is going to prompt you to shift frames out of your buffer (size, time, number of frames)? Also, the TCP layer checksum is only two bytes long (= 65536 possible checksums). This gets you into the birthday problem:
 http://www.cut-the-knot.com/do_you_know/coincidence.shtml . There's a 50% chance of at least one duplicated checksum within any 301 or so packets. So you'd need to compare a little more than the TCP checksum. Depending on how large your trace is, I'd probably opt for storing the whole frame and comparing the whole packet.
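
The birthday figure can be checked with a few lines of Python. This is a rough sketch that assumes checksums are uniformly distributed over all 65536 values, which real traffic will not match exactly; the function names are made up for illustration.

```python
def collision_probability(n, space=2**16):
    """Probability that at least two of n uniformly random
    checksums drawn from `space` possible values are equal."""
    p_unique = 1.0
    for i in range(n):
        p_unique *= (space - i) / space
    return 1.0 - p_unique

def packets_for_probability(target=0.5, space=2**16):
    """Smallest packet count whose collision probability reaches `target`."""
    n = 0
    p_unique = 1.0
    while 1.0 - p_unique < target:
        p_unique *= (space - n) / space
        n += 1
    return n

if __name__ == "__main__":
    n = packets_for_probability(0.5)
    print(n, collision_probability(n))
```

The result lands close to the usual approximation of 1.18 * sqrt(65536), i.e. a little over 300 packets.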
 
Nastiest of all, IMHO, you need to make a decision on the frame's timestamp. (Again, maybe this isn't important to you...) Which all leads back to my last question:
>> It may be easier to advise you if we understood what sort of packets were being duplicated? Eg is it MAC 
>> level broadcasts, RIP updates etc, HSRP hellos ?   
 
You say you are running "capture processes on one system". Does this mean
1) you're tracing two NICs on the same machine,
2) the same NIC with different filters,
3) or you're running the same filter on the same NIC with different trace start times.
 
If the answer is #3 (and, depending on the filter, #2 too), you will capture exactly the same frames in exactly the same order with the same relative timestamps, so your algorithm will be fine; in fact you needn't worry about a buffer at all. For anything else you'll need to put your thinking hat on. :-)
 
That having been said, I'm quite tempted to see if I can write something that would do this sort of thing. At least until I convince myself it really is too tricky a problem. :-)
 
Cheers,
 
Alistair

