Wireshark-dev: Re: [Wireshark-dev] Copying TVBs for Reassembly [Was: Filebacked-tvbuffs : GSoC'
From: Evan Huus <eapache@xxxxxxxxx>
Date: Thu, 18 Apr 2013 17:17:17 -0400
On Thu, Apr 18, 2013 at 4:59 PM, Anders Broman <a.broman@xxxxxxxxxxxx> wrote:
> Evan Huus wrote on 2013-04-18 22:40:
>
>> On Thu, Apr 18, 2013 at 3:56 PM, Jeff Morriss <jeff.morriss.ws@xxxxxxxxx>
>> wrote:
>>>
>>> On 04/18/13 15:14, Evan Huus wrote:
>>>>
>>>> This is a tangential issue that has always confused me.
>>>>
>>>> Why do we malloc+memcpy data for reassembly when we already have
>>>> 'virtual' composite TVBs?
>>>>
>>>> Wouldn't it be more efficient (in time and memory) to create a
>>>> composite TVB for each reassembly and then build the reassembled
>>>> packet in it? You would never have to copy or allocate any actual
>>>> packet data...
>>>
>>>
>>> There are a couple of problems with doing that (that I recall):
>>>
>>> 1) Composite TVBs don't actually work (or didn't work until very
>>> recently?).
>>>
>>> 2) The data behind a TVB goes away as soon as we're done dissecting (and
>>> displaying) the packet.  That is, the TVB data is overwritten (IIRC) when
>>> the next packet is read.
>>>
>>> I suppose there was never any real reason to try to make reassembly work
>>> with composite TVBs: if they're just more malloc()'d memory then why mess
>>> with it rather than allocate our own copy of the data?  (Well, OK, it
>>> would save a data copy, but...)
>>
>> OK, so then the optimal case would be a tvb implementation that stored
>> only frame_data pointers, offsets and lengths... similar but not
>> identical to the current composite implementation.
>>
>> The reassembly code could then add meta-data to this when
>> reassembling, and the tvb could lazily refetch the underlying tvbs
>> using the existing wiretap interface? If we add some sort of caching
>> mechanism so that repeated accesses didn't keep forcing reads of the
>> original file then I expect this would be very fast:
>>
>> - adding fragments to reassembly would be near-instantaneous (just a
>> few pointer updates)
>> - reassembled tvbs would take minimal memory except when accessed
>> (using tvb_get_* or proto_tree_add_*)
>> - accessing a reassembled tvb would just be an offset calculation and
>> then a wtap read to bring into memory the underlying real packet(s)
>> containing the data being requested (assuming they aren't already
>> cached)
>>
>> Thoughts?
>
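
To make the "offset calculation" step above concrete, here is a rough
sketch of how an offset in the reassembled data could be mapped back to
the fragment that holds it. Everything here (lazy_frag, lazy_buf,
lazy_buf_locate) is a hypothetical name made up for illustration, not
existing tvbuff or reassembly code, and the wtap read and caching are
left out entirely:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one fragment of a reassembled payload.
 * It records where the bytes live, not the bytes themselves. */
typedef struct {
    uint32_t frame_num;  /* captured frame that holds the fragment */
    size_t   frame_off;  /* offset of the fragment inside that frame */
    size_t   len;        /* fragment length in bytes */
} lazy_frag;

/* Hypothetical lazy buffer: fragments in reassembly order. */
typedef struct {
    const lazy_frag *frags;
    size_t           nfrags;
} lazy_buf;

/* Map an offset in the reassembled data to a fragment index and an
 * offset inside that fragment.
 * Returns 0 on success, -1 if the offset is past the end. */
static int lazy_buf_locate(const lazy_buf *b, size_t off,
                           size_t *idx, size_t *inner)
{
    size_t i;

    for (i = 0; i < b->nfrags; i++) {
        if (off < b->frags[i].len) {
            *idx = i;
            *inner = off;
            return 0;
        }
        off -= b->frags[i].len;  /* skip past this fragment */
    }
    return -1;
}
```

Adding a fragment is then just appending one descriptor; the frame
would only be re-read (frame_num, frame_off + inner) when something
actually asks for bytes in that range.
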
> If, on top of that, small enough files could be mmap()ed, it'd be even faster.

Yes, although I think this could be done entirely separately in wiretap,
without touching the reassembly or tvbuff code. It would need a
wiretap API change, since right now we pass in a buffer to fill and the
new code would need to return a buffer pointer instead, but other than
that I think it would be fairly unintrusive.
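
To illustrate the difference between the two API shapes, a minimal
sketch follows. The names (mapped_src, src_read_copy, src_read_ptr) are
hypothetical and are not wiretap's real interface; the source is
assumed to be a region that is already in memory, as it would be after
an mmap:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical capture source whose contents are already mapped. */
typedef struct {
    const uint8_t *base;  /* start of the mapped file */
    size_t         size;  /* mapped length in bytes */
} mapped_src;

/* "Fill a buffer" style: the caller owns 'out' and the data is copied.
 * Returns 0 on success, -1 if the range is out of bounds. */
static int src_read_copy(const mapped_src *s, size_t off, size_t len,
                         uint8_t *out)
{
    if (off > s->size || len > s->size - off)
        return -1;
    memcpy(out, s->base + off, len);
    return 0;
}

/* "Return a pointer" style: no copy, the result points into the
 * mapping and is only valid while the mapping exists.
 * Returns NULL if the range is out of bounds. */
static const uint8_t *src_read_ptr(const mapped_src *s, size_t off,
                                   size_t len)
{
    if (off > s->size || len > s->size - off)
        return NULL;
    return s->base + off;
}
```

The second form is what makes the zero-copy case possible, at the cost
of tying the lifetime of the returned pointer to the mapping.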