Wireshark-dev: Re: [Wireshark-dev] New packet list: Optimize memory usage
From: Guy Harris <guy@xxxxxxxxxxxx>
Date: Sun, 12 Jul 2009 15:46:12 -0700

On Jul 12, 2009, at 3:15 PM, Anders Broman wrote:

(That doesn't say this is the wrong thing to do - I've been advocating
this for a while, and made a version of the GTK 1.2[.x] GtkCList with
"dynamic" column data and prototyped the same thing - it says we need
to make random access to gzipped files faster.)

Did you say at Sharkfest that bzipped files might be better suited for that?
If so, perhaps we should go for bzip instead of gzip?

In bzip2 format, the stream is a sequence of blocks, and, at least as I understand it, each block can be decompressed independently, so seeking to a particular offset in the decompressed version of a bzip2'ed stream involves seeking to the beginning of the block containing the data at that offset, decompressing the block, and then moving to the right offset within the decompressed data. The default, and maximum, block size is 900K of uncompressed data.
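
The seek procedure described above can be sketched in Python. This is only an illustration under stated assumptions: Python's bz2 module does not expose the blocks inside a single bzip2 stream, so the sketch stands in for them by compressing each chunk as its own bzip2 stream and keeping a side index. The names compress_indexed and read_at_bz2, and the index layout, are hypothetical and not anything Wireshark or libbzip2 provides.

```python
import bisect
import bz2

BLOCK = 900_000  # mirrors bzip2's default/maximum block size


def compress_indexed(data, block=BLOCK):
    """Compress each chunk as an independent bzip2 stream and build an index.

    Assumption: independent streams stand in for bzip2's internal blocks.
    Each index entry is (uncompressed_start, compressed_start, compressed_len).
    """
    out = bytearray()
    index = []
    for start in range(0, len(data), block):
        chunk = bz2.compress(data[start:start + block])
        index.append((start, len(out), len(chunk)))
        out += chunk
    return bytes(out), index


def read_at_bz2(compressed, index, offset, length):
    """Return `length` bytes starting at uncompressed `offset`."""
    if not index:
        return b""
    starts = [u_start for u_start, _, _ in index]
    # Find the chunk whose uncompressed range contains `offset`.
    i = bisect.bisect_right(starts, offset) - 1
    skip = offset - index[i][0]
    result = bytearray()
    # Decompress only that chunk, plus following ones if the read spans them.
    while i < len(index) and len(result) < skip + length:
        _, c_start, c_len = index[i]
        result += bz2.decompress(compressed[c_start:c_start + c_len])
        i += 1
    return bytes(result[skip:skip + length])


# Small demonstration with a tiny block size so several chunks exist.
data = bytes(range(256)) * 40
packed, index = compress_indexed(data, block=1000)
piece = read_at_bz2(packed, index, 2500, 1200)
```

With a real single-stream .bz2 file the index would instead have to record where each block starts in the compressed data, which this sketch does not attempt.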

In gzip/zlib format, the stream is, as far as I know, a sequence of blocks, but the blocks can't be decompressed independently; the dictionary doesn't get reset with each block. That means that, to make random access fast, you'd need to decompress the entire file, or save the state of the dictionary periodically, or do something along those lines.
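
The "save the state of the dictionary periodically" option can be sketched with Python's zlib module, whose decompression objects have a copy() method that snapshots the decompressor state. This is a minimal sketch, not what Wireshark does; build_checkpoints, read_at_gzip, and the 4096-byte checkpoint spacing are assumptions made for illustration.

```python
import gzip
import hashlib
import zlib


def build_checkpoints(compressed, span=4096):
    """Decompress once, snapshotting the decompressor every `span` input bytes.

    Each checkpoint is (uncompressed_offset, compressed_offset, saved_state).
    """
    d = zlib.decompressobj(32 + zlib.MAX_WBITS)  # auto-detect gzip or zlib header
    checkpoints = [(0, 0, d.copy())]
    out_pos = 0
    for in_pos in range(0, len(compressed), span):
        out_pos += len(d.decompress(compressed[in_pos:in_pos + span]))
        if d.eof:
            break
        checkpoints.append((out_pos, in_pos + span, d.copy()))
    return checkpoints


def read_at_gzip(compressed, checkpoints, offset, length, span=4096):
    """Return `length` bytes at uncompressed `offset`, resuming from a checkpoint."""
    # Pick the latest checkpoint at or before the requested offset.
    out_pos, in_pos, saved = max(
        (c for c in checkpoints if c[0] <= offset), key=lambda c: c[0])
    d = saved.copy()  # copy so the stored checkpoint stays reusable
    skip = offset - out_pos
    buf = bytearray()
    while len(buf) < skip + length and in_pos < len(compressed) and not d.eof:
        buf += d.decompress(compressed[in_pos:in_pos + span])
        in_pos += span
    return bytes(buf[skip:skip + length])


# Demonstration on poorly compressible data so many checkpoints are created.
original = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(4000))
compressed = gzip.compress(original)
checkpoints = build_checkpoints(compressed)
chunk = read_at_gzip(compressed, checkpoints, 100_000, 64)
```

Each saved state carries its own copy of the decompression window, so the checkpoint spacing trades memory against how much data has to be re-decompressed per seek.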

So bzip2 format would be better as a "native" format; unfortunately, there are gzipped files already out there, and the native compressed format of the Windows sniffer appears to be a gzipped version of the file format (except for the file header), so the gunzipping code could still be useful.