Ethereal-dev: Re: [ethereal-dev] Graphs Patch

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Bibek Sahu <scorpio@xxxxxxxxx>
Date: Wed, 6 Oct 1999 11:12:58 -0500 (CDT)
	Ok, I'm slightly more coherent now.  I suppose I'll take a stab at
describing what I did.  Afterwards, I will provide a critical analysis of
what I did right/wrong; I request that others do the same after taking a
look at the patch.  Note: the user interface (dialog boxes, etc.) is CRAP.
If you have a complaint about it, chances are I already know; you can
either wait for me to fix it (which could take a while) or fix it
yourself.  I recommend the latter option. ;-)

	Also note that all the new files are named eg-*.  Most of the
stuff I put in falls in that category.

	There are really three parts to this: the part the dissector does,
the stat info maintenance functions themselves, and the graphing
functions.

1.) Dissector

	If you look at the changes I made to packet-tcp.c (specifically
dissect_tcp()), you'll notice lines in there such as

if(dport_hist->active) 
	st_hist_item_inc( dport_hist, th.dport, packet_size, ...);

	dport_hist is a pointer to a structure (st_hist_data_t) containing
information relevant to histograms.  It is not /restricted/ to histograms
-- it can be used well for pie charts and anything else that uses a label
and one-or-two sizes -- but it is designed for them.

	Every time a packet is dissected by dissect_tcp, it adds the
number of bytes for that packet to the overall sum and to the sum for
whatever port the packet was destined for.

i.e., all things are initialized to zero, and then:

the first packet received is a 1361-byte packet destined for port 80:
	dport_hist->sum += 1361;
	find_item(dport_hist, 80)->size += 1361;

<<now: dport_hist->sum is '1361' and the size of the item for port 80 is
'1361'>>

the next packet received is a 1000-byte packet destined for port 25:
	dport_hist->sum += 1000;
	find_item(dport_hist, 25)->size += 1000;

<<now: dport_hist->sum is '2361', the size of the item for port 80 is
'1361', and the size of the item for port 25 is '1000'>>

	... etc., etc., etc.  I think you get the idea.  Standard bivariate
data.  Nothing fancy, nothing special.

	This is just manipulating a set of sums (the total sum, the sum for
port 80, the sum for port 25, etc.).  It has ABSOLUTELY NOTHING to do
directly with any sort of graphs.  Graphs use this information, but this
structure does not have any idea whether anything is currently using it; all
it knows is that somebody wants it to be actively maintained, so it
maintains the numbers.  Whether anybody uses those numbers is irrelevant.
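
	As a rough sketch of the bookkeeping walked through above (this is
not the code from the patch): the struct layout, the fixed-size table, and
the names hist_t, hist_item_t, hist_find_item() and hist_item_inc() are all
hypothetical stand-ins.  The real st_hist_data_t and st_hist_item_inc() take
more arguments than shown here.

```c
#include <assert.h>
#include <stddef.h>

#define HIST_MAX_ITEMS 64   /* arbitrary cap for this sketch */

/* One labelled bucket, e.g. "port 80".  Hypothetical layout. */
typedef struct {
    unsigned int  id;    /* label, here a TCP port number */
    unsigned long size;  /* bytes attributed to this label */
} hist_item_t;

typedef struct {
    int           active;   /* nonzero if someone wants this maintained */
    unsigned long sum;      /* bytes across all labels */
    size_t        n_items;
    hist_item_t   items[HIST_MAX_ITEMS];
} hist_t;

/* Return the item for id, creating a zeroed one if needed; NULL if full. */
static hist_item_t *hist_find_item(hist_t *h, unsigned int id)
{
    size_t i;

    assert(h != NULL);
    for (i = 0; i < h->n_items; i++)
        if (h->items[i].id == id)
            return &h->items[i];
    if (h->n_items == HIST_MAX_ITEMS)
        return NULL;
    h->items[h->n_items].id = id;
    h->items[h->n_items].size = 0;
    return &h->items[h->n_items++];
}

/* Add one packet's bytes to the overall sum and to its label's sum. */
static void hist_item_inc(hist_t *h, unsigned int id, unsigned long bytes)
{
    hist_item_t *item;

    if (!h->active)
        return;             /* nobody asked for these numbers */
    item = hist_find_item(h, id);
    if (item == NULL)
        return;             /* table full: dropped silently in this sketch */
    h->sum += bytes;
    item->size += bytes;
}
```

	Note that nothing in the sketch knows about graphs; it only keeps
sums, and only while the 'active' flag is set.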

2.) Statistical summary functions

	The stat summary functions are nothing more than the set of
functions used to maintain this information.  Some of them are essential,
such as stat_hist_register() (which registers a new item that some dissector
will be keeping track of); others are just convenience functions, such as
st_hist_find_item_inc() (which finds the item for port "80" in the structure
for "Destination Ports" and increments relevant counters by the size of the
packet).

	These functions all work towards the goal of keeping track of
statistical information in a centralized place.  They do not create graphs,
they do not maintain graphs, they do not care if a graph even exists.  They
are /strictly/ device-independent.

	The functions are organized into 3 layers:

	layer 1: generic functions

		Strictly speaking, these functions have nothing to do with
statistics and are sometimes used by things that are not statistical in any
way.  The only reason they are found in these files is because I had nowhere 
better to put them.

		The functions in question are generic_register() and
generic_unregister().  Considering how simple and basic they are, they could
probably be eliminated altogether; they are left in place just in case
anyone chooses to do something that they want applied to /all/
registrations.  It keeps things centralized.

		They are currently used by stat_register() and
graph_register().

	layer 2: general statistical functions

		Although histogram data is the only kind currently
supported, it is possible to implement other kinds of data.

		These functions provide an abstraction layer so that things
that need to work with all statistical information need not know or care
what types exist.  Take for example stat_clear_all(), which clears all the
data in all registered items: this is run just before a file is loaded.  The
file-loading initializes relevant things, so it needs to do this, but it
shouldn't know or care about what types of information are being stored.
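
		One way such a type-agnostic clear could be arranged, as a
sketch only: each registered data set carries a pointer to its own clear
routine, and the "clear all" call just walks the registry.  The registry
array, the 'clear' hook, and every name ending in _sketch are assumptions
for illustration; the patch's stat_clear_all() may work quite differently.

```c
#include <assert.h>
#include <stddef.h>

#define STAT_MAX 32   /* arbitrary cap for this sketch */

/* Generic view of a registered data set: a label plus a per-type hook. */
typedef struct stat_data {
    const char *label;
    void (*clear)(struct stat_data *self);
} stat_data_t;

/* A histogram-like type that embeds the generic header as its first member. */
typedef struct {
    stat_data_t   base;
    unsigned long sum;
} hist_stat_t;

static stat_data_t *registry[STAT_MAX];
static size_t n_registered;

/* Add a data set to the registry; returns 0 on success, -1 if full. */
static int stat_register_sketch(stat_data_t *s)
{
    assert(s != NULL && s->clear != NULL);
    if (n_registered == STAT_MAX)
        return -1;
    registry[n_registered++] = s;
    return 0;
}

/* Type-specific clear for the histogram-like type. */
static void hist_clear(stat_data_t *self)
{
    hist_stat_t *h = (hist_stat_t *)self;  /* valid: base is the first member */
    h->sum = 0;
}

/* Clear every registered data set without knowing its concrete type. */
static void stat_clear_all_sketch(void)
{
    size_t i;

    for (i = 0; i < n_registered; i++)
        registry[i]->clear(registry[i]);
}
```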

		This layer also provides functions used by higher layers,
such as the data-type (e.g. histogram) layer.  For example, it provides a
function to compare the ids of two data items and return -1, 0, or 1 (for
use in sorting).

		There are also structures here that correspond to higher-
level structures, i.e., the generic stat data and generic stat data item
types.  The former is used by things like the graph chooser, since it only
cares about labels (which should be there whether you're using histogram
data, cross-graph data, or whatever).  The latter is used by things like
st_generic_cmp(), which compares two data items based on their id.
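
		A comparison of that shape plugs straight into the standard
qsort().  The sketch below assumes the id is an unsigned integer and uses
hypothetical names (stat_item_t, item_cmp_by_id, sort_items_by_id); it is not
st_generic_cmp() itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Generic data item: only the id matters at this layer. */
typedef struct {
    unsigned int id;
} stat_item_t;

/* Return -1, 0, or 1 depending on how the two ids compare. */
static int item_cmp_by_id(const void *a, const void *b)
{
    const stat_item_t *x = a;
    const stat_item_t *y = b;

    /* Explicit comparisons: subtracting unsigned ids could wrap around. */
    if (x->id < y->id)
        return -1;
    if (x->id > y->id)
        return 1;
    return 0;
}

/* Sort an array of items into ascending id order. */
static void sort_items_by_id(stat_item_t *items, size_t n)
{
    qsort(items, n, sizeof items[0], item_cmp_by_id);
}
```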

	layer 3: data-type layer (e.g., histogram info)

		This layer is where the meat is.  These functions manipulate
data in a way that is directly meaningful to the data-type.

		st_hist_new() creates and fills in a new structure
containing histogram data.  st_hist_item_inc() updates the sums for the
data.  Etc., etc., etc.  Currently everything manipulates histogram data;
when cross-graph data is supported, its functions will fall in this layer.

		In general, these are the functions that are used both by
the dissectors -- to manipulate data -- and the graphs -- to retrieve data.

		I also consider the histogram data structures to be in this
layer, even though they're not functions.  The dissectors may use
convenience functions where the graphs read the data directly, but it's the
same principle.


3.) Graphs

	The last grouping is of the graphs.  They're not really layered, per
se, but they are arranged by how they display data.  Since bivariate
histogram data is the only thing that's graphed right now, there is
currently only one group. ;-)

	The graphs read the data from the statistics system and display it
in a way befitting the data.  As there's currently only support for
histogram data, they display that; as there's currently only a single-bar
histogram, they display a single-bar histogram.  In the near future, there
will be a double-bar histogram (e.g., source port on the left, dest port
on the right) which will work with the same data.  Pie charts can also work
from the same data, as can many others.

	Strictly speaking, there are two layers here: one is the actual
graph drawing layer, the other is a generic graph control layer.  The former
is described above, the latter consists of things like graph_update_all()
[called after a file is loaded], graph_start_all() [which should be called
after starting a live capture], and graph_stop_all() [which should be called
after stopping a live capture].

	NOTE: Graphs are not updated during (or after) a live capture right
now.  Implementing this should be as simple as calling graph_start_all()
just before or immediately after starting a live capture, and calling
graph_stop_all() just after ending a live capture.  I just haven't hunted
down yet where live captures are started and stopped.  If anyone wants to do
this, I'd love to hear it. ;-)
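
	For anyone picking this up, the control layer can be pictured
roughly as below.  This is a sketch under assumptions, not the patch's code:
the graph_t fields, the registry, and the idea that "stop" finishes with one
last redraw are all guesses made for illustration, and the _sketch names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define GRAPH_MAX 16   /* arbitrary cap for this sketch */

typedef struct {
    const char  *title;
    int          running;  /* nonzero while a live capture feeds this graph */
    unsigned int redraws;  /* stand-in for real drawing work */
} graph_t;

static graph_t *graphs[GRAPH_MAX];
static size_t n_graphs;

/* Add a graph to the registry; returns 0 on success, -1 if full. */
static int graph_register_sketch(graph_t *g)
{
    assert(g != NULL);
    if (n_graphs == GRAPH_MAX)
        return -1;
    graphs[n_graphs++] = g;
    return 0;
}

/* Redraw every registered graph (e.g. after a file is loaded). */
static void graph_update_all_sketch(void)
{
    size_t i;

    for (i = 0; i < n_graphs; i++)
        graphs[i]->redraws++;
}

/* To be called right around the start of a live capture. */
static void graph_start_all_sketch(void)
{
    size_t i;

    for (i = 0; i < n_graphs; i++)
        graphs[i]->running = 1;
}

/* To be called just after a live capture ends; does a final redraw. */
static void graph_stop_all_sketch(void)
{
    size_t i;

    for (i = 0; i < n_graphs; i++)
        graphs[i]->running = 0;
    graph_update_all_sketch();
}
```

	The capture code would then only need two calls, one next to
wherever a live capture begins and one next to wherever it ends.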

4.) User-Interface

	Last but not least, let us not forget the user interface.  Well,
considering it's crap, maybe we should forget it and start over.

	This consists of things like the dialog boxes used to select graphs,
etc.  It needs help.  I'll leave out the details for now.

	Th-th-th-th-th-that's all, folks!

--------

	Once I had everything done, I realized I had to add something to
maintain total info on the source and destination ports together.  I
thought about what needed to be done, and I almost jumped for joy -- With
about ten lines of code, I added a summary type and had dissect_tcp parse
it.  Everything else was automatically taken care of by the other layers;
I didn't have to touch a thing. That's the kind of flexibility I was aiming
for.  It seems that I have [more or less] achieved it.  :-)

	I'm going to put off the critical analysis for now, since I'm tired
of writing.  Suffice it to say, st_hist_data_t should also include a 'sum2'
element, the single-bar graph should add sums 1 and 2 together for
everything before drawing, and dissect_tcp should pass 0 for that second
parameter except with the src+dest. combo.

	I hope you enjoyed reading that as much as I enjoyed writing it.

	Actually, I hope you enjoyed reading that much /more/ than I enjoyed
writing it.  If you enjoyed it only as much, I suggest you stay away from
sharp objects and projectile weapons for a while.

	This ship will self-destruct in 3 minutes... 2... 1... 0.  Have a
nice day! :-)

- Bibek


On Wed, 6 Oct 1999, Bibek Sahu wrote:

> 
> 	And here's the patch.  I sent a message recently with
> detail/explanations.  It was probably incoherent, but I'm not much more
> coherent now than I was when I wrote that message, so I'll just leave it at
> that.
> 
> 	I'll explain this sometime around the weekend.  Hopefully between
> that message and this patch, things will start to make sense.
> 
> - Bibek
> ... who would /really/ rather not think about NFAs and Regular Expressions
> for a while... <sigh>
>