Ethereal-dev: Re: [Ethereal-dev] Performance. Ethereal is slow when using large captures.

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Newbie <spamcontrol2@xxxxxxxxxxx>
Date: Sat, 15 Nov 2003 01:40:12 -0500
Another quick test: I disabled the eth protocol, loaded the trace, and refiltered it with "smb" 5 times.

compare_proto_id was still the largest individual contributor, function-wise, at 31.1%/3.09s/382,676,097 calls. g_list_find_custom contributed an additional 5.4%/0.54s.

It's interesting that the calls to compare_proto_id went down by so much, especially considering I refiltered 5 times (in my previous tests, there were almost exactly twice as many calls to it in the test where I opened+refiltered versus when I just opened).

Let me try recompiling gtk+ with profiling support and see if gprof is able to resolve things better (and give you a better picture of where these calls are coming from). It may take a little while since I usually just have fink compile and install the libraries for me...

Ronnie Sahlberg wrote:

g_list_find_custom() sounds generic enough; it may be some internal function that searches through a hash table or a list (glib aggregates), using a callback into ethereal to do things like g_hash_table_lookup() etc.

Something inside glib.
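
For reference, g_list_find_custom() is a standard glib API: it walks a GList from the head and calls your comparator on each element until the comparator returns 0, so one lookup costs up to one comparator call per list element. A minimal standalone example (my own toy code, not anything from ethereal):

    #include <glib.h>
    #include <stdio.h>

    /* comparator: called once per list element visited; returns 0 on match */
    static gint
    match_int(gconstpointer element, gconstpointer target)
    {
        return *(const int *)element - *(const int *)target;
    }

    int
    main(void)
    {
        int ids[] = { 3, 17, 42, 99 };
        int wanted = 42;
        GList *list = NULL, *hit;
        int i;

        for (i = 0; i < (int)(sizeof(ids) / sizeof(ids[0])); i++)
            list = g_list_append(list, &ids[i]);

        /* linear scan: O(n) comparator calls per lookup */
        hit = g_list_find_custom(list, &wanted, match_int);
        if (hit != NULL)
            printf("found %d\n", *(int *)hit->data);

        g_list_free(list);
        return 0;
    }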

Can you try to disable the eth protocol, and thus disable most of the dissection of packets, and see if it makes any difference in speed? That would tell us whether it is likely called from the dissection path in add_packet_to_packet_list() or not. If so, even if the function ultimately needs to be fixed, my idea might be a temporary workaround hiding the problem.


----- Original Message ----- From: "Ian Schorr"
Sent: Saturday, November 15, 2003 5:05 PM
Subject: Re: [Ethereal-dev] Performance. Ethereal is slow when using large captures.


By far the place I can see it called from the most is epan/proto.c:find_protocol_by_id(), but I only see that function called 4,065,724 times (and none of the 5 functions that I see calling g_list_find_custom appears to be able to call it more than once per invocation), so I must be missing something.

What is g_list_find_custom?
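
For what it's worth, here is my guess at how these pieces fit together (a sketch with made-up details, not the actual epan/proto.c source): if the registered protocols sit in a GList, then every find_protocol_by_id() is a linear scan, and compare_proto_id runs once per element visited.

    #include <glib.h>
    #include <stdio.h>

    /* made-up minimal record; the real protocol struct has more fields */
    typedef struct {
        int proto_id;
    } protocol_t;

    static GList *protocols;   /* stand-in for the global registration list */

    /* called once per list element visited by g_list_find_custom() */
    static gint
    compare_proto_id(gconstpointer proto_arg, gconstpointer id_arg)
    {
        return ((const protocol_t *)proto_arg)->proto_id - *(const int *)id_arg;
    }

    static protocol_t *
    find_protocol_by_id(int id)
    {
        GList *entry = g_list_find_custom(protocols, &id, compare_proto_id);
        return entry ? (protocol_t *)entry->data : NULL;
    }

    int
    main(void)
    {
        static protocol_t protos[200];
        int i;

        for (i = 0; i < 200; i++) {
            protos[i].proto_id = i;
            protocols = g_list_append(protocols, &protos[i]);
        }
        printf("found %d\n", find_protocol_by_id(150)->proto_id);
        return 0;
    }

If that guess is right, the numbers would at least be consistent: 777,148,074 comparator calls over 4,065,724 lookups is an average scan depth of about 191 elements, i.e. a list a couple hundred protocols long.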

Ian Schorr wrote:

No, that's part of the problem.  gprof is having a heck of a time identifying the calling function for quite a few of these operations.  It lists g_list_find_custom, for example, as being a "spontaneous" call - one where the "identity of the caller of a function cannot be determined".

I can't even find where g_list_find_custom is defined, though I see that it's called in several places in epan/proto.c and prefs.c.  (Presumably it's defined inside glib itself, which would also explain why gprof reports it as "spontaneous" - glib wasn't built with profiling support.)

Ronnie Sahlberg wrote:

Hm, I was actually surprised by this. The good thing is that if it is possible to optimize this function, it could give a >20% performance boost, which is nothing to sneeze at.
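
One direction that comes to mind, assuming protocol ids are small, densely assigned integers (an assumption - I have not checked how they are allocated): replace the list walk with a direct-index array, so a lookup costs no comparator calls at all. A rough sketch with made-up names:

    #include <stddef.h>

    #define MAX_PROTOCOLS 2048   /* hypothetical upper bound on ids */

    typedef struct {
        int proto_id;
        /* ... rest of the real protocol record ... */
    } protocol_t;

    static protocol_t *proto_index[MAX_PROTOCOLS];

    /* call once at protocol registration time */
    static void
    index_protocol(protocol_t *p)
    {
        if (p->proto_id >= 0 && p->proto_id < MAX_PROTOCOLS)
            proto_index[p->proto_id] = p;
    }

    /* O(1) replacement for the g_list_find_custom() scan */
    static protocol_t *
    find_protocol_by_id_fast(int id)
    {
        if (id < 0 || id >= MAX_PROTOCOLS)
            return NULL;
        return proto_index[id];
    }

    int
    main(void)
    {
        static protocol_t smb = { 42 };   /* made-up id */

        index_protocol(&smb);
        return find_protocol_by_id_fast(42) == &smb ? 0 : 1;
    }

If the ids turn out not to be dense, a g_hash_table keyed on the id would do the same job.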

Together with g_list_find_custom() accounting for ~30% of the time spent. Can you see in the data from where this function is called so many times?


----- Original Message ----- From: "Newbie"
Sent: Saturday, November 15, 2003 4:36 PM
Subject: Re: [Ethereal-dev] Performance. Ethereal is slow when using large captures.




Ronnie,

I haven't done a "formal" test, per se, but I ran a couple of quick
tests out of curiosity and thought I'd share the results.

I used a file I had sitting around - a 124,568 packet trace containing
primarily CIFS traffic.

I ran two tests with profiling enabled:  One where I simply opened the file, then exited the program, and another where I opened the file, then refiltered using "smb" as a display filter, then exited the program.

Both tests gave me nearly identical results as far as the most "costly" procedures, so I'll concentrate just on the second test.

The function consuming the most time was epan/proto.c:compare_proto_id, at 22.9%, 8.14s, and 777,148,074 calls (nearly as many calls, I believe, as all other functions combined).

add_packet_to_packet_list actually came in at #5, with 7.6%, 2.64s - including time spent in its child functions (mostly epan_dissect_run (5.1%/1.8s), which in turn spent the majority of its time in dissect_packet (5.1%, 1.71s)).

compare_proto_id appeared to be called by g_list_find_custom, which accounted for an additional 5.9%/2.11s - a total of 28.8% spent in that one function when added to the time spent in its child function, compare_proto_id.

proto_tree_add_pi appeared to be even more "expensive" (10.0%, 2.82s) than add_packet_to_packet_list, with proto_tree_add_uint at 6.7%, 2.01s.

The test system is an Apple G5 running OS X 10.3, with GTK+ 1.2.10 and Ethereal 0.9.16 built with no special GCC flags and "configure --without-plugins".


Let me know if you want me to do a more "formal" test, and upload the
trace to some shared area (perhaps Gerald wouldn't mind setting up an
area temporarily on ftp.ethereal.com, or do we already have something
like that?).

Ian


Ronnie Sahlberg wrote:



Good point.
I never noticed that the number of sessions affected the refilter
speed.
The type of traffic (some reassembled PDUs take much longer than others to dissect) I know would affect it, but I never knew the number of sessions did.

The problem now is that for VERY large captures, ethereal is always slow under all circumstances. So let us start with just a simple random generic capture and measure for it, to try to keep the number of variables low.
(If, as you say, the number of sessions affects it as well, do you mean the number of TCP sessions, or what kind of sessions? At some point, when the worst performance problem has been addressed, this would be a very interesting area to look at. (I could create different synthetic capture files to measure with: same number of packets, same payload, just different numbers of sessions.) Make a note that you have observed the number of sessions to possibly have an effect on the dissection speed, so we don't forget to look at it further down the track.)


I currently believe that during refiltering of a capture, most time would be spent inside file.c/add_packet_to_packet_list(). It would be VERY VERY useful to verify that this assumption is correct. I would really like someone to look at gprof data and analyze where most time is consumed, to either verify my claim about add_packet_to_packet_list() or to invalidate it.

The thing inside this function that I believe consumes the most CPU would be where we call epan_dissect_run() and perform a full dissection of the packet.
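
To make that concrete, here is a toy model of the rescan path as I understand it (my own stand-ins, not the real file.c code). The point is that every refilter pays the full dissection cost for every frame, so ten refilters of a 100,000-frame capture means a million dissections:

    #include <stdio.h>

    #define NUM_FRAMES 100000

    /* stand-in for the real epan_dissect_run(): the expensive part */
    static void
    epan_dissect_run_stub(int frame)
    {
        (void)frame;   /* full dissection of one packet happens here */
    }

    /* stand-in for evaluating the display filter against the dissection */
    static int
    packet_matches_filter(int frame)
    {
        return frame % 2 == 0;
    }

    static void
    add_packet_to_packet_list(int frame)
    {
        epan_dissect_run_stub(frame);   /* paid on EVERY refilter */
        if (packet_matches_filter(frame)) {
            /* append a row to the GUI packet list */
        }
    }

    int
    main(void)
    {
        int frame;

        /* one refilter = one full pass over the capture */
        for (frame = 0; frame < NUM_FRAMES; frame++)
            add_packet_to_packet_list(frame);
        printf("rescanned %d frames\n", NUM_FRAMES);
        return 0;
    }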


As I see it, apart from the initial time we encounter the packet during file read (or live capture), there are not that many instances where we really must dissect the packet at all. OK, if we select a packet in the list so it gets displayed in the dissect pane, that might be an exception, but that is not something we do 100,000 times per capture anyway, so the performance of that is irrelevant.
We might also need to do a full rescan/redissect of all packets IF we have changed the preferences in such a way that the packets will be dissected differently, or when we have changed stuff using DecodeAs.

However, for me and many other users, the MAIN reason ethereal rescans the packet list is because we have applied or changed a filter. Some users will filter and refilter a capture file over and over and over: ten, twenty, thirty if not more times for each capture they work with. The same happens when a ConversationList dialog or a ServiceResponseTime dialog is opened.


Well, enough of that. On to my idea:

Hypothesis:  A significant part of the slowness of ethereal when refiltering a capture file comes from the expensive calls to epan_dissect_run() called from add_packet_to_packet_list() in file.c.

Potential fix: Reduce the number of calls made to epan_dissect_run(), at the expense of additional memory requirements (enabled by a preference).

Assuming that most of the time, we perform a full rescan/redissect of the capture file when we really just want to reapply a display filter (and are not doing anything that affects how a packet is dissected).

What do we need in order to refilter the packet list if we do not allow calling epan_dissect_run()?
1, We need to remember all COL values for all packets, so that we can just reapply them when adding the packet to the packet list, without calling the dissector and recreating them that way. This will consume additional memory.
2, For every packet we need to keep a list of all the hf_fields that were encountered in the packet. This list contains the index of the hf variable as well as the value it has. Nothing else needs to be stored there (in order to reduce the impact on memory). This list may NOT be pruned as the edt structs are, because we want to be able to still use this list even after the filters have changed, and thus the pruning would be different. No pruning.
The "ApplyFilterToEdtStructure" fucntions would need to be changed
(or
duplicated) so they could operate on the list in 2 instead of the edt
structure.
This function might also need to be looked at so that it would be

efficient


even for very large lists (no pruning)


1 would allow us to rebuild the packet list without needing to call the dissector (?).
2 would allow us to refilter the entire trace without calling any dissectors.
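
A rough sketch of the per-packet record that 1 and 2 would imply (all names made up for illustration, not a proposed API):

    #include <glib.h>

    /* 2: one entry per hf_ field encountered in the packet */
    typedef struct {
        int     hfindex;   /* index of the hf variable */
        guint32 value;     /* its value (real fields would need more types) */
    } cached_field_t;

    /* per-packet cache kept across refilters */
    typedef struct {
        gchar **col_text;  /* 1: saved COL values, one string per column */
        GList  *fields;    /* 2: list of cached_field_t, never pruned */
    } cached_packet_t;

    /* refiltering without epan_dissect_run(): scan the cached list
     * instead of redissecting the packet */
    static gboolean
    packet_has_field_value(const cached_packet_t *cp, int hfindex, guint32 value)
    {
        const GList *l;

        for (l = cp->fields; l != NULL; l = l->next) {
            const cached_field_t *f = (const cached_field_t *)l->data;
            if (f->hfindex == hfindex && f->value == value)
                return TRUE;
        }
        return FALSE;
    }

    int
    main(void)
    {
        cached_packet_t pkt = { NULL, NULL };
        cached_field_t f = { 7, 42 };   /* hypothetical field and value */

        pkt.fields = g_list_append(pkt.fields, &f);
        return packet_has_field_value(&pkt, 7, 42) ? 0 : 1;
    }

The memory cost is the obvious trade-off: with 100,000+ packets and many fields per packet, these lists are not free, hence gating it behind a preference.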

ideas, comments?


Right now it would be nice if someone could create a capture as I proposed earlier and use gprof to check where most of the CPU is spent when refiltering the capture, to verify whether my assumptions are correct or invalidate them.


(
As a nice benefit in the future, IF we were to have that list of fields for each packet easily available, we could do things like merging this list between packets. Say #6 is the Call and #27 is the Response. Since these packets are paired, we could merge the lists from these two packets into a single one. Then, when searching for something that occurred in the Response packet, we would automatically also pick up the matching Call packet, since their lists were merged. I.e. filtering for smb.error==foo would find both the Response that barfed saying foo and the matched Call to this Response. That would also be useful.
)
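
To sketch that merge on top of the per-packet cache idea above (still made-up names): glib's g_list_concat() splices the second list onto the end of the first, in place, so once a Call/Response pair has been matched, both packets can share a single field list and a filter hit on either one finds both.

    #include <glib.h>

    typedef struct {           /* same shape as the earlier sketch */
        gchar **col_text;
        GList  *fields;
    } cached_packet_t;

    static void
    merge_pair_fields(cached_packet_t *call, cached_packet_t *response)
    {
        /* g_list_concat() joins the lists in place; no elements are copied */
        GList *merged = g_list_concat(call->fields, response->fields);

        call->fields     = merged;
        response->fields = merged;   /* both packets now share one list */
    }

The shared list would need a little care when freeing (free it once, not once per packet), but that seems manageable.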