Wireshark-bugs: [Wireshark-bugs] [Bug 6794] New: slow voip/sip flow chart for large captures
Date: Mon, 6 Feb 2012 05:24:42 -0800 (PST)
https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=6794

           Summary: slow voip/sip flow chart for large captures
           Product: Wireshark
           Version: SVN
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Enhancement
          Priority: Medium
         Component: Wireshark
        AssignedTo: bugzilla-admin@xxxxxxxxxxxxx
        ReportedBy: const.crist@xxxxxxxxxxxxxx


Build Information:
wireshark 1.7.1 (SVN Rev 40740 from /trunk)

Copyright 1998-2012 Gerald Combs <gerald@xxxxxxxxxxxxx> and contributors.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Compiled (64-bit) with GTK+ 2.24.4, with Cairo 1.10.2, with Pango 1.28.4, with
GLib 2.28.6, with libpcap, with libz 1.2.3.4, with POSIX capabilities (Linux),
with SMI 0.4.8, with c-ares 1.7.5, with Lua 5.1, without Python, with GnuTLS
2.12.10, with Gcrypt 1.5.0, with MIT Kerberos, with GeoIP, with PortAudio
V19-devel (built Jul 20 2011 00:01:38), without AirPcap.

Running on Linux 3.1.0-1-amd64, with locale en_US.UTF-8, with libpcap version
1.1.1, with libz 1.2.3.4, GnuTLS 2.12.10, Gcrypt 1.5.0.

Built using gcc 4.6.1.
--
wireshark can draw call flows for sip voip calls
(accessible through the menu Telephony/VoIP Calls).

however, when the capture is large, containing tens of
thousands of sip voip calls, wireshark becomes very slow
at rendering the call flows.

here are my experiences with a capture file of 500 MB containing
approx. 40K calls on a debian machine having:

4xAMD Phenom(tm) II X4 940 Processor
4GB RAM

with the svn stock it takes about 10 hours to do it; the
report of the profiling done with oprofile looks like this:

CPU: AMD64 family10, speed 3e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask
of 0x00 (No unit mask) count 750000
samples  %        image name               app name                 symbol name
8146520  55.2011  lt-wireshark             lt-wireshark            
append_to_frame_graph.isra.7
3698478  25.0610  libglib-2.0.so.0.2800.6  lt-wireshark             g_list_last
1566933  10.6176  lt-wireshark             lt-wireshark            
SIPcalls_packet
1162377   7.8763  libc-2.13.so             lt-wireshark            
__strcmp_sse2
8734      0.0592  libwireshark.so.0.0.1    lt-wireshark            
serv_name_lookup
8130      0.0551  libwireshark.so.0.0.1    lt-wireshark            
dissect_sip_common
6504      0.0441  libwireshark.so.0.0.1    lt-wireshark             format_text
5749      0.0390  libwireshark.so.0.0.1    lt-wireshark            
tap_push_tapped_queue
4753      0.0322  libglib-2.0.so.0.2800.6  lt-wireshark            
g_unichar_isprint
4627      0.0314  libglib-2.0.so.0.2800.6  lt-wireshark            
g_hash_table_lookup
3699      0.0251  libwireshark.so.0.0.1    lt-wireshark             guint8_pbrk

step 1
------

by changing SIPcalls_packet() to use a hash table for the lookup of the calls
(on sip call-id) the runtime decreases to an (estimated) 3-4 hours.
SIPcalls_packet()'s cpu usage has decreased from 10.61% to 0.0115% (1000 times
faster)

CPU: AMD64 family10, speed 3e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask
of 0x00 (No unit mask) count 750000
samples  %        image name               symbol name
411305   69.4523  lt-wireshark             append_to_frame_graph.isra.7
171549   28.9675  libglib-2.0.so.0.2800.6  g_list_last
475       0.0802  libwireshark.so.0.0.1    serv_name_lookup
439       0.0741  libwireshark.so.0.0.1    dissect_sip_common
375       0.0633  libwireshark.so.0.0.1    format_text
360       0.0608  libwireshark.so.0.0.1    tap_push_tapped_queue
303       0.0512  libglib-2.0.so.0.2800.6  g_hash_table_lookup
283       0.0478  libglib-2.0.so.0.2800.6  g_unichar_isprint
202       0.0341  libwireshark.so.0.0.1    guint8_pbrk
172       0.0290  libwireshark.so.0.0.1    proto_tree_add_item
149       0.0252  libwireshark.so.0.0.1    compute_offset_length
143       0.0241  libc-2.13.so             __memcpy_sse2
141       0.0238  libwireshark.so.0.0.1    call_dissector_work
141       0.0238  libwireshark.so.0.0.1    emem_free_all
115       0.0194  libc-2.13.so             __strncmp_sse2
112       0.0189  libc-2.13.so             vfprintf
105       0.0177  libwireshark.so.0.0.1    proto_item_add_subtree
104       0.0176  libglib-2.0.so.0.2800.6  g_slice_free_chain_with_offset
99        0.0167  libwireshark.so.0.0.1    dissect_sdp
91        0.0154  libc-2.13.so             free
91        0.0154  libwireshark.so.0.0.1    dissect_ip
90        0.0152  libwireshark.so.0.0.1    proto_field_is_referenced
90        0.0152  libwireshark.so.0.0.1    tvb_length_remaining
90        0.0152  lt-wireshark            
process_specified_packets.constprop.10
89        0.0150  libglib-2.0.so.0.2800.6  g_slice_free1
89        0.0150  libwiretap.so.0.0.1      file_seek
88        0.0149  libwireshark.so.0.0.1    dissect
86        0.0145  libwireshark.so.0.0.1    dissect_frame
82        0.0138  libwireshark.so.0.0.1    proto_is_protocol_enabled
79        0.0133  libc-2.13.so             __strcmp_sse2
79        0.0133  libc-2.13.so             _int_malloc
78        0.0132  libglib-2.0.so.0.2800.6  g_slice_alloc
75        0.0127  libpthread-2.13.so       pthread_getspecific
73        0.0123  libwireshark.so.0.0.1    tvb_find_guint8
72        0.0122  libc-2.13.so             malloc
71        0.0120  libwireshark.so.0.0.1    fast_ensure_contiguous
68        0.0115  lt-wireshark             SIPcalls_packet

step 2
------

the next improvment was to change append_to_frame_graph() to use a
hash table for graph analysis (still slow, over 1 hour):

CPU: AMD64 family10, speed 3e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask
of 0x00 (No unit mask) count 750000
samples  %        image name               symbol name
565268   94.6185  libglib-2.0.so.0.2800.6  g_list_last
1863      0.3118  libwireshark.so.0.0.1    serv_name_lookup
1620      0.2712  libwireshark.so.0.0.1    dissect_sip_common
1340      0.2243  libwireshark.so.0.0.1    format_text
987       0.1652  libglib-2.0.so.0.2800.6  g_unichar_isprint
982       0.1644  libglib-2.0.so.0.2800.6  g_hash_table_lookup
819       0.1371  libwireshark.so.0.0.1    tap_push_tapped_queue
731       0.1224  libwireshark.so.0.0.1    guint8_pbrk
628       0.1051  libwireshark.so.0.0.1    proto_tree_add_item
587       0.0983  libglib-2.0.so.0.2800.6  g_hash_table_insert_internal
574       0.0961  libwireshark.so.0.0.1    compute_offset_length
405       0.0678  libwireshark.so.0.0.1    proto_is_protocol_enabled
401       0.0671  libwireshark.so.0.0.1    emem_free_all
395       0.0661  libwireshark.so.0.0.1    dissect_ip
391       0.0654  libc-2.13.so             __strncmp_sse2
367       0.0614  libwireshark.so.0.0.1    dissect
...
20        0.0017  lt-wireshark             append_to_frame_graph.isra.8

however, append_to_frame_graph() got from 69.45% to 0.0017% (40852 times
better)

step 3
------

I have changed the graph analysis list and the callsinfo list to use
prepend instead of append:

CPU: AMD64 family10, speed 3e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask
of 0x00 (No unit mask) count 750000
samples  %        image name               symbol name
130285   18.6875  libc-2.13.so             msort_with_tmp
66464     9.5333  libglib-2.0.so.0.2800.6  node_get_pos
53790     7.7154  libglib-2.0.so.0.2800.6  g_hash_table_insert_internal
51146     7.3361  libc-2.13.so             __memcpy_sse2
44419     6.3712  libglib-2.0.so.0.2800.6  node_get_next
42461     6.0904  libgtk-x11-2.0.so.0.2400.4 gtk_rbtree_reorder_fixup
41265     5.9189  libglib-2.0.so.0.2800.6  is_end
40819     5.8549  libglib-2.0.so.0.2800.6  g_hash_table_resize
38905     5.5803  libgtk-x11-2.0.so.0.2400.4 _gtk_rbtree_next
36069     5.1736  libgtk-x11-2.0.so.0.2400.4 _gtk_rbtree_reorder
35828     5.1390  libglib-2.0.so.0.2800.6  g_hash_table_lookup
29469     4.2269  libgtk-x11-2.0.so.0.2400.4 save_positions
26858     3.8524  libgtk-x11-2.0.so.0.2400.4 generate_order
7892      1.1320  libc-2.13.so             __memset_sse2
7752      1.1119  libgtk-x11-2.0.so.0.2400.4 gtk_rbtree_reorder_sort_func
6860      0.9840  libgtk-x11-2.0.so.0.2400.4 gtk_rbtree_reorder_invert_func
6835      0.9804  libglib-2.0.so.0.2800.6  g_hash_table_remove_all_nodes
4085      0.5859  libglib-2.0.so.0.2800.6  g_array_append_vals
3864      0.5542  libglib-2.0.so.0.2800.6  g_sequence_iter_is_end
3591      0.5151  libc-2.13.so             __memset_x86_64

2057      0.2950  libglib-2.0.so.0.2800.6  g_array_maybe_expand

step 4
------

reordering is performed _continously_ while adding elements to the voip calls
list.
(by _gtk_rbtree_reorder which is calling qsort from libc which is calling
msort_with_tmp)

after disabling continous reordering, by using in voip_calls_dlg_update():

        /* disable the re-ordering */
        gtk_tree_sortable_set_sort_column_id(sortable,
GTK_TREE_SORTABLE_UNSORTED_SORT_COLUMN_ID, GTK_SORT_ASCENDING);

        listx = g_list_first(listx);
        while (listx) {
            add_to_list_store((voip_calls_info_t*)(listx->data));
            listx = g_list_next(listx);
        }

        /* enable the re-ordering */
        gtk_tree_sortable_set_sort_column_id(sortable, CALL_COL_START_TIME,
GTK_SORT_ASCENDING);

CPU: AMD64 family10, speed 3e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask
of 0x00 (No unit mask) count 750000
samples  %        image name               symbol name
13495     8.8003  libwireshark.so.0.0.1    format_text
10698     6.9763  libglib-2.0.so.0.2800.6  g_unichar_isprint
7421      4.8394  libwireshark.so.0.0.1    guint8_pbrk
4973      3.2430  libwireshark.so.0.0.1    compute_offset_length
4776      3.1145  libglib-2.0.so.0.2800.6  g_hash_table_lookup
4029      2.6274  libwireshark.so.0.0.1    dissect_sip_common
3851      2.5113  libglib-2.0.so.0.2800.6  g_atomic_pointer_get
3186      2.0776  libc-2.13.so             __memcpy_sse2
3064      1.9981  libglib-2.0.so.0.2800.6  g_hash_table_insert_internal
3008      1.9616  libwireshark.so.0.0.1    emem_free_all
2964      1.9329  libwireshark.so.0.0.1    fast_ensure_contiguous
2803      1.8279  libwireshark.so.0.0.1    serv_name_lookup
2793      1.8214  libwireshark.so.0.0.1    check_offset_length_no_exception
2114      1.3786  libc-2.13.so             _int_malloc
2063      1.3453  libglib-2.0.so.0.2800.6  g_slice_free1
2046      1.3342  libwireshark.so.0.0.1    tap_push_tapped_queue
1841      1.2005  libwireshark.so.0.0.1    tvb_get_guint8
1713      1.1171  libc-2.13.so             memchr
1668      1.0877  libwireshark.so.0.0.1    tvb_length_remaining
1625      1.0597  libglib-2.0.so.0.2800.6  g_atomic_int_get
1550      1.0108  libwireshark.so.0.0.1    proto_tree_add_item
1524      0.9938  libc-2.13.so             vfprintf
1475      0.9619  libwireshark.so.0.0.1    emem_alloc_chunk
1439      0.9384  libc-2.13.so             __strncmp_sse2
1395      0.9097  libglib-2.0.so.0.2800.6  g_slice_alloc
1312      0.8556  libwireshark.so.0.0.1    tvb_find_guint8
1279      0.8341  libwireshark.so.0.0.1    ensure_contiguous_no_exception
1052      0.6860  libc-2.13.so             malloc
1028      0.6704  libwireshark.so.0.0.1    dissect_sdp
1022      0.6665  libwireshark.so.0.0.1    tvb_pbrk_guint8

the runtime decreased now to approx. 1 minute.

the changes/tests have been explicitly targeted at sip voip calls.
some of the data structures are global and used by _all_
voip protocol taps; I do NOT know whether it works correctly with dumps 
having also non-sip voip calls.

pls. let me know your oppinion.

-- 
Configure bugmail: https://bugs.wireshark.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching all bug changes.