Wireshark-dev: Re: [Wireshark-dev] slow sip voip flow for large captures
From: Cristian Constantin <const.crist@xxxxxxxxxxxxxx>
Date: Mon, 6 Feb 2012 14:34:15 +0100
cristian: done. here is the bug report:

https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=6794

On Sat, Feb 4, 2012 at 12:43 PM, Anders Broman <a.broman@xxxxxxxxxxxx> wrote:
> Please add your patch as it is so we can have a look at it.
> Regards
> Anders
>
> Cristian Constantin skrev 2012-02-03 15:19:
>
>> On Fri, Feb 3, 2012 at 3:02 PM, Anders Broman
>> <anders.broman@xxxxxxxxxxxx>  wrote:
>>>
>>> Hi,
>>> Forget about my comment about the sip hash table.
>>>
>>> Please add the patch(es) to bugzilla, if possible break it up to
>>> something we can commit straight away and stuff that needs to be worked on
>>> to work
>>> For all taps as that can't be committed until it's done for all tap. Two
>>> bug reports?
>>
>> cristian: but this means really 2 different patches which I have to
>> test separately, right?
>> otoh:
>>
>> * the append operation on lists with over 10k elements is really cpu
>> intensive (imo it
>> does NOT actually make sense to iterate over 10k pointers just to get
>> to the end of
>> some linked list)
>>
>> * if one does NOT apply that patch (step 3),
>> one cannot actually see the actual improvement.
>>
>> I will open one bug report and post the results/patches there.
>>
>>> Regards
>>> Anders
>>> -----Original Message-----
>>> From: wireshark-dev-bounces@xxxxxxxxxxxxx
>>> [mailto:wireshark-dev-bounces@xxxxxxxxxxxxx] On Behalf Of Cristian
>>> Constantin
>>> Sent: den 3 februari 2012 14:54
>>> To: Developer support list for Wireshark
>>> Subject: Re: [Wireshark-dev] slow sip voip flow for large captures
>>>
>>> On Fri, Feb 3, 2012 at 1:44 PM, Anders Broman<anders.broman@xxxxxxxxxxxx>
>>>  wrote:
>>>>
>>>> Hi,
>>>> Please put the patches in bugzilla, if possible split them in two
>>>> parts one that's ok for all types(Step 2,3,4?) of calls and one which
>>>> Implements the stuff for SIP but not for the other call types.
>>>
>>> cristian: step 1 is dealing only with sip packets.
>>>
>>> step 2's improvement could be used all voip taps.
>>>
>>> step 3 is tricky since the lists have to be reversed (trivial operation
>>> for double linked lists; glib offers an api for this). however currently the
>>> prepend and reversing is performed only for sip calls. extra-work to be done
>>> here for other voip taps to use more efficient linked lists operations
>>> (prepend and eventl. reverse). also, this step modifies the callback for the
>>> button "Flow" of the "VoIP" window  - on_graph_bt_clicked() - to reverse
>>> (once) the lists:
>>>
>>> * voip_calls_get_info()->callsinfo_list
>>> * voip_calls_get_info()->graph_analysis->list
>>>
>>> so, in order to work correctly with the other taps, they have to use the
>>> "prepend" operation/convention as well.
>>>
>>> step 4 could be used for all taps
>>>
>>> let me see what I can do about the patches.
>>>
>>>> I think the SIP dissector keeps a table over the calls as well could
>>>> that table be shared?
>>>
>>> cristian: I do not know which table is that.
>>>
>>>> Regards
>>>> Anders
>>>>
>>>> -----Original Message-----
>>>> From: wireshark-dev-bounces@xxxxxxxxxxxxx
>>>> [mailto:wireshark-dev-bounces@xxxxxxxxxxxxx] On Behalf Of Cristian
>>>> Constantin
>>>> Sent: den 3 februari 2012 13:01
>>>> To: Developer support list for Wireshark
>>>> Subject: [Wireshark-dev] slow sip voip flow for large captures
>>>>
>>>> hi!
>>>>
>>>> wireshark can draw call flows for sip voip calls (accessible through the
>>>> menu Telephony/VoIP Calls).
>>>>
>>>> however, when the capture is large, containing tens of thousands of sip
>>>> voip calls, wireshark becomes very slow at producing the list of calls and
>>>> the call flows.
>>>>
>>>> here are my experiences with a capture file of 500 MB containing approx.
>>>> 40K calls on a debian machine having:
>>>>
>>>> 4xAMD Phenom(tm) II X4 940 Processor
>>>> 4GB RAM
>>>>
>>>> with the svn stock it takes about 10 hours to do it; the report of the
>>>> profiling done with oprofile looks like this:
>>>>
>>>> CPU: AMD64 family10, speed 3e+06 MHz (estimated) Counted
>>>> CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
>>>> mask of 0x00 (No unit mask) count 750000 samples  %        image name
>>>> app name                 symbol name 8146520  55.2011  lt-wireshark
>>>> lt-wireshark
>>>> append_to_frame_graph.isra.7
>>>> 3698478  25.0610  libglib-2.0.so.0.2800.6  lt-wireshark
>>>> g_list_last
>>>> 1566933  10.6176  lt-wireshark             lt-wireshark
>>>> SIPcalls_packet
>>>> 1162377 7.8763  libc-2.13.so             lt-wireshark
>>>> __strcmp_sse2
>>>> 8734      0.0592  libwireshark.so.0.0.1    lt-wireshark
>>>> serv_name_lookup 8130      0.0551  libwireshark.so.0.0.1
>>>> lt-wireshark dissect_sip_common
>>>> 6504      0.0441  libwireshark.so.0.0.1    lt-wireshark
>>>> format_text
>>>> 5749      0.0390  libwireshark.so.0.0.1    lt-wireshark
>>>> tap_push_tapped_queue
>>>> 4753      0.0322  libglib-2.0.so.0.2800.6  lt-wireshark
>>>> g_unichar_isprint
>>>> 4627      0.0314  libglib-2.0.so.0.2800.6  lt-wireshark
>>>> g_hash_table_lookup
>>>> 3699      0.0251  libwireshark.so.0.0.1    lt-wireshark
>>>> guint8_pbrk
>>>>
>>>> step 1
>>>> ------
>>>>
>>>> by changing SIPcalls_packet() to use a hash table for the lookup of the
>>>> calls (on sip call-id) the runtime decreases to an (estimated) 3-4 hours.
>>>> SIPcalls_packet()'s cpu usage has decreased from 10.61% to 0.0115%
>>>> (1000 times faster)
>>>>
>>>> CPU: AMD64 family10, speed 3e+06 MHz (estimated) Counted
>>>> CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
>>>> mask of 0x00 (No unit mask) count 750000 samples  %        image name
>>>> symbol name
>>>> 411305   69.4523  lt-wireshark
>>>> append_to_frame_graph.isra.7
>>>> 171549   28.9675  libglib-2.0.so.0.2800.6  g_list_last
>>>> 475       0.0802  libwireshark.so.0.0.1    serv_name_lookup
>>>> 439       0.0741  libwireshark.so.0.0.1    dissect_sip_common
>>>> 375       0.0633  libwireshark.so.0.0.1    format_text 360
>>>> 0.0608  libwireshark.so.0.0.1    tap_push_tapped_queue
>>>> 303       0.0512  libglib-2.0.so.0.2800.6  g_hash_table_lookup
>>>> 283       0.0478  libglib-2.0.so.0.2800.6  g_unichar_isprint
>>>> 202       0.0341  libwireshark.so.0.0.1    guint8_pbrk
>>>> 172       0.0290  libwireshark.so.0.0.1    proto_tree_add_item
>>>> 149       0.0252  libwireshark.so.0.0.1    compute_offset_length
>>>> 143       0.0241  libc-2.13.so             __memcpy_sse2
>>>> 141       0.0238  libwireshark.so.0.0.1    call_dissector_work
>>>> 141       0.0238  libwireshark.so.0.0.1    emem_free_all
>>>> 115       0.0194  libc-2.13.so             __strncmp_sse2
>>>> 112       0.0189  libc-2.13.so             vfprintf
>>>> 105       0.0177  libwireshark.so.0.0.1    proto_item_add_subtree
>>>> 104       0.0176  libglib-2.0.so.0.2800.6
>>>> g_slice_free_chain_with_offset
>>>> 99        0.0167  libwireshark.so.0.0.1    dissect_sdp
>>>> 91        0.0154  libc-2.13.so             free
>>>> 91        0.0154  libwireshark.so.0.0.1    dissect_ip 90        0.0152
>>>> libwireshark.so.0.0.1    proto_field_is_referenced 90        0.0152
>>>> libwireshark.so.0.0.1    tvb_length_remaining 90        0.0152
>>>> lt-wireshark process_specified_packets.constprop.10
>>>> 89        0.0150  libglib-2.0.so.0.2800.6  g_slice_free1
>>>> 89        0.0150  libwiretap.so.0.0.1      file_seek
>>>> 88        0.0149  libwireshark.so.0.0.1    dissect
>>>> 86        0.0145  libwireshark.so.0.0.1    dissect_frame
>>>> 82        0.0138  libwireshark.so.0.0.1    proto_is_protocol_enabled
>>>> 79        0.0133  libc-2.13.so             __strcmp_sse2
>>>> 79        0.0133  libc-2.13.so             _int_malloc
>>>> 78        0.0132  libglib-2.0.so.0.2800.6  g_slice_alloc
>>>> 75        0.0127  libpthread-2.13.so       pthread_getspecific
>>>> 73        0.0123  libwireshark.so.0.0.1    tvb_find_guint8
>>>> 72        0.0122  libc-2.13.so             malloc
>>>> 71        0.0120  libwireshark.so.0.0.1    fast_ensure_contiguous
>>>> 68        0.0115  lt-wireshark             SIPcalls_packet
>>>>
>>>> step 2
>>>> ------
>>>>
>>>> the next improvment was to change append_to_frame_graph() to use a hash
>>>> table for graph analysis (still slow, over 1 hour):
>>>>
>>>> CPU: AMD64 family10, speed 3e+06 MHz (estimated) Counted
>>>> CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
>>>> mask of 0x00 (No unit mask) count 750000 samples  %        image name
>>>> symbol name
>>>> 565268   94.6185  libglib-2.0.so.0.2800.6  g_list_last
>>>> 1863      0.3118  libwireshark.so.0.0.1    serv_name_lookup 1620
>>>> 0.2712  libwireshark.so.0.0.1    dissect_sip_common 1340      0.2243
>>>> libwireshark.so.0.0.1    format_text
>>>> 987       0.1652  libglib-2.0.so.0.2800.6  g_unichar_isprint
>>>> 982       0.1644  libglib-2.0.so.0.2800.6  g_hash_table_lookup
>>>> 819       0.1371  libwireshark.so.0.0.1    tap_push_tapped_queue
>>>> 731       0.1224  libwireshark.so.0.0.1    guint8_pbrk
>>>> 628       0.1051  libwireshark.so.0.0.1    proto_tree_add_item
>>>> 587       0.0983  libglib-2.0.so.0.2800.6
>>>> g_hash_table_insert_internal
>>>> 574       0.0961  libwireshark.so.0.0.1    compute_offset_length
>>>> 405       0.0678  libwireshark.so.0.0.1    proto_is_protocol_enabled
>>>> 401       0.0671  libwireshark.so.0.0.1    emem_free_all
>>>> 395       0.0661  libwireshark.so.0.0.1    dissect_ip
>>>> 391       0.0654  libc-2.13.so             __strncmp_sse2
>>>> 367       0.0614  libwireshark.so.0.0.1    dissect ...
>>>> 20        0.0017  lt-wireshark
>>>> append_to_frame_graph.isra.8
>>>>
>>>> however, append_to_frame_graph() got from 69.45% to 0.0017% (40852
>>>> times better)
>>>>
>>>> step 3
>>>> ------
>>>>
>>>> I have changed the graph analysis list and the callsinfo list to use
>>>> prepend instead of append:
>>>>
>>>> CPU: AMD64 family10, speed 3e+06 MHz (estimated) Counted
>>>> CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
>>>> mask of 0x00 (No unit mask) count 750000 samples  %        image name
>>>> symbol name
>>>> 130285   18.6875  libc-2.13.so             msort_with_tmp
>>>> 66464     9.5333  libglib-2.0.so.0.2800.6  node_get_pos 53790
>>>> 7.7154  libglib-2.0.so.0.2800.6  g_hash_table_insert_internal
>>>> 51146     7.3361  libc-2.13.so             __memcpy_sse2
>>>> 44419     6.3712  libglib-2.0.so.0.2800.6  node_get_next
>>>> 42461     6.0904  libgtk-x11-2.0.so.0.2400.4 gtk_rbtree_reorder_fixup
>>>> 41265     5.9189  libglib-2.0.so.0.2800.6  is_end
>>>> 40819     5.8549  libglib-2.0.so.0.2800.6  g_hash_table_resize
>>>> 38905     5.5803  libgtk-x11-2.0.so.0.2400.4 _gtk_rbtree_next
>>>> 36069     5.1736  libgtk-x11-2.0.so.0.2400.4 _gtk_rbtree_reorder
>>>> 35828     5.1390  libglib-2.0.so.0.2800.6  g_hash_table_lookup
>>>> 29469     4.2269  libgtk-x11-2.0.so.0.2400.4 save_positions
>>>> 26858     3.8524  libgtk-x11-2.0.so.0.2400.4 generate_order
>>>> 7892      1.1320  libc-2.13.so             __memset_sse2
>>>> 7752      1.1119  libgtk-x11-2.0.so.0.2400.4
>>>> gtk_rbtree_reorder_sort_func 6860      0.9840
>>>> libgtk-x11-2.0.so.0.2400.4 gtk_rbtree_reorder_invert_func
>>>> 6835      0.9804  libglib-2.0.so.0.2800.6
>>>> g_hash_table_remove_all_nodes
>>>> 4085      0.5859  libglib-2.0.so.0.2800.6  g_array_append_vals
>>>> 3864      0.5542  libglib-2.0.so.0.2800.6  g_sequence_iter_is_end
>>>> 3591      0.5151  libc-2.13.so             __memset_x86_64
>>>> 2057      0.2950  libglib-2.0.so.0.2800.6  g_array_maybe_expand
>>>>
>>>> step 4
>>>> ------
>>>>
>>>> reordering is performed _continously_ while adding elements to the voip
>>>> calls list.
>>>> (by _gtk_rbtree_reorder which is calling qsort from libc which is
>>>> calling msort_with_tmp)
>>>>
>>>> after disabling continous reordering, by using in
>>>> voip_calls_dlg_update():
>>>>
>>>>        /* disable the re-ordering */
>>>>        gtk_tree_sortable_set_sort_column_id(sortable,
>>>> GTK_TREE_SORTABLE_UNSORTED_SORT_COLUMN_ID, GTK_SORT_ASCENDING);
>>>>
>>>>        listx = g_list_first(listx);
>>>>        while (listx) {
>>>>            add_to_list_store((voip_calls_info_t*)(listx->data));
>>>>            listx = g_list_next(listx);
>>>>        }
>>>>
>>>>        /* enable the re-ordering */
>>>>        gtk_tree_sortable_set_sort_column_id(sortable,
>>>> CALL_COL_START_TIME, GTK_SORT_ASCENDING);
>>>>
>>>> CPU: AMD64 family10, speed 3e+06 MHz (estimated) Counted
>>>> CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
>>>> mask of 0x00 (No unit mask) count 750000 samples  %        image name
>>>> symbol name
>>>> 13495     8.8003  libwireshark.so.0.0.1    format_text
>>>> 10698     6.9763  libglib-2.0.so.0.2800.6  g_unichar_isprint
>>>> 7421      4.8394  libwireshark.so.0.0.1    guint8_pbrk
>>>> 4973      3.2430  libwireshark.so.0.0.1    compute_offset_length
>>>> 4776      3.1145  libglib-2.0.so.0.2800.6  g_hash_table_lookup
>>>> 4029      2.6274  libwireshark.so.0.0.1    dissect_sip_common
>>>> 3851      2.5113  libglib-2.0.so.0.2800.6  g_atomic_pointer_get
>>>> 3186      2.0776  libc-2.13.so             __memcpy_sse2
>>>> 3064      1.9981  libglib-2.0.so.0.2800.6
>>>> g_hash_table_insert_internal
>>>> 3008      1.9616  libwireshark.so.0.0.1    emem_free_all
>>>> 2964      1.9329  libwireshark.so.0.0.1    fast_ensure_contiguous
>>>> 2803      1.8279  libwireshark.so.0.0.1    serv_name_lookup
>>>> 2793      1.8214  libwireshark.so.0.0.1
>>>> check_offset_length_no_exception
>>>> 2114      1.3786  libc-2.13.so             _int_malloc
>>>> 2063      1.3453  libglib-2.0.so.0.2800.6  g_slice_free1
>>>> 2046      1.3342  libwireshark.so.0.0.1    tap_push_tapped_queue
>>>> 1841      1.2005  libwireshark.so.0.0.1    tvb_get_guint8
>>>> 1713      1.1171  libc-2.13.so             memchr
>>>> 1668      1.0877  libwireshark.so.0.0.1    tvb_length_remaining
>>>> 1625      1.0597  libglib-2.0.so.0.2800.6  g_atomic_int_get 1550
>>>> 1.0108  libwireshark.so.0.0.1    proto_tree_add_item
>>>> 1524      0.9938  libc-2.13.so             vfprintf
>>>> 1475      0.9619  libwireshark.so.0.0.1    emem_alloc_chunk
>>>> 1439      0.9384  libc-2.13.so             __strncmp_sse2
>>>> 1395      0.9097  libglib-2.0.so.0.2800.6  g_slice_alloc
>>>> 1312      0.8556  libwireshark.so.0.0.1    tvb_find_guint8
>>>> 1279      0.8341  libwireshark.so.0.0.1
>>>> ensure_contiguous_no_exception
>>>> 1052      0.6860  libc-2.13.so             malloc
>>>> 1028      0.6704  libwireshark.so.0.0.1    dissect_sdp
>>>> 1022      0.6665  libwireshark.so.0.0.1    tvb_pbrk_guint8
>>>>
>>>> the runtime decreased now to approx. 1 minute (pretty reasonable when
>>>> compared to 10 hours)
>>>>
>>>> I can publish the patches on bugzilla; however, the changes/tests have
>>>> been explicitly targeted at sip voip calls.
>>>> some of the data structures are global and used by _all_ voip protocol
>>>> taps; it won't work correctly with dumps having also non-sip voip calls.
>>>>
>>>> pls. let me know your opinion.
>>>>
>>>> thanks.
>>>> cristian
>>>> ______________________________________________________________________
>>>> _____ Sent via:    Wireshark-dev mailing list
>>>> <wireshark-dev@xxxxxxxxxxxxx>
>>>> Archives:    http://www.wireshark.org/lists/wireshark-dev
>>>> Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
>>>>
>>>> mailto:wireshark-dev-request@xxxxxxxxxxxxx?subject=unsubscribe
>>>> ______________________________________________________________________
>>>> _____ Sent via:    Wireshark-dev mailing list
>>>> <wireshark-dev@xxxxxxxxxxxxx>
>>>> Archives:    http://www.wireshark.org/lists/wireshark-dev
>>>> Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
>>>>
>>>> mailto:wireshark-dev-request@xxxxxxxxxxxxx?subject=unsubscribe
>>>
>>>
>>> ___________________________________________________________________________
>>> Sent via:    Wireshark-dev mailing list<wireshark-dev@xxxxxxxxxxxxx>
>>> Archives:    http://www.wireshark.org/lists/wireshark-dev
>>> Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
>>>
>>> mailto:wireshark-dev-request@xxxxxxxxxxxxx?subject=unsubscribe
>>>
>>> ___________________________________________________________________________
>>> Sent via:    Wireshark-dev mailing list<wireshark-dev@xxxxxxxxxxxxx>
>>> Archives:    http://www.wireshark.org/lists/wireshark-dev
>>> Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
>>>
>>> mailto:wireshark-dev-request@xxxxxxxxxxxxx?subject=unsubscribe
>>
>>
>> ___________________________________________________________________________
>> Sent via:    Wireshark-dev mailing list<wireshark-dev@xxxxxxxxxxxxx>
>> Archives:    http://www.wireshark.org/lists/wireshark-dev
>> Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
>>
>>  mailto:wireshark-dev-request@xxxxxxxxxxxxx?subject=unsubscribe
>>
>
> ___________________________________________________________________________
> Sent via:    Wireshark-dev mailing list <wireshark-dev@xxxxxxxxxxxxx>
> Archives:    http://www.wireshark.org/lists/wireshark-dev
> Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
>            mailto:wireshark-dev-request@xxxxxxxxxxxxx?subject=unsubscribe