Wireshark-dev: Re: [Wireshark-dev] RTP player - a suggestion
From: Jirka Novak <j.novak@xxxxxxxxxxxx>
Date: Mon, 27 Mar 2017 22:09:41 +0200
Hi Peter,

>>> My proposal:
>>> Add 'mixer' layer which will provide the following features to
>>> improve usability:
> 
> Your idea sounds good overall, but before you dive into it, can I
> first suggest that a lot of the current problems are due to the use of
> QCustomPlot for graphing the audio?  It draws *very* slowly, and
> that's true even after we downsample the audio just so the plot has
> less data points to draw.
> 
> If I had the free time and the Qt knowledge, I'd love to see this use
> of QCustomPlot replaced with a widget that behaves more like the old
> GTK RTP player's display did, or better yet, more like an Audacity
> track (with proper support for zooming and displaying the waveform
> differently depending on the zoom level).  Since I don't think such a
> widget exists yet, that would mean first making a new UI widget and
> then rewriting Wireshark's Qt UI to use it. :-/
> 
> If that sounds like too much work, a better first step might be to
> change the current Qt RTP player so that it creates one QCustomPlot
> for each stream (like the GTK player did), rather than displaying
> multiple streams overlapping.  It probably won't help or hurt
> performance much, but changing the layout of the UI may give a clearer
> picture of how a user would like to select or mix streams together, so
> maybe that will affect how you think about the mixer layer.

I'm not sure whether QCustomPlot is the bad guy, but it might be. My
main aim was to gather the same code from multiple places and write one
"library" to process RTP - both RTP statistics and RTP audio. The
library would process/extract each RTP stream separately.
Mixing/downsampling can be built on top of it.
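A minimal sketch of the layering described above - every class and
function name here is invented for illustration and is not a Wireshark
API: each stream decodes independently, and mixing is a separate
consumer of the decoded data.

```python
# Hypothetical layering sketch -- names are illustrative only,
# not Wireshark code.

class RtpStream:
    """Decodes one RTP stream independently of any others."""
    def __init__(self, payloads):
        self.payloads = payloads

    def decoded_samples(self):
        # Stand-in for per-codec decoding; here the payloads
        # are treated as already-decoded PCM samples.
        return list(self.payloads)

    def statistics(self):
        samples = self.decoded_samples()
        return {"packets": len(self.payloads),
                "peak": max(samples) if samples else 0}

# Mixing is built on top of the stream layer, never inside it,
# so statistics and playback can share the same decoded data.
def mix(streams):
    decoded = [s.decoded_samples() for s in streams]
    n = max(len(d) for d in decoded)
    return [sum(d[i] for d in decoded if i < len(d)) for i in range(n)]
```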

Right now I have neither the capacity nor the Qt knowledge to write a new Qt player.

>>> I'm also guessing using just one audio output gives better
>>> performance with lots of parallel streams.
> 
> To be sure, I *desperately* want to see the audio go back to the
> left-right panning that the GTK player did, so that when listening to
> multiple streams, you can hear which one is which because they come
> from different positions.
> 
> Come to think of it, that may obviate the need for mixing in the first
> place; if you have 3 channels, just tell the audio system you want to
> play audio that has 3 channels (left, center, right) and provide the
> data appropriately interleaved.  No mixing.
> 
> But I'm not sure how that would scale to more than 3 channels.  Then
> again, how many people are listening to more than 3 streams
> simultaneously anyway?  I can't think of many use cases for that.
> Maybe a multicast stream with a bunch of participants, who may be
> taking turns talking.  Maybe call transfers (but as you say, it'd be
> nicer to handle those so that one stream "replaces" the other).
> Nevertheless, I suppose we do have to handle that case somehow.  So I
> guess we can't really avoid mixing.  But if there is *any* way to get
> the audio system to do it for us, wouldn't that be preferable?

I tried to explore a similar idea with the current Qt RTP player:
I added an option for each stream in the table below the player that
lets the user select how audio from a channel should be processed -
Left/Right/Silence.
QCustomPlot then showed the RTP streams based on this option, and the
output was mixed to the left/right loudspeaker accordingly. I stacked
the streams one above the other - a common X axis, with a Y axis per
stream.
I also added the possibility to place a "bar" on the graph (spanning all
shown streams) to define where playback should start.
I think it was much better than the current state of the player.
I can imagine the code detecting e.g. how many audio channels the system
has and offering mono/silence, or left/center/right/silence, or more.
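The per-stream Left/Right/Silence mixing described above could be
sketched roughly like this (a toy illustration, not the actual player
code - the function name and pan labels are made up):

```python
# Toy sketch of Left/Right/Silence mixing into interleaved stereo.
# Names are hypothetical; the real player works on decoded RTP audio.

def mix_to_stereo(streams, pans):
    """Mix mono int16 sample lists into interleaved stereo [L, R, L, R, ...].

    streams: list of lists of int16 samples (may differ in length)
    pans:    per-stream setting, one of "left", "right", "silence"
    """
    def clamp(x):
        # keep summed samples inside the int16 range to avoid wrap-around
        return max(-32768, min(32767, x))

    n = max(len(s) for s in streams)
    out = []
    for i in range(n):
        left = right = 0
        for samples, pan in zip(streams, pans):
            if pan == "silence" or i >= len(samples):
                continue
            if pan == "left":
                left += samples[i]
            elif pan == "right":
                right += samples[i]
        out.extend((clamp(left), clamp(right)))
    return out
```

A real implementation would also handle clock drift and gaps between
streams; this only shows the channel-selection idea.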

On the other hand, the changes/research mentioned above led me to the
idea of writing common code to process RTP :-)

>> I started work on a "proof of concept" for a very similar idea. I
>> haven't finished it yet, but I have a few points which I would like
>> to mention and discuss here:
>>
>> 3) When I analysed the GTK and Qt code, I found that there is a main
>> difference between them: GTK stores all RTP data in memory, while Qt
>> extracts it to a file. Playing is then based on the extracted data.
> 
> Not only does Qt extract it to a temporary file, it never deletes
> those files.  That's rude.

I think it could be fixed - BTW I haven't seen it with the development
version (except through my own mistakes).
BTW I see another area to improve - the Qt RTP Player/Analysis dialogs
are modal, so you can't "jump" between them.

> I suspect the only reason it does this in the first place is because
> that's just the API to the Speex resampler that someone chose to use
> to downsample the audio.  And the only reason it downsamples in the
> first place is because the QCustomPlot it uses to display the audio is
> too slow.
> 
> Hence why I led by talking about the use of QCustomPlot.  I feel that
> it's the root cause of a bunch of problems.  Instead of dealing with
> these various symptoms, it would be better to address the root cause,
> and these other questions (like how to downsample and store audio)
> would go away.
> 
> There shouldn't be any need to downsample the audio at all.  If it is
> downsampled, it should only be for *display*, so ideally it would be
> the widget's job to downsample based on the zoom level.  No code
> external to the widget ought to be downsampling anything.

As I mentioned above, I can work with QCustomPlot, but I'm not able to
write a different player.

Just a note:
The RTP processing library must be able both to mix RTP audio for output
and to downsample streams one by one - each task independently of the
other.
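For the display side of that, one common approach (a sketch, not
Wireshark's actual code) is min/max peak extraction per bucket, which
keeps short spikes visible no matter how aggressively the waveform is
reduced - unlike plain decimation, which can drop them:

```python
# Illustrative min/max downsampling for waveform display.
# The function name is invented; real code would downsample per zoom level.

def minmax_downsample(samples, buckets):
    """Reduce a waveform to (min, max) pairs, one pair per bucket."""
    n = len(samples)
    out = []
    for b in range(buckets):
        lo = b * n // buckets
        hi = max(lo + 1, (b + 1) * n // buckets)
        chunk = samples[lo:hi]
        out.append((min(chunk), max(chunk)))
    return out
```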

> There also shouldn't be any need for temporary files; audio is small
> enough that it can fit in memory.

BTW this is a good question for others too - should we really expect
that the audio will fit into available (even virtual) memory?
I work in an area where a SIP call and its related RTP streams last for
weeks. Most of it is silence, but I need to analyse it "as it is"
before I find the "interesting" place in the call.
Imagine that you need to process multiple week-long RTP streams, then
create an additional mixed stream for playing, and then something more
for downsampling...
I have a lot of memory, but I don't think everyone does.
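To put rough numbers on the week-long scenario (a back-of-the-envelope
estimate, assuming narrowband audio decoded to 16-bit PCM at 8 kHz,
which is typical for G.711):

```python
# Rough memory estimate for ONE week-long narrowband stream,
# decoded to 16-bit PCM at 8 kHz -- estimated, not measured.
sample_rate = 8000            # samples per second
bytes_per_sample = 2          # 16-bit PCM after decoding
seconds_per_week = 7 * 24 * 3600

bytes_per_stream = sample_rate * bytes_per_sample * seconds_per_week
print(bytes_per_stream / 2**30)   # roughly 9 GiB per stream per week
```

With several such streams plus a mixed copy, an all-in-memory design
adds up quickly, which supports keeping a file-backed path available.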

						Sincerely yours,

							Jirka Novak