audetto/AppleWin

SDL Audio issues

Closed this issue · 15 comments

Originally from #22 (comment)

The Mockingboard code is clearly more complex as it's processing multi-channel sound, so probably easier to focus first on single channel Speaker. Is your own code modifying the AppleWin audio code (if so, where) or is the problem that the AppleWin code is introducing the audio latency (perhaps because it isn't reading the audio buffer size correctly)? It looks to me like the Speaker code is trying to balance buffer underruns and also not preload so much audio as to introduce unnecessary latency elsewhere. I can take a stab at this, but want to make sure I have a full understanding of the scope.

AppleWin audio code is run unchanged (like 99% of the rest).

The Windows IDirectSoundBuffer is implemented here https://github.com/audetto/AppleWin/blob/master/source/linux/windows/dsound.h#L76

And data is moved to SDL here: https://github.com/audetto/AppleWin/blob/master/source/frontends/sdl/sdirectsound.cpp#L166

Remember we run 1 frame at a time, rather than 1ms at a time like in Windows.

Fell free to experiment.
Make sure the lag you see does not already exist in the Windows version.

In your DirectSoundGenerator::writeAudio() method, you have a conditional statement prefixed by if (bytesFree > 0). As I understand it, doesn't this mean that the audio gets copied to the SDL audio buffer virtually every time the method is called (unless no audio has played since the last tick)? IOW, on 99.99% of calls, the conditional is true.

My understanding here is that AppleWin is making an effort to keep the audio queue reasonably full without spending too many cycles keeping it 100% full. It seems reasonable to take a similar approach with the SDL audio buffer. Some ideas:

// Instead of:
if (bytesFree > 0)

// How about (keeps SDL audio buffer >=50% full):
if (bytesFree > myBuffer->bufferSize >> 1)

// Or:
if (SDL_GetQueuedAudioSize(myAudioDevice) < myBuffer->bufferSize >> 1)

In testing, the 50% threshold works well and allows a cheap bit shift. Maybe over-optimizing just a bit, but you could include the condition sooner and return earlier (on first code line of method call):

if (!isRunning() || SDL_GetQueuedAudioSize(myAudioDevice) > myBuffer->bufferSize >> 1) { return; }
In theory this should reduce the SDL interface latency a small amount (desired), but I don't see how it would impact the audio AppleWin is queuing up for playback. However, it's a good point to compare against native Windows performance (next step) and would really be ideal to compare against an actual Apple II.

Thank you.

The danger is to leave too little in the SDL queue that we get under runs.
Maybe the condition should be based on time-equivalent.

You know that on average the next call to writeAudio will be in ~17ms, so we must at least provide SDL this, + a little to avoid glitches.

I will print some stats to see if this makes any sense.
And probably worth exposing this as a parameter.

I've tried your .5 and I hear many glitches.
I used Karateka for most of my tests.

What do you use to play audio?
And what does this look for you?

Screenshot from 2021-07-10 09-21-36

Numbers are in ms.

I am not too sure

  • If I make SDL buffer smaller, than AW gets bigger and get really nasty audio effects (like echo)
  • If I make SDL buffer bigger, than AW stabilises at about 0.016 and music plays well (with latency)

Try this

https://github.com/audetto/AppleWin/tree/audio

Screenshot from 2021-07-10 12-21-34

  • trigger is the level (in ms) below which the audio is copied to SDL
  • target is the target queue size (in ms) after the copy

So you can say, only copy if queue is < 10 ms, and if you copy, make the new queue 15 ms.
For me good quality audio only happens with big values.

Made a couple of fixes to https://github.com/audetto/AppleWin/tree/audio but no improvements.

One of the issues with AW code is that they do not use the circular buffer in a straightforward way (https://docs.microsoft.com/en-us/previous-versions/windows/embedded/ms897826(v=msdn.10)) i.e. by using DSBLOCK_FROMWRITECURSOR, but they do some extra book-keeping on the side.
I think the circular buffer overrun if the SDL buffer is too small. You can see it in the settings window.

I asked about it, but did not get very far

AppleWin#538 (comment)

I've been using code from your main branch and instead of Volume/Direct/SDL my Audio columns read Volume/Buffer/Queue (I don't see how to paste a screenshot here). At >=50% setting, buffer on Channel 1 is running 0.025 and Queue is 0.242 for Karateka (Color NTCS Monitor windowed).

I'll give the code in the audio branch a test later tonight. In theory you could sample in real-time to identify what percentage the SDL buffer needs to stay full to avoid underruns, but I suspect that will offset any gains to be had versus just re-filling the buffer 100% on each call by default.

I'm testing w/ Mario Bros. Almost constant audio and pressing Esc immediately pauses the audio. Adjusting audio queue mid-stream (with the slider) doesn't seem to work so well, but if I start with a default queue of 100ms (hard-coded), audio performance seems okay. Anything less causes problems.

It seems that once audio queue falls a certain amount behind, it completely drops out (versus resetting with a new sample). Is that your assessment and the behavior you're trying to address?

Yes, if the direct (i.e. AppleWin) buffer gets full, then it behaves strangely.

  • audio drops out
  • echo

If you bring it down, sometimes it takes a log time to go back to normal.
You can pause and restart then emulator with Pause key to reset it.

I think something odd happens in the AW logic to handle these situations, which in Windows just never happen.

Okay, I think I have a pretty good idea of the problem. In sdirectsound.cpp within the DirectSoundGenerator::writeAudio() method, you have a condition if (queued < bufferSize). It's clear that's there to ensure your bytesToCopy value isn't negative. However, if the audio starts skipping, causing the queue to grow larger than the bufferSize, that condition is skipped and you never call mixBuffer. Since there's nothing else to cover the (queued < bufferSize), audio drops once your queue grows too large. This explains the observed behavior.

I think the solution in pseudo-code is something like:
else /*if (queued >= bufferSize) */ {
myBuffer->Read() // read bufferSize bytes from the END of the AW audio buffer
mixBuffer() // as appropriate
}

When audio skipping becomes a problem, the idea is to just grab the last DirectSoundGenerator::ourBufferSize of audio from the AW audio queue. Sure, it'll be choppy, but it should remain in sync. To make it really slick, if the bufferSize is a LOT bigger than what's queued, reduce the SDL queue by a set amount in the first condition; if it's smaller, increase the SDL buffer size by a set amount in the second condition. In this way, the audio delay would dynamically adjust to reflect available CPU cycles (even as they vary with user workload).

One final question... why this: const int bytesToCopy = ((bufferSize - queued) / myBytesPerUnit) * myBytesPerUnit;
and not this: const int bytesToCopy = bufferSize - queued;

I'm guessing you have a better intuition for what variables to use to calculate the audio pointer location and copy end of buffer, but let me know if you want me to take a stab at turning the pseudo-code to real C++.

Okay, I think I have a pretty good idea of the problem. In sdirectsound.cpp within the DirectSoundGenerator::writeAudio() method, you have a condition if (queued < bufferSize). It's clear that's there to ensure your bytesToCopy value isn't negative. However, if the audio starts skipping, causing the queue to grow larger than the bufferSize, that condition is skipped and you never call mixBuffer. Since there's nothing else to cover the (queued < bufferSize), audio drops once your queue grows too large. This explains the observed behavior.

I think the solution in pseudo-code is something like:
else /*if (queued >= bufferSize) */ {
myBuffer->Read() // read bufferSize bytes from the END of the AW audio buffer
mixBuffer() // as appropriate
}

So you are saying:

if queue < buffer: read enough to make the queue at least buffer.
otherwise, read buffer size.

maybe it works, but it looks very "discontinuous":

imagine :
queue = 99, buffer = 100: I copy 100 - 99 = 1, so that queue = 100
queue = 101, buffer = 100, I copy 100, now queue = 201

I think the transition should be "continuous".

When audio skipping becomes a problem, the idea is to just grab the last DirectSoundGenerator::ourBufferSize of audio from the AW audio queue. Sure, it'll be choppy, but it should remain in sync. To make it really slick, if the bufferSize is a LOT bigger than what's queued, reduce the SDL queue by a set amount in the first condition; if it's smaller, increase the SDL buffer size by a set amount in the second condition. In this way, the audio delay would dynamically adjust to reflect available CPU cycles (even as they vary with user workload).

All you are saying is correct, but you need to take into account the interaction with AW adaptive code: https://github.com/audetto/AppleWin/blob/master/source/Speaker.cpp#L750 sometimes it works, sometimes not, and this I do not understand.
It should self-adapt to variable buffer sizes and revert to a direct buffer of about 1/2 or 1/4 full. It will take a few seconds, but I see that it does not happen always.

One final question... why this: const int bytesToCopy = ((bufferSize - queued) / myBytesPerUnit) * myBytesPerUnit;
and not this: const int bytesToCopy = bufferSize - queued;

because I did not want to copy part of a frame: in Mockingboard 2 channels, 16 bit, each frame is 4 bytes. so i want to make sure I only copy whole frames rather than risking to loose the sync.

I'm guessing you have a better intuition for what variables to use to calculate the audio pointer location and copy end of buffer, but let me know if you want me to take a stab at turning the pseudo-code to real C++.

if you find something that works well, please make a branch so I can analyse it.

audio...webspacecreations:patch-3

I tested a bunch of different approaches, but setting a small initial value and dynamically adjusting the buffer size if the queue grows too large seems to work well.

I think Mario Bros is a good program for testing audio latency. I've now checked it running in AppleWin on Windows. Admittedly the Windows machine is MUCH more powerful than the Pi, but the test shows audio should immediately stop/start upon pressing Esc mid-game (pause). Even with the small buffer size, I'm still seeing audio delay on Pi, which points to some other culprit. Hopefully this code helps narrow down the possibilities.

Glad to that this is being worked on because the audio latency is the one thing that holds me back from using this fantastic emulator.

@xandark I noticed your comment. I haven't had a chance to play with this for some time and @audetto is doing an incredible amount of work just to make the Windows code run on Pi / Linux. Not sure whether any further progress has been made on the audio front... is this something you might be able to help with? Overall, the emulator seems to be in a good place and sync'ing with AppleWin really is the most sustainable long-term Apple II emulation solution.

With some rudimentary Mac (#39) functionality, it really seems like it's time to bring the larger platform support into the official codebase to get more eyes on the project, but that's just me hoping.

@webspacecreations thanks for checking in. Sorry, I do not have the time to help out. At this point in my life, I have to settle to be a user. Though it is great to hear about all the progress!