Logic for writing OGG file to `MemoryStream` maybe flawed?

Question

Logic for writing OGG file to `MemoryStream` maybe flawed?

Closed this issue 4 months ago · 6 comments

The implementation of Finish() in OpusOggWriteStream (Concentus.OggFile 1.0.5) is:

public void Finish()
{
  int count = this._opusFrame.Length - this._opusFrameIndex;
  this.WriteSamples(new short[count], 0, count);
  this.FinalizePage();
  this.WriteStreamFinishedPage();
  this._outputStream.Flush();
  this._outputStream.Dispose();
  this._finalized = true;
}

I can't use a MemoryStream after this method is called on the wrapper. One would need access to the internal functions FinalizePage and WriteStreamFinishedPage (which calls FinalizePage) without the call to Dispose on the internal stream.

The workaround for now is to manually do what is implemented in those methods and manually handle the stream's disposal.

Another peeve is that there doesn't seem to be a way to pre-allocate a block of bytes that could be used by an underlying stream or directly from within the Writer wrapper.

Answer 1 · 2024-05-07T22:04:19.000Z

Yeah, there are a lot of known problems with this code base that I have not bothered to really address, this being one of them. Worse than that, I also maintain two separate forks of this code and spend more time on the other fork so their features have diverged in messy ways.

For the Finish() issue, you could make a PR to add an ownsStream constructor parameter to OggOpusWriteStream, and then conditionally close the output stream based on that flag, as is done in other places in .Net.

Answer 2 · 2024-05-07T22:09:51.000Z

There is a grim and forbidden way to accomplish what I think you're trying to do with regards to streaming and temp buffers...
Durandal.AI
Durandal.AI.NonPortable
Durandal.AI.Extensions.NativeAudio

public static async Task DecodeOpusFromStream()
{
    ILogger logger = new ConsoleLogger("OggOpusTest", LogLevel.All);
    NativePlatformUtils.SetGlobalResolver(new NativeLibraryResolverImpl());
    // this isn't quite working at the moment in .net framework, not sure why
    AssemblyReflector.ApplyAccelerators(typeof(NativeOpusAccelerator).Assembly, logger);

    OggOpusCodecFactory codecFactory = new OggOpusCodecFactory();
    using (IAudioGraph graph = new AudioGraph(AudioGraphCapabilities.Concurrent))
    using (FileStream fileInStream = new FileStream(@"C:\Data\Sample.opus", FileMode.Open, FileAccess.Read))
    using (AudioDecoder decoder = codecFactory.CreateDecoder("oggopus", string.Empty, graph, logger.Clone("OpusDecoder"), "OpusDecoder"))
    {
        // Read ogg opus header and parse the output format
        await decoder.Initialize(fileInStream, false, CancellationToken.None, DefaultRealTimeProvider.Singleton).ConfigureAwait(false);
        AudioSampleFormat decodedFormat = decoder.OutputFormat;

        const int samplesPerChannelToDecode = 1024;
        float[] decodedAudioFloat32 = new float[samplesPerChannelToDecode * decodedFormat.NumChannels];
        short[] decodedAudioInt16 = new short[samplesPerChannelToDecode * decodedFormat.NumChannels];

        while (!decoder.PlaybackFinished)
        {
            int samplesPerChannelActuallyRead = await decoder.ReadAsync(
                decodedAudioFloat32, 0, samplesPerChannelToDecode,
                CancellationToken.None, DefaultRealTimeProvider.Singleton).ConfigureAwait(false);

            if (samplesPerChannelActuallyRead > 0)
            {
                // We can convert float32 to int16 if desired
                AudioMath.ConvertSamples_FloatToInt16(
                    decodedAudioFloat32, 0,
                    decodedAudioInt16, 0,
                    samplesPerChannelActuallyRead * decodedFormat.NumChannels,
                    clamp: false);

                // Do something with the audio....
                Console.WriteLine($"Got {samplesPerChannelActuallyRead} samples");
            }
        }
    }
}

public static async Task EncodeOpusToStream()
{
    ILogger logger = new ConsoleLogger("OggOpusTest", LogLevel.All);
    NativePlatformUtils.SetGlobalResolver(new NativeLibraryResolverImpl());
    AssemblyReflector.ApplyAccelerators(typeof(NativeOpusAccelerator).Assembly, logger);

    OggOpusCodecFactory codecFactory = new OggOpusCodecFactory();
    using (IAudioGraph graph = new AudioGraph(AudioGraphCapabilities.Concurrent))
    using (MemoryStream fileOutStream = new MemoryStream())
    {
        using (AudioEncoder encoder = codecFactory.CreateEncoder("oggopus", graph, AudioSampleFormat.Stereo(48000), logger.Clone("OpusEncoder"), "OpusEncoder"))
        {
            // Write ogg header
            await encoder.Initialize(fileOutStream, false, CancellationToken.None, DefaultRealTimeProvider.Singleton).ConfigureAwait(false);

            const int samplesPerChannelToEncode = 1024;

            for (int c = 0; c < 10; c++)
            {
                // get audio from somewhere...
                float[] someAudioToEncode = new float[1024 * encoder.InputFormat.NumChannels];

                await encoder.WriteAsync(
                    someAudioToEncode, 0, samplesPerChannelToEncode,
                    CancellationToken.None, DefaultRealTimeProvider.Singleton).ConfigureAwait(false);
            }

            await encoder.Finish(CancellationToken.None, DefaultRealTimeProvider.Singleton).ConfigureAwait(false);
        }

        byte[] encodedOpus = fileOutStream.ToArray();
    }
}

Answer 3 · 2024-05-08T01:50:34.000Z

actually I think I misunderstood your intent.... why not just implement your own Stream that the ogg writer connects to? That stream could have its own preallocated buffer like you describe. All that the OggOpusWriteStream does with that stream is write Ogg pages to it, which are max 64Kb each (plus maybe a few bytes for header? check the spec). Your implementation of Write could just copy this data to your buffer or pipe it to wherever else you want it to in your program. All the Finish() method does is flush the final page and end of stream to the output as otherwise normal Ogg pages so you won't even need to do any special handling for that. Just override Stream.Write and you should be good.

Answer 4 · 2024-05-08T08:29:54.000Z

actually I think I misunderstood your intent.... why not just implement your own Stream that the ogg writer connects to? That stream could have its own preallocated buffer like you describe. All that the OggOpusWriteStream does with that stream is write Ogg pages to it, which are max 64Kb each (plus maybe a few bytes for header? check the spec). Your implementation of Write could just copy this data to your buffer or pipe it to wherever else you want it to in your program. All the Finish() method does is flush the final page and end of stream to the output as otherwise normal Ogg pages so you won't even need to do any special handling for that. Just override Stream.Write and you should be good.

I think MemoryStream already supports this based on the docs: (https://learn.microsoft.com/en-us/dotnet/api/system.io.memorystream.-ctor?view=netstandard-2.1 | https://learn.microsoft.com/en-us/dotnet/api/system.io.memorystream.-ctor?view=netframework-4.8.1)

No issues in that case.

Answer 5 · 2024-05-08T10:09:27.000Z

Okay I do notice that I can only write int16 values to the OGG file, is this standard behaviour?

Answer 6 · 2024-05-08T13:53:31.000Z

Okay I do notice that I can only write int16 values to the OGG file, is this standard behaviour?

The encoder technically supports float too but internally only uses int16 so this code path requires the fewest format conversions. OggOpusWriteStream has to buffer enough audio data to fill a single packet and so this presents a minor issue of what format to store that buffer in. If it stored in float32 then the internal encoder would always be doing extra conversions (and possibly redundant conversion if the caller originally converted from int16 -> float32 and then the encoder converts back float -> int anyways).

It wouldn't be unreasonable to have a Write(ReadOnlySpan<float>) implementation on OggOpusWriteStream which just converts the samples to int to append to its internal buffer. It would mostly just be convenience though to save the caller the hassle of converting it themselves; there would be no speed or audio quality benefit.