Vanilagy/webm-muxer

Help getting audio from audio context working

Closed this issue · 8 comments

I am wondering if anyone can help me mux video and audio (not from the microphone) together? Below is a snippet of the code I am using inside a cables.gl op file. I have managed to feed canvas frames one by one to the video encoder and get perfectly formed videos with no missing frames. However, when I add the audio, the video is not viewable, and when I convert it to MP4 with ffmpeg there is no audio.

const audioCtx = CABLES.WEBAUDIO.createAudioContext(op);
const streamAudio = audioCtx.createMediaStreamDestination();

inAudio.get().connect(streamAudio); // this gets fed from an audio source in cables

audioTrack = streamAudio.stream;
recorder = new MediaRecorder(audioTrack);

muxer = new WebMMuxer({
    "target": "buffer",
    "video": {
        "codec": "V_VP9",
        "width": inWidth.get() / CABLES.patch.cgl.pixelDensity,
        "height": inHeight.get() / CABLES.patch.cgl.pixelDensity,
        "frameRate": fps
    },
    "audio": {
        "codec": "A_OPUS",
        "sampleRate": 48000,
        "numberOfChannels": 2
    },
    "firstTimestampBehavior": "offset" // Because we're directly pumping a MediaStreamTrack's data into it
});

videoEncoder = new VideoEncoder({
    "output": (chunk, meta) => { return muxer.addVideoChunk(chunk, meta); },
    "error": (e) => { return op.error(e); }
});
videoEncoder.configure({
    "codec": "vp09.00.10.08",
    "width": inWidth.get() / CABLES.patch.cgl.pixelDensity,
    "height": inHeight.get() / CABLES.patch.cgl.pixelDensity,
    "framerate": 29.7,
    "bitrate": 5e6
});

if (audioTrack) {
    op.log('we HAVE AUDIO !!!!!!!!!!!!!!!!!!');

/* I REMOVED ALL THE CODE FROM THE DEMO FROM HERE

// 		const audioEncoder = new AudioEncoder({
// 			output: (chunk) => muxer.addRawAudioChunk(chunk),
// 			error: e => console.error(e)
// 		});
// 		audioEncoder.configure({
// 			codec: 'opus',
// 			numberOfChannels: 2,
// 			sampleRate: 48000, //todo should have a variable
// 			bitrate: 128000,
// 		});


		// Create a MediaStreamTrackProcessor to get AudioData chunks from the audio track
// 		let trackProcessor = new MediaStreamTrackProcessor({ track: audioTrack });
// 		let consumer = new WritableStream({
// 			write(audioData) {
// 				if (!recording) return;
// 				audioEncoder.encode(audioData);
// 				audioData.close();
// 			}
// 		});
// 		trackProcessor.readable.pipeTo(consumer);

TO HERE */

    recorder.ondataavailable = function (e) {
        op.log('test', e.data); // this returns a blob {size: 188409, type: 'audio/webm;codecs=opus'}
        // audioEncoder.encode(e.data);
        muxer.addAudioChunkRaw(e.data); // this throws no errors
    };
    recorder.start();
}

Hey! What you are doing here cannot work: you are first encoding the audio into a WebM file (that's what MediaRecorder produces), and then piping that whole file into the muxer as an audio chunk. The WebM muxer expects raw codec data, such as a single Opus frame. Additionally, the audio should be split into many small frames, whereas in your example you're handing it the whole 188 kB chunk at once.

I take it from your example that you want to sample the audio coming out of your audio context and add that to the WebM. What you want is a way to get an audio buffer that contains the output of your audio context. A nice way to do this is using the OfflineAudioContext, but this is probably not applicable in your case.
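Just for completeness, the OfflineAudioContext route would look roughly like this (untested sketch; it assumes you can rebuild your whole audio graph inside the offline context and that you know the duration up front):

const durationSeconds = 10; // assumed to be known ahead of time
const offlineCtx = new OfflineAudioContext(2, 48000 * durationSeconds, 48000);

// ...recreate your source nodes here and connect them to offlineCtx.destination...

const renderedBuffer = await offlineCtx.startRendering();
// renderedBuffer is an AudioBuffer; its channel data (Float32Arrays) could then
// be wrapped in AudioData and fed to an AudioEncoder, as described below.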

What you can do instead is pipe your output (using inAudio.get().connect) into a ScriptProcessorNode or, using the newer API, into an AudioWorklet (I'd use the ScriptProcessor). These get called periodically with new audio data, handed to you as Float32Arrays of raw sample values for each channel. You can aggregate these into an array, and when you're done recording, create an AudioData instance from the aggregated samples. Then pass this AudioData instance into the encode method of an AudioEncoder. The AudioEncoder will spit out many small chunks, just like the VideoEncoder, which you can then pipe into the muxer.
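Off the top of my head, that could look something like this (untested sketch, assuming stereo at 48 kHz and the audioCtx, inAudio and audioEncoder from your snippets):

const channels = 2;
const sampleRate = 48000;
const recordedChunks = []; // one interleaved Float32Array per callback

const processor = audioCtx.createScriptProcessor(4096, channels, channels);
inAudio.get().connect(processor);
processor.connect(audioCtx.destination); // the node needs to be connected for processing to happen

processor.onaudioprocess = (e) => {
    // Interleave the two channels so the data matches AudioData's 'f32' format
    const left = e.inputBuffer.getChannelData(0);
    const right = e.inputBuffer.getChannelData(1);
    const interleaved = new Float32Array(left.length * 2);
    for (let i = 0; i < left.length; i++) {
        interleaved[2 * i] = left[i];
        interleaved[2 * i + 1] = right[i];
    }
    recordedChunks.push(interleaved);
};

// When you're done recording, concatenate everything and hand it to the encoder:
const finalizeAudio = () => {
    const totalLength = recordedChunks.reduce((sum, a) => sum + a.length, 0);
    const allSamples = new Float32Array(totalLength);
    let offset = 0;
    for (const chunk of recordedChunks) {
        allSamples.set(chunk, offset);
        offset += chunk.length;
    }

    const audioData = new AudioData({
        format: 'f32', // interleaved 32-bit float
        sampleRate: sampleRate,
        numberOfFrames: totalLength / channels,
        numberOfChannels: channels,
        timestamp: 0, // microseconds
        data: allSamples
    });
    audioEncoder.encode(audioData);
    audioData.close();
};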

I hope this helps! This might require some trial and error and doc-reading to get to work right, but I'm pretty sure this is the correct way.

It sounds complex, but it's not once you've written it! If it improves UX, I'd definitely try to mux it into one thing. All you need to do is push to an array, then create AudioData and encode that. I know it sounds intimidating, but it's one of those things that ends up being like 25 lines of code and seems easy in hindsight :)

But hey, do whatever you think works best. Still appreciate you using my lib!

I have given up :( I spent the best part of two days, but this stuff is too complex for me... I got this far:

    const audioCtx = CABLES.WEBAUDIO.createAudioContext(op);

    bypass = `class Bypass extends AudioWorkletProcessor {
        constructor() {
            super();
        }

        process(inputs, outputs, parameters) {
            const inputData = inputs[0][0];

            // Get the audio data as a Float32Array
            const audioDataArray = new Float32Array(inputData);

            // Send the audio data to the parent thread
            this.port.onmessage = (e) => {
                console.log('audio data', audioDataArray.length, audioDataArray);
                this.port.postMessage(audioDataArray);
            };

            // Clear the audio data buffer
            this.audioData.length = 0;

            return true;
        }
    }
    registerProcessor('bypass', Bypass);
    `;


    const innerAudio = inAudio.get();
    if (!innerAudio)
    {
        console.error("Failed to get audio source from inAudio");
        return;
    }

    let blob = new Blob([bypass], { "type": "application/javascript" });

    let reader = new FileReader();
    reader.readAsDataURL(blob);
    let dataURI = await new Promise((res) =>
    {
        reader.onloadend = function ()
        {
            res(reader.result);
        };
    });


    // Create an AudioEncoder
    audioEncoder = new AudioEncoder({
        output: (chunk) => muxer.addAudioChunk(chunk),
        error: e => console.error(e)
    });
    audioEncoder.configure({
        codec: 'opus',
        numberOfChannels: 1,
        sampleRate: 48000, // todo should have a variable
        bitrate: 128000,
    });


    await audioCtx.audioWorklet.addModule(dataURI)
        .then(() =>
        {
            op.log("here 1");
            // Create an instance of the AudioWorkletNode
            const audioEncoderNode = new AudioWorkletNode(audioCtx, "bypass");

            // Connect the audio source to the audio encoder node
            innerAudio.connect(audioEncoderNode);

            finalizeAudio = function ()
            {
                console.log('finalise internal called');
                return new Promise(function (resolve, reject)
                {
                    audioEncoderNode.port.onmessage = function (e)
                    {
                        audioEncoder.encode(e.data);
                        resolve('success');
                    };
                    // Send a message to the AudioEncoderWorklet to stop encoding and return the encoded audio data
                    audioEncoderNode.port.postMessage({ "type": "finalizeAudio" });
                });
            };
        })
        .catch((err) =>
        {
            console.error("Error registering AudioEncoderWorklet:", err);
        });

The AudioWorkletProcessor is posting back the wrong format, and the audioEncoder always complains that it's not AudioData. I have tried a million things with the 'help' of GPT-4, but no luck. I also suspect I'm doing it wrong by calling it right at the end; perhaps it needs to happen on a rolling basis, like the video frames?
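(The worklet's postMessage delivers a plain Float32Array, while AudioEncoder.encode only accepts AudioData, so the posted samples would first need to be wrapped - a rough, untested sketch assuming the mono 48 kHz config above:)

audioEncoderNode.port.onmessage = (e) => {
    const samples = e.data; // Float32Array of mono samples posted by the worklet
    const audioData = new AudioData({
        format: 'f32',
        sampleRate: 48000,
        numberOfFrames: samples.length, // mono: one frame per sample
        numberOfChannels: 1,
        timestamp: 0, // microseconds; rolling encoding would need real timestamps
        data: samples
    });
    audioEncoder.encode(audioData);
    audioData.close();
};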

Any chance I could pay you to make it work?

Okay hold on, I realize that your initial attempt should actually work if we simply use the audio stream differently:

const audioCtx = CABLES.WEBAUDIO.createAudioContext(op);
const streamAudio = audioCtx.createMediaStreamDestination();

inAudio.get().connect(streamAudio);

audioTrack = streamAudio.stream.getAudioTracks()[0];

muxer = new WebMMuxer({
	"target": "buffer",
	"video": {
		"codec": "V_VP9",
		"width": inWidth.get() / CABLES.patch.cgl.pixelDensity,
		"height": inHeight.get() / CABLES.patch.cgl.pixelDensity,
		"frameRate": fps
	},
	"audio": {
		"codec": "A_OPUS",
		"sampleRate": 48000,
		"numberOfChannels": 2
	},
	"firstTimestampBehavior": "offset" // Because we're directly pumping a MediaStreamTrack's data into it
});

videoEncoder = new VideoEncoder({
	"output": (chunk, meta) => { return muxer.addVideoChunk(chunk, meta); },
	"error": (e) => { return op.error(e); }
});
videoEncoder.configure({
	"codec": "vp09.00.10.08",
	"width": inWidth.get() / CABLES.patch.cgl.pixelDensity,
	"height": inHeight.get() / CABLES.patch.cgl.pixelDensity,
	"framerate": 29.7,
	"bitrate": 5e6
});

if (audioTrack) {
	op.log('we HAVE AUDIO !!!!!!!!!!!!!!!!!!');

	audioEncoder = new AudioEncoder({
		output: (chunk, meta) => muxer.addAudioChunk(chunk, meta),
		error: e => console.error(e)
	});
	audioEncoder.configure({
		codec: 'opus',
		numberOfChannels: 2,
		sampleRate: 48000, //todo should have a variable
		bitrate: 128000,
	});

	// Create a MediaStreamTrackProcessor to get AudioData chunks from the audio track
	let trackProcessor = new MediaStreamTrackProcessor({ track: audioTrack });
	let consumer = new WritableStream({
		write(audioData) {
			audioEncoder.encode(audioData);
			audioData.close();
		}
	});
	trackProcessor.readable.pipeTo(consumer);
}

I maintained most of the code from my demo, using a MediaStreamTrackProcessor to get the AudioData from the stream you created. This should do the job! Not sure why I didn't think of this sooner. I guess my question would be: why did you comment out this original code from the demo - did you think it wasn't applicable in your case?

That said, I didn't test or run this code, but it should give you a good baseline. To recap: we get the audio track from the MediaStreamAudioDestinationNode, then pipe it into a MediaStreamTrackProcessor, which gives us, bit by bit, instances of AudioData as the audio plays. This AudioData is then fed into the AudioEncoder, which spits out EncodedAudioChunks, which are then sent to the muxer.

Get back to me and tell me if this worked for you (maybe you'll need to adjust it a bit, still). If all of this still doesn't work, we can discuss me pouring in a bit more of my time to help you!

Thanks, I finally got it working. I had to play around to make sure it was not recording past the length of the video: I am recording the video frame by frame (slower than real time), so the sound kept going, which made the video longer and paused at the end. Thanks for this great muxer!
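(One way to cap the audio at the video's length - untested sketch, assuming the setup above plus a hypothetical videoDurationSeconds value:)

let firstAudioTimestamp = null;
let consumer = new WritableStream({
    write(audioData) {
        if (firstAudioTimestamp === null) firstAudioTimestamp = audioData.timestamp;
        // AudioData timestamps are in microseconds and don't start at zero
        // (hence firstTimestampBehavior: 'offset' in the muxer config)
        const elapsedSeconds = (audioData.timestamp - firstAudioTimestamp) / 1e6;
        if (elapsedSeconds >= videoDurationSeconds) {
            audioData.close(); // past the end of the video, drop it
            return;
        }
        audioEncoder.encode(audioData);
        audioData.close();
    }
});
// Alternatively, call audioTrack.stop() once the last video frame has been encoded,
// which ends the MediaStreamTrackProcessor's readable stream.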

Let's go!