guest271314/captureSystemAudio

Live streaming audio output

Closed this issue · 17 comments

zzph commented

Hi there,

I came across your library when looking for a way to stream audio from my PC via a browser.

Despite efforts with WebRTC, and modifying the codec/quality, it still sounds terrible.

I was thinking of using a library such as yours, then streaming it using opus-stream-decoder, via https://github.com/AnthumChris/fetch-stream-audio.

Is that possible? Or can you suggest another way of achieving this?

Thanks

Despite efforts with WebRTC, and modifying the codec/quality, it still sounds terrible.

Yes, I concur. Using navigator.mediaDevices.getDisplayMedia({video: true, audio: true}) (then removing the video track) produces sub-par audio output.
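For context, a minimal sketch of that approach (variable names are illustrative): capture with getDisplayMedia(), then stop and remove the video track.

// Sketch: capture tab/system audio, then drop the video track
const stream = await navigator.mediaDevices.getDisplayMedia({
  video: true, // video must be requested even when only audio is wanted
  audio: true,
});
for (const videoTrack of stream.getVideoTracks()) {
  videoTrack.stop();
  stream.removeTrack(videoTrack);
}
// stream now carries only the captured audio track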

I was thinking of using a library such as yours, then streaming it using opus-stream-decoder, via https://github.com/AnthumChris/fetch-stream-audio.

Is that possible? Or can you suggest another way of achieving this?

Yes, that is possible.

Currently the PCM is captured and played in the browser using MediaStreamTrackProcessor to get the correct cadence and timestamp for input to MediaStreamTrackGenerator.
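A minimal sketch of that pipeline, assuming sourceTrack is a live audio MediaStreamTrack (the name is hypothetical, not the library's exact code):

// Read timestamped AudioData from the source track and pipe it to a
// generator track, preserving cadence and timestamps
const processor = new MediaStreamTrackProcessor({ track: sourceTrack });
const generator = new MediaStreamTrackGenerator({ kind: 'audio' });
processor.readable.pipeTo(generator.writable).catch(console.error);
// generator is itself a MediaStreamTrack; wrap it for playback or transport
const outputStream = new MediaStream([generator]);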

Output could be piped through opusenc. Interestingly, I have been experimenting with comparing opusenc and opusdec to WebCodecs AudioEncoder and AudioDecoder over the past several days https://bugs.chromium.org/p/chromium/issues/detail?id=1254496#c32, and I have reached a similar conclusion about the quality of WebCodecs being sub-par compared to opusdec. My main purpose, though, was demonstrating that opusdec decodes to the original input sample rate, where AudioDecoder does not; it instead outputs PCM with a 48000 sample rate, which Opus uses internally.
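To illustrate the sample-rate point, a hedged sketch of a WebCodecs AudioDecoder configured for Opus; per the observation above, the AudioData it outputs reports 48000 regardless of the capture rate in the configuration:

const decoder = new AudioDecoder({
  output: (audioData) => {
    // Observed: 48000 here, even when the source was captured at 44100
    console.log(audioData.sampleRate, audioData.numberOfFrames);
    audioData.close();
  },
  error: console.error,
});
decoder.configure({ codec: 'opus', sampleRate: 44100, numberOfChannels: 2 });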

If I understand your use case correctly, are you trying to stream the system audio output, or specific playback output, to a different browser, or server?

If the use case is streaming to a file, the code does that using MediaRecorder at

this.recorder.onstop = async (e) => {
  this.resolve(new Response(this.outputStream).blob());
};

and

var audioStream = new AudioStream(
  `parec -d alsa_output.pci-0000_00_1b.0.analog-stereo.monitor`
);
// audioStream.mediaStream: live MediaStream
audioStream
  .start()
  .then((ab) => {
    // ab: ArrayBuffer representation of WebM file from MediaRecorder
    console.log(
      URL.createObjectURL(
        new Blob([ab], {
          type: 'audio/webm;codecs=opus',
        })
      )
    );
  })
  .catch(console.error);
// stop capturing system audio output
audioStream.stop();
zzph commented

If I understand your use case correctly, are you trying to stream the system audio output, or specific playback output, to a different browser, or server?

Thanks.

Either output.

WebRTC kind of handles both options (using a loopback for system audio out), and using 'shareScreen' also lets one share a Chrome tab with audio.

I've figured out how to save the audio blob in good quality; now I need to find a way to "stream" that blob (even if the cost is a latency of a few seconds).

Maybe a server in-between isn't even necessary?

Any ideas on how to achieve that?

now I need to find a way to "stream" that blob

Stream to where? Locally or remotely?

zzph commented

Stream to where? Locally or remotely?

Remotely, via the browser.

I’m thinking p2p is out of the question given it could have several “listeners”?

I’ve successfully been able to grab the media stream and convert it to Opus using a polyfill.

I’m just stuck at:

  1. how do i transmit the “live” file?
  2. when listeners connect, how can it know which part of the stream to start at?
  1. how do i transmit the “live” file?

In AudioStream we supply source AudioData to MediaStreamTrackGenerator, which is an instance of MediaStreamTrack.

this.generator = new MediaStreamTrackGenerator({
  kind: 'audio',
});
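The capture loop then writes each chunk to the generator's writable side; roughly, inside the class (audioData is a hypothetical AudioData instance built from the captured PCM):

const writer = this.generator.writable.getWriter();
await writer.write(audioData); // one timestamped AudioData per write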

You can pass the track to a WebRTC PeerConnection to stream "peer-to-peer". WebRTC encodes to Opus by default https://plnkr.co/edit/1HsvQh08tYb24810?preview

a=rtpmap:111 opus/48000/2
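A hypothetical sketch of that hand-off, with signaling omitted; generator is the MediaStreamTrackGenerator created above:

const pc = new RTCPeerConnection();
// MediaStreamTrackGenerator is a MediaStreamTrack, so it can be added directly
pc.addTrack(generator, new MediaStream([generator]));
// The negotiated SDP then advertises Opus, e.g. a=rtpmap:111 opus/48000/2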
  1. when listeners connect, how can it know which part of the stream to start at?

Given that 1. is a live stream, there is only one option: the remote peer receives the live stream from the moment it connects; there is no earlier part to seek to.

zzph commented

Thanks for that

Couple questions about your example/AudioStream:

  1. Being P2P, will that cause a problem when there are "a lot" of listeners on the one AudioStream? IE delay/lag?

  2. If I modified your example to send the users' media screen audio instead, would I instead send a blob like the following?

    const stream = navigator.mediaDevices.getDisplayMedia({
      video: true,
      audio: true
    })
    const blob = await stream.blob();
    const buffer = await ac.decodeAudioData(await blob.arrayBuffer());
    absn.buffer = buffer;
    capture.src = URL.createObjectURL(blob);
    })();
  1. Being P2P, will that cause a problem when there are "a lot" of listeners on the one AudioStream? IE delay/lag?

"a lot" is not specific. You can test, see https://webrtc.github.io/samples/src/content/peerconnection/multiple/.

  1. If I modified your example to send the users' media screen audio instead, would I instead send a blob like the following?

No.

getDisplayMedia() returns a Promise that resolves to a MediaStream; neither has a blob() method. You can use WebRTC to connect to a remote peer. Note, the Chromium implementation has a bug where getDisplayMedia() mutes the video MediaStreamTrack when "Tab" capture is used https://bugs.chromium.org/p/chromium/issues/detail?id=1099280.
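A corrected sketch of what that snippet appears to be after, assuming pc is an RTCPeerConnection whose signaling is handled elsewhere:

// getDisplayMedia() resolves to a MediaStream; send the live track, not a Blob
const stream = await navigator.mediaDevices.getDisplayMedia({
  video: true,
  audio: true,
});
const [audioTrack] = stream.getAudioTracks();
pc.addTrack(audioTrack, stream);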

WebRTC also has an example of sending a file with RTCDataChannel https://webrtc.github.io/samples/src/content/datachannel/filetransfer/.

Is the use case live streaming or file transfer?

zzph commented

"a lot" is not specific.

Well, let's say 20 remote listeners are on a single audio track. Will that mean the "streamer" has 20 connections to upload to?

Is the use case live streaming or file transfer?

This is for live streaming, but of an audio track only.

The aim is to have someone either share their system audio output (via loopback) or a browser tab.

There are different ways to achieve the goal.

I have been testing streaming from Firefox to Chromium over WebRTC to meet the "20 remote listeners" requirement. It does not achieve the requirement yet; I will continue testing potential options.

Well, let's say 20 remote listeners are on a single audio track. Will that mean the "streamer" has 20 connections to upload to?

I tested using the same MediaStream and a single WebRTC PeerConnection. That resulted in only 1 remote peer receiving the stream.

I then created 20 PeerConnections and streamed the same MediaStream from Firefox as the source to Chromium.
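In outline, the test looked something like this (one sendonly connection per listener; the per-connection offer/answer signaling is omitted):

const senders = [];
for (let i = 0; i < 20; i++) {
  const pc = new RTCPeerConnection();
  // The same source track and MediaStream are shared across all 20 connections
  pc.addTransceiver(track, { streams: [stream], direction: 'sendonly' });
  senders.push(pc);
}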

[Screenshot: 20 WebRTC PeerConnections streaming from Firefox to Chromium]

zzph commented

Were all 20 listeners receiving the audio in high (music listening) quality?

Was there any slow down or lag? Was there any server involved or was it just p2p?

thanks again

Were all 20 listeners receiving the audio in high (music listening) quality?

All 20 listeners received the audio. I streamed from Firefox to Chromium, so we have to deal with Chromium's WebRTC audio, which can exhibit sub-par quality. I did not use echo cancellation or other constraints which might improve quality.
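For reference, a hypothetical sketch of the constraints in question; they were not applied in this test, and whether they improve or degrade quality depends on the source:

// track: the captured audio MediaStreamTrack
await track.applyConstraints({
  echoCancellation: true,
  noiseSuppression: true,
  autoGainControl: true,
});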

No slow down or lag. No server involved.

Unfortunately I lost the most recent tests I was running, where I tested all 20 connections on the same page with 20 <iframe>s (the screenshot in the previous post) instead of 20 tabs. I should be able to reconstruct that version from what I did save. I use the async clipboard for "signaling".

I'll post one of the working examples I saved before losing the most recent tests.

Firefox, where we can capture monitor devices using getUserMedia()

offer.html

<!doctype html>
<html>
  <head>
    <meta charset="utf-8">
    <script src="offer.js"></script>
  </head>
  <body>
  </body>
</html>

offer.js

(async (_) => {
  let config = { offers: [], answers: [] };
  await navigator.clipboard.writeText(JSON.stringify(config));
  let len = config.answers.length;
  const sessions = [];
  let stream = await navigator.mediaDevices.getUserMedia({
    audio: true,
  });
  const label = 'Monitor of Built-in Audio Analog Stereo';
  let [track] = stream.getAudioTracks();
  if (track.label !== label) {
    const device = (await navigator.mediaDevices.enumerateDevices()).find(
      ({ label: deviceLabel }) => deviceLabel === label
    );
    const { deviceId } = device;
    console.log(device);
    track.stop();
    stream = await navigator.mediaDevices.getUserMedia({
      audio: { deviceId: { exact: deviceId } },
    });
    [track] = stream.getAudioTracks();
  }
  const createWebRTCPeerConnection = async (stream, track) => {
    // media.navigator.permission.disabled
    const webrtc = new RTCPeerConnection({
      sdpSemantics: 'unified-plan',
    });
    sessions.push(webrtc);
    [
      'signalingstatechange',
      'iceconnectionstatechange',
      'icegatheringstatechange',
      'negotiationneeded',
    ].forEach((event) => webrtc.addEventListener(event, console.log));
    webrtc.onicecandidate = async (event) => {
      // console.log('candidate', event.candidate);
      if (!event.candidate) {
        let sdp = webrtc.localDescription.sdp;
        if (sdp.indexOf('a=end-of-candidates') === -1) {
          sdp += 'a=end-of-candidates\r\n';
        }
        try {
          config = JSON.parse(await navigator.clipboard.readText());
          config.offers.push(sdp);
          await navigator.clipboard.writeText(JSON.stringify(config));
        } catch (e) {
          throw e;
        }
      }
    };
    const transceiver = webrtc.addTransceiver(track, {
      streams: [stream],
      direction: 'sendonly',
    });
    const offer = await webrtc.createOffer();
    await webrtc.setLocalDescription(offer);
    return webrtc;
  };
  const webrtc = await createWebRTCPeerConnection(stream, track);
  try {
    async function* readClipboard() {
      while (true) {
        try {
          // dom.events.testing.asyncClipboard
          const json = JSON.parse(await navigator.clipboard.readText());
          if (json.answers.length > len) {
            console.log(json.answers.length, len);
            for (; len < json.answers.length; len++) {
              sessions[sessions.length - 1].setRemoteDescription({
                type: 'answer',
                sdp: json.answers[len],
              });
            }
            await createWebRTCPeerConnection(stream, track);
          }
          yield await new Promise((resolve) => setTimeout(resolve, 1000));
        } catch (e) {
          console.error(e);
          throw e;
        }
      }
    }
    for await (const _ of readClipboard()) {}
  } catch (e) {
    throw e;
  }
})().catch(console.error);

answer.html

<!DOCTYPE html>

<html>
  <head>
    <meta charset="utf-8" />
    <style>
      body *:not(script) {
        display: block;
      }
    </style>
  </head>
  <body>
    <button id="capture">Capture system audio</button>
    <audio id="audio" autoplay controls muted></audio>
    <script src="answer.js">
    </script>
  </body>
</html>

answer.js

const audio = document.getElementById('audio');
const capture = document.getElementById('capture');
['loadedmetadata', 'play', 'playing'].forEach((event) =>
  audio.addEventListener(event, console.log)
);
const webrtc = new RTCPeerConnection({ sdpSemantics: 'unified-plan' });
[
  'signalingstatechange',
  'iceconnectionstatechange',
  'icegatheringstatechange',
  'negotiationneeded',
].forEach((event) => webrtc.addEventListener(event, console.log));

webrtc.onicecandidate = async (event) => {
  if (!event.candidate) {
    let sdp = webrtc.localDescription.sdp;
    console.log('candidate:', sdp);
    if (sdp.indexOf('a=end-of-candidates') === -1) {
      sdp += 'a=end-of-candidates\r\n';
    }
    try {
      alert('Ready');
      capture.onclick = async () => {
        capture.onclick = null;
        const json = JSON.parse(await navigator.clipboard.readText());       
        json.answers.push(sdp);
        console.log(json, await navigator.clipboard.writeText(JSON.stringify(json)));
        console.log(JSON.parse(await navigator.clipboard.readText()));
      }
    } catch (e) {
      console.error(e);
    }
  }
};
webrtc.ontrack = ({ transceiver, streams: [stream] }) => {
  console.log(transceiver);
  const {
    receiver: { track },
  } = transceiver;
  track.onmute = track.onunmute = (e) => console.log(e);
  audio.srcObject = stream;
};
onload = async (_) => {
  try {
    const text = await navigator.clipboard.readText();
    const json = JSON.parse(text);
    console.log(json.offers.length);
    await webrtc.setRemoteDescription({
      type: 'offer',
      sdp: json.offers[json.offers.length - 1],
    });
    const answer = await webrtc.createAnswer();
    await webrtc.setLocalDescription(answer);
  } catch (e) {
    console.error(e);
  }
};

Note, the clipboard-based signaling process is not ideal, because any copy and paste during the process will result in unexpected values for the SDP and non-JSON input to JSON.parse(), unless the sender and receiver understand that copy/paste should be avoided during the stream. I just used the clipboard to test, based on this gist https://gist.github.com/guest271314/04a539c00926e15905b86d05138c113c.

@zzph Is this issue resolved?