MediaRecorder needs to define effect of adding / removing tracks in its input MediaStream
JimBarnett opened this issue · 34 comments
Shijun Sun 2014-05-20 20:13:29 UTC
The "stream" attribute on the MediaRecorder is readonly, but it seems the spec does not prevent the app from dynamically adding/removing tracks from the stream object.
If the intention of defining the attribute as readonly is to block dynamic changes to the stream object, it would be better to define a getStream() method that returns a copy of the internal MediaStream object.
Comment 1 Adam Bergkvist 2014-05-21 13:01:40 UTC
The readonly only prevents the script from assigning the attribute to an entirely new MediaStream. Returning a copy with getStream() wouldn't help since you can still have the reference to the stream you passed in to the MediaRecorder() constructor. To prevent this, we would have to have a "readonly mode" for the stream. I'm not sure we want/need that.
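To illustrate (a minimal sketch; stream and extraTrack are assumed to exist already):

const recorder = new MediaRecorder(stream);
recorder.stream = new MediaStream();  // no effect (TypeError in strict mode): the attribute is readonly
recorder.stream.addTrack(extraTrack); // still allowed: mutates the underlying stream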
Comment 2 Martin Thomson 2014-05-21 15:24:12 UTC
The point here is that we need some sort of rules for handling the cases where tracks are added and removed from a stream that is in active use.
Comment 3 Harald Alvestrand 2014-06-13 08:41:18 UTC
Changing the subject of the bug to say what it's about (based on discussion).
The alternative would be to close as WONTFIX/WAI.
Somebody raised a use case on the Chrome launch bug to be able to switch video tracks in a single recording: https://code.google.com/p/chromium/issues/detail?id=261321#c62
I believe we should support the case where a stream has its tracks changing.
From what I remember of our discussion at TPAC, this might actually happen without the app's action when using captureStream from a video element whose played-back tracks change.
Chrome's CL [1] brought up the question in the title. Prior to #46, it seemed that a Track addition to/removal from a MediaStream being recorded should signal some error condition and stop the recording. The argument in said CL is that the JS should decide what to do, since the addition/removal might be desired, and even if it was not, it's unclear what the Application might deem appropriate: start recording the new track seamlessly, stop the recording altogether, stop and restart with the new Track...
I propose not being dogmatic: signal the condition to the App and let it decide what to do -- in that line of reasoning, perhaps firing an Error when a Track is added/removed might also not be a good idea. What about a new mediastreammodified event?
@mreavy I don't see any firing/throwing being implemented in Gecko now [2], correct?
[1] https://crrev.com/1578973002
[2] https://github.com/mozilla/gecko-dev/blob/master/dom/media/MediaRecorder.cpp
@miguelao Correct, we don't fire anything on add/remove. I'm also thinking that even firing a mediastreammodified event won't have the utility you suggest -- since it may not be processed for a while, and in the meantime we need to continue recording. So by the time the app could react, the stream may already have recorded some of the new track (likely it will have). Unless we want to make some sort of synchronous blocking interface here (or force it to slice), I don't think this gains us much. (The app could independently watch the source via other means to know that tracks were added or removed.)
If the app wants to control which tracks are recorded, it can create a MediaStream object and manually add the tracks to it that it wants recorded. This can all be done with existing APIs outside of the scope of this document; no need for adding any events.
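A minimal sketch of that suggestion, assuming source is a stream obtained elsewhere (e.g. from getUserMedia):

// Record a private MediaStream so later addTrack()/removeTrack() calls on
// the source stream do not affect what the recorder sees.
const recordingStream = new MediaStream(source.getTracks());
const recorder = new MediaRecorder(recordingStream);
recorder.start();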
IMHO, the MediaRecorder should just record whatever is in the MediaStream instance.
Resurrecting this; I'm trying to understand this behavior for some work I'm currently doing, and it's not entirely clear to me what should be correct here.
Playing around with [1], I observed the following behavior in Chrome 51:
- If you mute a video track, it will continue to record audio, and video goes black. If you unmute the track, video appears again in the recording.
- The stream property, which is a MediaStream, is read-only. However, that does not prevent me from mucking with the stream itself (adding tracks, removing tracks, etc.).
- If I call stop() on the video track, it will continue to record audio, and video displays the last supplied frame. Setting enabled = false on the track does not make it go black.
- If I clone mediaRecorder.stream.videoTracks()[0], stop the video track, and then call mediaRecorder.stream.addTrack(clonedTrack), then I'll see my video freeze. It does not appear to restart at any point, despite having a valid video track. Removing the stopped track before adding the cloned one also doesn't appear to change anything.
- If I call mediaRecorder.stream.removeTrack(videoTrack), one very surprising result is Chrome continues to record that track. I suspect that's likely a bug; I believe if a MediaStreamTrack is removed from mediaRecorder.stream, that track should no longer be available to MediaRecorder as a resource.
The default behavior I'd propose for all of this would be to follow suit with muting; I think if a track is muted, stopped or removed, then MediaRecorder should simply insert blank frames. If the track is unmuted or added again, start recording from it.
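For what it's worth, a sketch of how an app could observe these state changes today with the standard MediaStreamTrack events (mediaRecorder is assumed to be an active recorder):

for (const track of mediaRecorder.stream.getTracks()) {
  track.addEventListener("mute", () => console.log(`${track.kind} track muted`));
  track.addEventListener("unmute", () => console.log(`${track.kind} track unmuted`));
  track.addEventListener("ended", () => console.log(`${track.kind} track ended`));
}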
One weird situation with adding a track is if something starts as audio-only/video-only, and then someone later adds a video/audio track. Handling that situation might be challenging for implementations, since most media containers will not handle this well. One option for this might be: if someone specifies a mime type that contains both audio and video (e.g. video/webm;codecs=vp9,opus), then regardless of the supplied MediaStream, it will record audio and video from the get-go?
[1] https://webrtc.github.io/samples/src/content/getusermedia/record/
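A sketch of that mime-type hint (stream is assumed; the pre-provisioning behavior itself is hypothetical, only the feature test is standard):

// Declaring both codecs up front could let an implementation provision the
// container for audio+video even if one kind of track only arrives later.
const mimeType = "video/webm;codecs=vp9,opus";
if (MediaRecorder.isTypeSupported(mimeType)) {
  const recorder = new MediaRecorder(stream, { mimeType });
  recorder.start();
}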
I suppose the ability to use a canvas also changes a lot here from a spec angle. I'm going to try playing with that and see what I can do.
I lied: the canvas feature does not appear to help because the canvas isn't active when it isn't visible. Darn.
Regarding the canvas not capturing if not visible, that sounds like a Chrome implementation bug. The Media Capture from DOM Elements spec doesn't say anything about the need for the element to be active.
Anyway, regarding the manipulation of tracks for a MediaStream attached to a MediaRecorder object, I do believe it should just record any tracks in the stream, whether they are added / removed in the middle of recording.
Now I'm not sure if adding and removing tracks in the middle of a recording is supported by the webm container, so maybe we can pre-provision the container using a hint in the mimeType like @jnoring suggests? This does sound a bit hackish though, and doesn't solve the use case of recording a media stream captured from a VideoElement that has tracks getting added and removed outside of the app's control.
@mhofman I agree, the canvas thing seems like it's probably an implementation bug.
I don't quite follow you when you say "I do believe it should just record any tracks in the stream, whether they are added / removed in the middle of recording." Do you mean if I start with two tracks and remove one track midway through, it should continue recording the removed track? Chrome does indeed behave this way, and that feels like a bug to me: I'd expect a removed track to no longer record.
No it should add / remove track from the output when the media stream changes, mirroring everything.
Chrome continuing to record a removed track sounds like a bug too. At the very least it should mute that track in the output.
Got it--I agree. Was just making sure I understood what you were saying.
In my ideal world, MediaRecorder would seamlessly add blank audio/video to handle all of these situations (muted/stopped/removed).
The spec treats it as an error when the number of tracks changes (https://w3c.github.io/mediacapture-record/MediaRecorder.html#event-summary):
e.g. a Track has been added to or removed from the said stream while recording is occurring
and that's also the behaviour in Chrome.
I think we should define how to gracefully (not aborting or ignoring) handle track set changes, or change the API to take MediaStreamTracks instead.
@Pehrsons I think we are saying the same thing, i.e. what you propose is equivalent to making MR take a "snapshot" of the MediaStream's Tracks at construction time, but a MediaStream is just a "bag" of Tracks -- we might just need to make very explicit that the constructor's MediaStream is not a "live" reference, but just a handy way to enumerate the tracks.
Alternatively, we could remove the MediaStream member variable and provide instead (weak) references to the tracks-being-recorded, wdyt?
@miguelao I think that goes semantically against the API. You hand it a MediaStream so you expect it to be able to handle it. Otherwise the API should take an array of tracks, as that is really nothing more than "a bag of tracks" (and it doesn't solve issues like the one below).
I have been hinting in a couple of places about making blobs individually playable MSE-style, as I see that solving a number of issues, including handling track changes gracefully as the most prominent one. The reason I keep pushing for it is that I haven't heard any good arguments against it. If you have some I'd love to hear them.
Take for example an API that gives you a MediaStream whose track set will inherently change. A good candidate for this is media element capture [1]. That API gives you a MediaStream, and if the application changes the selected VideoTrack, the one VideoStreamTrack in that MediaStream will be replaced with a new one.
This is painfully difficult to record. You could record all tracks separately and remux on a server somewhere, but then you need to solve syncing. You could record audio and video together and start a new recording whenever there's a track set change, then stitch it together in a remuxing step afterwards. However if video changes, that results in a gap in audio. And so on.
We need something better.
[1] https://w3c.github.io/mediacapture-fromelement/#html-media-element-media-capture-extensions
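A sketch of that failure mode, assuming a video element with several embedded video tracks (captureStream() is prefixed as mozCaptureStream() in Firefox):

const stream = video.captureStream();
const recorder = new MediaRecorder(stream);
recorder.start();
// If the selected video track changes, the captured stream's video track is
// replaced, and per the current spec the recording errors out and stops:
video.videoTracks[1].selected = true;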
I have naive recording needs, so I don't consider myself a stakeholder here. However, I'm biased against "What should happen?" issues. I'm a fan of use-cases driving functionality, not the other way around.
We also seem to be talking about at least two things:
- What's the least confusing API separation between live sources and a recorder?
- We want to solve problems that seem to hinge on being able to add/remove things mid-way of a recording.
On the first point, RTCPeerConnection switched from addStream to addTrack exactly to avoid streams becoming remote controls for peer connection functionality. MediaRecorder may be different, but I'm not hearing any reasons to think so.
But even if we were to pivot to tracks in MediaRecorder, I'd have the same questions about what kinds of modification functionality we desire.
Why can't MediaRecorder record the MediaStream as long as the stream is active? If MediaRecorder cannot record an array of multiple MediaStreamTracks, it should at least be able to record the only existing video and audio tracks within an active MediaStream.
Currently the behaviour is that MediaRecorder only records the first video or audio track, not the tracks added subsequently with .addTrack().
For example:

video.addEventListener("playing", e => {
  // Mirror the captured element's tracks into the stream being recorded
  // ("mediaStream", created earlier and passed to MediaRecorder()).
  let stream = video.captureStream();
  let tracks = stream.getTracks();
  for (let track of tracks) {
    mediaStream.addTrack(track);
  }
  // Remove any stale tracks that are no longer part of the capture.
  for (let track of mediaStream.getTracks()) {
    if (tracks.find(({ id }) => track.id === id) === undefined) {
      mediaStream.removeTrack(track);
    }
  }
  // ...
});
the MediaStream is active throughout the procedure, though only the first video and audio track are recorded; tried enough times, MediaRecorder might record only portions of one of the added tracks.
Similarly, if we set the .srcObject of a media element to a MediaStream which has tracks added and removed, the video and audio media is rendered in sequence at that media element, though the captured stream is not recorded in its entirety to reflect the rendered media at that element.
@jnoring @Pehrsons Using canvas.captureStream() and AudioContext.createMediaStreamDestination() provides a means to record multiple video tracks and audio tracks into a single Blob "seamlessly" at both Chromium and Firefox, as proven by @Kaiido; see https://bugs.chromium.org/p/chromium/issues/attachmentText?aid=328544. This should be possible using HTMLMediaElement.captureStream() without needing to use requestAnimationFrame or other means to first draw the video to the <canvas>.
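A condensed sketch of that workaround, assuming canvas and video elements already exist (a fuller example appears later in this thread):

// Video is routed through a canvas and audio through Web Audio, so the
// recorder always sees the same two tracks no matter how the source changes.
const ctx = canvas.getContext("2d");
const canvasStream = canvas.captureStream();
const ac = new AudioContext();
const destination = ac.createMediaStreamDestination();
ac.createMediaElementSource(video).connect(destination);
canvasStream.addTrack(destination.stream.getAudioTracks()[0]);
const recorder = new MediaRecorder(canvasStream);
recorder.start();
(function draw() {
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  requestAnimationFrame(draw);
})();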
There could possibly be a flag included at the MediaRecorder constructor which signals to pause recording when a track is added (two video tracks and two audio tracks now exist in the MediaStream) and to resume recording when the previous tracks are removed (one video track and one audio track exist in the MediaStream); either internally or by exposing the resume() call to the developer. Presumably MediaRecorder already does this to an appreciable degree, as recording will not commence if no MediaStreamTracks exist within the MediaStream.
Can the participants summarize their points of view on what the next steps are?
@aboba FWIW, from the perspective here, tracks added to a MediaStream should have the ability to be recorded by MediaRecorder in sequence while the MediaStream is still active.
One use case is recording multiple media fragments with a single instance of MediaRecorder which results in a single webm file.
The above is possible at Firefox using multiple MediaRecorder instances (one for each media playback that is recorded) and MediaSource: https://github.com/guest271314/MediaFragmentRecorder/blob/master/MediaFragmentRecorder.html.
However, trying to capture and record a MediaStream where the underlying src of the <video> is a MediaSource still crashes the tab at Chromium (w3c/media-source#190); it does not crash the tab at Firefox.
I still think we need a way to support changes to the track set, since there's no way to plug video tracks in and out under a video track at will, like you can with web audio for audio tracks.
The simplest use case showing that this is needed is recording a captured video element like @guest271314 mentions, since that wouldn't survive a change of src or changing the selected tracks.
The best proposal (clearly I'm biased, but I don't recall seeing any other proposals either) I've seen for fixing this is defining a mode for individually playable chunks [1][2][3], as this would make the track problem an MSE problem (where changing the track set is permitted, AIUI). I.e., on a track set change you'd finish the existing chunk and start gathering into a new chunk; see the sketch after the footnotes below. It would also fix #119 natively, and #67, in the process.
If there was a way to plug video tracks like I mentioned in the beginning, I could live without supporting track set changes. The only way currently is canvas capture but it has major drawbacks such as running on main thread, not staying in sync with audio, not rendering in background tabs or when hidden, etc. FWIW I think individually playable chunks could still be worthwhile just for the sake of #119 and #67.
As for a next step.. get the discussion going to see where people stand?
[1] #67 (comment)
[2] #67 (comment)
[3] #119 (comment)
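To make that concrete, a hedged sketch of how individually playable chunks might be consumed with MSE; the recording mode itself does not exist yet, and recorder and video are assumed:

const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);
mediaSource.addEventListener("sourceopen", () => {
  const sourceBuffer = mediaSource.addSourceBuffer('video/webm;codecs="vp8,opus"');
  recorder.ondataavailable = async e => {
    // Real code must queue appends until "updateend"; omitted for brevity.
    sourceBuffer.appendBuffer(await e.data.arrayBuffer());
  };
});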
@Pehrsons Am still not sure how the tab crash at Chromium will get printed at the test results page, though the test at web-platform-tests/wpt#15816 (comment) is applicable to https://github.com/web-platform-tests/wpt/tree/master/mediacapture-fromelement, https://github.com/web-platform-tests/wpt/tree/master/media-source, and https://github.com/web-platform-tests/wpt/tree/master/mediacapture-record.
How to create a pull request which adds the test to each of the relevant wpt directories?
@Pehrsons #166. Technically, concatenating recorded video is possible using canvas.captureStream() and AudioContext(). At Mozilla Firefox a stream from a MediaSource can be captured; see https://stackoverflow.com/questions/14143652/web-audio-api-append-concatenate-different-audiobuffers-and-play-them-as-one-son and https://stackoverflow.com/a/45343042.
If more than one HTMLMediaElement.captureStream() is passed to new MediaStream() with appropriate flag(s) set, instead of encoding the data immediately into a file, the raw video and audio data could be "stored" (temporarily) until all of the streams have ended, then the webm file could be created from the raw data. Or, if for example the input is
let recorder = new MediaRecorder(
  new MediaStream([
    video0.captureStream(),
    video1.captureStream(),
    video2.captureStream()
  ]), {
    concat: true,
    width: <"scaleToMinimumDetected|scaleToMaximumDetected|default:maximum">,
    height: <"scaleToMinimumDetected|scaleToMaximumDetected|default:maximum">
  });
the webm file can be written in .length of input array "blocks", then timestamps, if needed, after. Not sure how consistent such an approach would be with the current intended design/implementation of MediaRecorder.
Else, if such a proposal would conflict with the original scope of MediaRecorder and/or .captureStream(), a new proposal could be "incubated" which addresses the use case of recording/concatenating multiple media into a single webm file, which optionally can be played independently; e.g. https://github.com/guest271314/OfflineMediaContext; and other similar functionality described at various repositories.
@Pehrsons In an attempt to create a version of https://github.com/guest271314/MediaFragmentRecorder which outputs the same result at both Mozilla Firefox and Chromium/Chrome, substituted using canvas.captureStream(), AudioContext.createMediaElementSource(), AudioContext.createMediaStreamDestination(), and requestAnimationFrame() with MediaRecorder().
However, the code now outputs roughly the expected result at Chromium/Chrome (save for audio clipping) though not at Mozilla Firefox, where the video does not contain all captured images, video playback stalls, and the audio has gaps. Encountered this issue once before, though unfortunately no longer have access to the OS where the tests were run and saved.
If you find the time, for interop, can you run the code at both Chromium/Chrome and Mozilla Firefox and provide any feedback as to what the issue could be?
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<h1>click</h1>
<script>
const start = document.querySelector("h1");
const canvas = document.createElement("canvas");
const video = document.createElement("video");
document.body.appendChild(video);
const ctx = canvas.getContext("2d");
const canvasStream = canvas.captureStream();
const width = 320;
const height = 240;
canvas.width = video.width = width;
canvas.height = video.height = height;
video.autoplay = true;
const recorder = new MediaRecorder(canvasStream);
let raf;
recorder.addEventListener("dataavailable", e => {
console.log(e.data);
const display = document.createElement("video");
display.width = width;
display.height = height;
display.controls = true;
document.body.appendChild(display);
display.src = URL.createObjectURL(e.data);
});
recorder.addEventListener("stop", e => {
console.log(e);
});
recorder.addEventListener("resume", e => {
console.log(e);
if (raf) cancelAnimationFrame(raf);
raf = requestAnimationFrame(draw)
});
recorder.addEventListener("start", e => {
console.log(e);
if (raf) cancelAnimationFrame(raf);
raf = requestAnimationFrame(draw)
});
recorder.addEventListener("pause", e => {
console.log(e);
cancelAnimationFrame(raf)
});
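// Web Audio routes the <video> element's audio into the canvas capture
// stream, so a single recorder captures both kinds of media.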
const ac = new AudioContext();
const draw = _ => {
if (raf) cancelAnimationFrame(raf);
if (!video.paused) {
ctx.clearRect(0, 0, width, height);
ctx.drawImage(video, 0, 0, width, height);
raf = requestAnimationFrame(draw)
} else {
cancelAnimationFrame(raf)
}
}
const urls = [{
src: "https://upload.wikimedia.org/wikipedia/commons/a/a4/Xacti-AC8EX-Sample_video-001.ogv",
from: 0,
to: 4
}, {
src: "https://mirrors.creativecommons.org/movingimages/webm/ScienceCommonsJesseDylan_240p.webm#t=10,20"
}, {
from: 55,
to: 60,
src: "https://nickdesaulniers.github.io/netfix/demo/frag_bunny.mp4"
}, {
from: 0,
to: 5,
src: "https://raw.githubusercontent.com/w3c/web-platform-tests/master/media-source/mp4/test.mp4"
}, {
from: 0,
to: 5,
src: "https://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerBlazes.mp4"
}, {
from: 0,
to: 5,
src: "https://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerJoyrides.mp4"
}, {
src: "https://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerMeltdowns.mp4#t=0,6.5"
}];
start.addEventListener("click", async e => {
try {
await ac.resume();
console.log(ac);
const source = ac.createMediaElementSource(video);
const destination = ac.createMediaStreamDestination();
source.connect(ac.destination);
source.connect(destination);
canvasStream.addTrack(destination.stream.getAudioTracks()[0]);
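// Play each clip in sequence: reduce() chains one promise per clip, pausing
// the recorder between clips so a single recording spans all of them.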
await urls.reduce(async(promise, {
src, from, to
}) => {
await promise;
return await new Promise(async resolve => {
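// Fetch the clip and play it as a blob URL with a media fragment (#t=from,to).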
const url = new URL(src);
if (url.hash.length) {
// Match decimals before integers so e.g. "6.5" is not split into "6" and "5".
[from, to] = url.hash.match(/\d+\.\d+|\d+/g);
}
const request = await fetch(src);
const blobURL = `${URL.createObjectURL(await request.blob())}#t=${from},${to}`;
video.addEventListener("playing", e => {
console.log(e);
if (recorder.state === "inactive") {
recorder.start();
}
if (recorder.state === "paused") {
recorder.resume();
}
}, {
once: true
});
video.addEventListener("pause", e => {
recorder.pause();
resolve()
}, {
once: true
});
video.load();
video.src = blobURL;
});
}, Promise.resolve());
recorder.stop();
} catch (e) {
console.error(e, e.stack);
}
}, {
once: true
});
</script>
</body>
</html>
@Pehrsons Found a previous version of the above code, https://bugs.chromium.org/p/chromium/issues/attachmentText?aid=328544, from previous experiments at the Chromium bug "Issue 820489: Capturing MediaStream from HTMLMediaElement where src is set to MediaSource crashes tab" https://bugs.chromium.org/p/chromium/issues/detail?id=820489. Essentially the same issue occurs at the above code: audio and video are not rendered correctly (the output is not 1:1).
@Pehrsons Finally composed two versions of code that record multiple videos in sequence using a single <video> and <canvas> element with canvas.captureStream(), AudioContext().createMediaStreamDestination(), AudioContext.createMediaElementSource(), requestAnimationFrame(), MediaStream(), and MediaRecorder().
The one issue is that Mozilla Firefox does not output the final 1 second of the last recorded audio. Searched https://bugzilla.mozilla.org though could not find any similar issue. Can post the code here or at a bug report if that would be a more appropriate venue. Do you have any idea why the last 1 second of audio is not output at the resulting webm file?
@Pehrsons Re #4 (comment), the code is at https://github.com/guest271314/MediaFragmentRecorder/blob/canvas-webaudio/MediaFragmentRecorder.html, plnkr https://plnkr.co/edit/WaAn8v6vjn3j65RoTNs2?p=preview. Not sure how to describe the issue/bug, and not certain if the problem is caused by MediaRecorder or AudioContext. If you find the time, could you run the code at Chromium and Firefox to confirm that the last approximately 1 second of audio is not captured/recorded/output at Firefox?
@guest271314 this is not the right venue for discussing implementation bugs. Please file a bug at our bugzilla if you see unexpected things in Firefox, and we'll take a look.
Composed code that utilizes RTCPeerConnection() and RTCRtpSender.replaceTrack() which achieves the requirement at Firefox and Chromium. For the Firefox implementation, played the first track for 0.2 seconds, caught and handled the SecurityError thrown when MediaRecorder.start() is called, then replayed the first track in the array (see https://bugzilla.mozilla.org/show_bug.cgi?id=1544234#c4).
The functionality of RTCRtpSender.replaceTrack() appears capable of being incorporated into MediaRecorder, e.g., MediaRecorder.replaceTrack(withTrack). Am tentatively considering filing a PR for such a change to this specification (see #147 (comment); w3c/webrtc-pc#2171), though without necessitating explicitly using RTCPeerConnection().
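A hedged sketch of that loopback approach (firstTrack and nextTrack are assumed MediaStreamTracks; error handling omitted):

async function recordWithReplaceableTrack(firstTrack) {
  const pc1 = new RTCPeerConnection();
  const pc2 = new RTCPeerConnection();
  pc1.onicecandidate = e => e.candidate && pc2.addIceCandidate(e.candidate);
  pc2.onicecandidate = e => e.candidate && pc1.addIceCandidate(e.candidate);
  const sender = pc1.addTrack(firstTrack);
  pc2.ontrack = e => {
    // The recorder sees one continuous track even when the input is swapped.
    new MediaRecorder(new MediaStream([e.track])).start();
  };
  await pc1.setLocalDescription(await pc1.createOffer());
  await pc2.setRemoteDescription(pc1.localDescription);
  await pc2.setLocalDescription(await pc2.createAnswer());
  await pc1.setRemoteDescription(pc2.localDescription);
  return sender; // later: await sender.replaceTrack(nextTrack);
}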
Interestingly, the size of the Blob output by MediaRecorder using RTCRtpSender.replaceTrack() differs when recording multiple tracks of the same input media: 8741208 at Firefox 68 and 4194278 at Chromium 73.
I think this issue needs to be closed. The behavior on track change is well defined (recording stops); new issues can propose changing it.