Improved client side video editing

By: Yash Khandelwal & Greg Whitworth

Introduction

Editing video in the browser is currently a complex task because there is no straightforward way to decode an encoded video file into a raw stream on which common editing operations, such as trimming or concatenation, can be performed. Web developers typically approach client-side editing in one of three ways:

They create their own client-side pipeline that decodes the video file(s) to access the stream(s), applies the required edits, and then re-encodes the video into the desired format.
They let the user make simulated edits: the edits are kept in a JavaScript data structure, and the player position is moved so that the adjustments appear to have been applied on the client. When the user saves, this record of edits is sent to the server, where the actual video editing occurs.
They capture video content via the MediaRecorder, which provides a Blob, and then use the slice method to trim the content where desired.

Each of these approaches has pros and cons. The first requires either knowing in advance which video formats the application will work with, or bundling a full library such as ffmpeg, compiled to WASM, to handle multiple codecs. This can result in large downloads (at times up to 7MB zipped) just to enable client-side editing. It does, however, cover both trimming and concatenation.

The second approach can also handle the use cases above, without having to download the larger files. Its drawbacks are that the server-side solution can produce bottlenecks in an editing queue, and that dedicated servers for video editing carry a cost. Additionally, because every save adds an edit job to the queue, the queue may fill with redundant edits. This can lead to increased server-side costs and a slow turnaround time for the end user.

The final approach avoids the need to download a large file or send the video to a server, but it requires that the editing occur at 1x speed. For example, if you have a 60 minute video and want to trim it to 20 minutes, you'll need to wait 20 minutes for the new blob to be created. With our early prototypes, this same work can be done in less than 3 seconds.

This API is a starting point for video editing on the client: it enables the capabilities listed above, without the overhead described above, for the most common web-based video editing scenarios.

We have worked with Flipgrid to validate that this approach tackles their video editing needs and significantly improves their user experience.

Proposed Solution

We're proposing a MediaBlob that extends the regular Blob, and a MediaBlobOperation that is used to batch the proposed media editing operations. Initial feedback from customers who need this technology pointed to concatenation and trimming capabilities, so that is what we started with.

MediaBlob

[Exposed=(Window,Worker), Serializable]
interface MediaBlob : Blob {
    constructor(Blob blob);
    readonly attribute long long duration;
};

Constructor MediaBlob

When the MediaBlob constructor is invoked, the User Agent MUST run the following steps:

Let blob be the constructor's first argument
Run the steps in Handling MimeTypes
- If the return value is true, return the new MediaBlob
- Otherwise, throw the DOMException that was returned.
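
Because the constructor steps above can throw, calling code may want to guard the call. The following is a minimal sketch, not part of the proposal: `toMediaBlob` and its injectable `MediaBlobCtor` parameter are hypothetical names, and the sketch assumes the thrown exception is the "NotSupportedError" described under Handling MimeTypes.

```javascript
// Hypothetical helper: wraps the proposed MediaBlob constructor so callers get
// null instead of an exception. MediaBlob is only a proposal, so the
// constructor is injectable and defaults to a global of that name, if any.
function toMediaBlob(blob, MediaBlobCtor = globalThis.MediaBlob) {
  if (typeof MediaBlobCtor !== "function") {
    return null; // the proposal is not implemented in this environment
  }
  try {
    return new MediaBlobCtor(blob);
  } catch (error) {
    // The constructor steps throw the DOMException returned by Handling MimeTypes.
    if (error.name === "NotSupportedError") {
      return null;
    }
    throw error; // anything else is unexpected, so let it propagate
  }
}
```

A caller could then fall back to one of the existing approaches whenever `toMediaBlob(blob)` returns null.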

Duration Property

When the duration property is read, the User Agent MUST return the duration of the media in the Blob, in milliseconds

let mediaBlob = new MediaBlob(blob); // blob is a Blob object for a valid media
console.log(mediaBlob.duration) // Outputs 480000 = 8 minutes

MediaBlobOperation

[Exposed=(Window,Worker), Serializable]
interface MediaBlobOperation {
    constructor(MediaBlob mediaBlob);

    void trim(long long startTime, long long endTime);
    void split(long long time);
    void concat(sequence<MediaBlob> blobs);
    Promise<sequence<MediaBlob>> finalize(optional DOMString mimeType);
};

Constructor MediaBlobOperation

When the MediaBlobOperation constructor is invoked, the User Agent MUST run the following steps:

Let mediaBlob be the constructor's first argument
If mediaBlob is not undefined, return the new MediaBlobOperation
Otherwise, throw a "DataError" DOMException

Batching

The MediaBlobOperation methods trim, concat and split do not modify the MediaBlob when invoked. Instead, each call is tracked, and the tracked operations are executed only when finalize is called. Batching the operations this way saves memory and improves efficiency. Because split produces two MediaBlobs, it should always be the last operation added before finalize is called.
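
The batching behaviour can be pictured as a simple queue. The sketch below is an illustration under assumptions, not the proposed implementation: `OperationRecorder` and its `operations` array are hypothetical, and a plain Error with a `name` stands in for the "DataError" DOMException.

```javascript
// Hypothetical recorder that mimics the batching described above: each method
// only appends to a queue and never touches the source media.
class OperationRecorder {
  constructor(mediaBlob) {
    if (mediaBlob === undefined) {
      const error = new Error("A MediaBlob is required");
      error.name = "DataError"; // stand-in for the DOMException in the proposal
      throw error;
    }
    this.mediaBlob = mediaBlob;
    this.operations = []; // executed later, in insertion order
  }
  trim(startTime, endTime) {
    this.operations.push({ name: "trim", args: [startTime, endTime] });
  }
  split(time) {
    this.operations.push({ name: "split", args: [time] });
  }
  concat(blobs) {
    this.operations.push({ name: "concat", args: [blobs] });
  }
}

const source = Object.freeze({ duration: 480000 }); // frozen to show nothing mutates it
const recorder = new OperationRecorder(source);
recorder.trim(240000, 360000);
recorder.split(60000);
console.log(recorder.operations.map((op) => op.name)); // [ 'trim', 'split' ]
console.log(source.duration); // 480000
```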

Trim Method

The trim method is utilized to create the segment of time that the author would like to keep; the remaining content on either end, if any, is removed.

Parameter Definitions

startTime: The starting time position, in milliseconds. Required.
endTime: The ending time position, in milliseconds. Required.

Trim Algorithm

Let x be the byte-order position, with the zeroth position representing the first byte
Let O represent the blob to be trimmed

The User Agent will execute the following when finalize is called.

Check for errors
Move x to the startTime within O
Consume all of the bytes between the startTime and the endTime and place these bytes in a new MediaBlob object trimmedBlob

let mbo = new MediaBlobOperation(new MediaBlob(blob));
mbo.trim(240000, 360000);
mbo.finalize().then(function(mediaBlobs) {
    // mediaBlobs[0] will be the trimmed blob of 2 min duration
});
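
To make the trim steps concrete, here is a minimal sketch on a simplified in-memory model. Everything in it is an assumption made for illustration: `trimMedia` and the `{ type, duration, frames }` shape are hypothetical, and a real User Agent works on encoded bytes rather than on a list of timestamped frames.

```javascript
// Hypothetical model: media = { type, duration, frames: [{ time, data }] },
// with all times in milliseconds. Not how a User Agent stores media.
function trimMedia(media, startTime, endTime) {
  // Same conditions the proposal lists for trim under "Error Handling in finalize".
  if (startTime < 0 || endTime > media.duration || startTime > endTime) {
    const error = new Error("trim: invalid startTime/endTime");
    error.name = "InvalidStateError";
    throw error;
  }
  const frames = media.frames
    .filter((frame) => frame.time >= startTime && frame.time < endTime)
    // Rebase timestamps so the trimmed media starts at 0.
    .map((frame) => ({ ...frame, time: frame.time - startTime }));
  return { type: media.type, duration: endTime - startTime, frames };
}

// One frame per second for an 8 second clip.
const source = {
  type: "video/webm",
  duration: 8000,
  frames: Array.from({ length: 8 }, (_, i) => ({ time: i * 1000, data: `f${i}` })),
};
const trimmed = trimMedia(source, 2000, 5000);
console.log(trimmed.duration); // 3000
console.log(trimmed.frames.map((f) => f.data)); // [ 'f2', 'f3', 'f4' ]
```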

Split Method

The split method allows the author to split a blob into two separate MediaBlobs at a given time. Due to the nature of this operation, it should be the last operation before calling finalize().

Parameter Definitions

time: The time, in milliseconds, at which the blob is to be split into two separate MediaBlobs.

Split Algorithm

Let time represent the split location
Let O represent the blob to be split

The User Agent will execute the following when finalize is called.

Check for errors
Consume all of the content prior to the split location and place into mediaBlob1
Place the remaining content into mediaBlob2
Place both mediaBlob1 and mediaBlob2 into a sequence

let mbo = new MediaBlobOperation(new MediaBlob(blob));
mbo.split(2000);
mbo.finalize().then(function(mediaBlobs) {
    // mediaBlobs will be an array of two MediaBlobs split at 2 seconds 
});
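
The split steps can be sketched the same way. The `splitMedia` function and the `{ type, duration, frames }` model below are hypothetical and exist only to illustrate the algorithm; they are not part of the proposal.

```javascript
// Hypothetical model: media = { type, duration, frames: [{ time, data }] },
// with all times in milliseconds.
function splitMedia(media, time) {
  if (time < 0 || time > media.duration) {
    const error = new Error("split: time is outside the media duration");
    error.name = "InvalidStateError";
    throw error;
  }
  // Content before the split location goes into the first part.
  const first = media.frames.filter((frame) => frame.time < time);
  // The remaining content goes into the second part, rebased to start at 0.
  const second = media.frames
    .filter((frame) => frame.time >= time)
    .map((frame) => ({ ...frame, time: frame.time - time }));
  return [
    { type: media.type, duration: time, frames: first },
    { type: media.type, duration: media.duration - time, frames: second },
  ];
}

const clip = {
  type: "video/webm",
  duration: 5000,
  frames: Array.from({ length: 5 }, (_, i) => ({ time: i * 1000, data: `f${i}` })),
};
const [head, tail] = splitMedia(clip, 2000);
console.log(head.duration, tail.duration); // 2000 3000
console.log(tail.frames.map((f) => f.time)); // [ 0, 1000, 2000 ]
```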

Concat Method

This method allows you to concatenate two MediaBlobs.

Parameter Definitions

blob: This is the MediaBlob to concatenate with the current MediaBlob

Concat Algorithm

Let m1 represent the first MediaBlob, that is, the MediaBlob of the MediaBlobOperation on which concat is called
Let m2 represent the second MediaBlob, which is concatenated onto m1

The User Agent will execute the following when finalize is called.

Check for errors
Produce a new MediaBlob and copy the bytes from m1 into this new blob, followed by m2

let mbo = new MediaBlobOperation(new MediaBlob(blob1));
mbo.concat(new MediaBlob(blob2));
mbo.finalize().then(function(mediaBlobs) {
    // mediaBlobs[0] will be a concatenated MediaBlob of blob1 and blob2 
});
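
A minimal sketch of the concat steps, again on a simplified model. `concatMedia` and the `{ type, duration, frames }` shape are assumptions for illustration; the MIME type comparison mirrors the check listed for concat under Error Handling in finalize.

```javascript
// Hypothetical model: media = { type, duration, frames: [{ time, data }] },
// with all times in milliseconds.
function concatMedia(m1, m2) {
  if (m1.type !== m2.type) {
    const error = new Error("concat: MIME types differ");
    error.name = "InvalidStateError";
    throw error;
  }
  // Shift m2 so that it starts where m1 ends.
  const shifted = m2.frames.map((frame) => ({ ...frame, time: frame.time + m1.duration }));
  return {
    type: m1.type,
    duration: m1.duration + m2.duration,
    frames: [...m1.frames, ...shifted],
  };
}

const intro = {
  type: "video/webm",
  duration: 2000,
  frames: [{ time: 0, data: "a0" }, { time: 1000, data: "a1" }],
};
const outro = { type: "video/webm", duration: 1000, frames: [{ time: 0, data: "b0" }] };
const joined = concatMedia(intro, outro);
console.log(joined.duration); // 3000
console.log(joined.frames.map((f) => f.time)); // [ 0, 1000, 2000 ]
```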

Finalize Method

This method executes all the tracked operations and returns a promise that resolves with a sequence of MediaBlob objects, encoded according to the mimeType value if one is provided.

Parameter Definitions

mimeType: DOMString representation of the MIME type [RFC2046] expected for the output. Optional.

Finalize Algorithm

Let O be the MediaBlobOperation context object on which the finalize method is being called.
The User Agent will perform error checking.
If mimeType is provided, run the steps in Handling MimeTypes
- If the return value is true, continue
- else reject the promise with the DOMException that was returned.
If there are no errors, the User Agent will execute all the tracked operations to obtain a sequence of MediaBlobs.
- The operations are executed sequentially, in the order in which they were added; it is up to web developers to batch the operations in the most optimized way.
- This is necessary to provide better error handling.
The User Agent will create a new sequence of MediaBlobs encoded with the MIME type provided.
Resolve the promise with the sequence of MediaBlobs

// let the mimeType of the blob be 'video/webm; codecs=vp8,opus;'
let mbo = new MediaBlobOperation(new MediaBlob(blob));
mbo.finalize('video/mp4; codecs=h264,aac;').then(function(mediaBlobs) {
    // mediaBlobs[0] will be a MediaBlob object encoded with H.264 video codec and AAC audio codec
});
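
The sequential execution described in the finalize steps can be sketched as a loop over the tracked operations. This is an illustration under assumptions: media is reduced to a duration in milliseconds, error checks are assumed to have already passed, transcoding to a requested mimeType is left out, and `handlers`, `runOperations` and `finalize` are hypothetical names.

```javascript
// Hypothetical executor that models each MediaBlob by its duration (ms) only.
const handlers = {
  trim: (current, [startTime, endTime]) => [endTime - startTime],
  concat: (current, [otherDuration]) => [current + otherDuration],
  split: (current, [time]) => [time, current - time],
};

// Applies the operations in the order in which they were added.
function runOperations(duration, operations) {
  let durations = [duration];
  for (const { name, args } of operations) {
    // split must be last, so a single duration is in flight at this point.
    durations = handlers[name](durations[0], args);
  }
  return durations;
}

// Promise wrapper: an exception thrown while running rejects the promise.
function finalize(duration, operations) {
  return new Promise((resolve) => resolve(runOperations(duration, operations)));
}

finalize(480000, [
  { name: "trim", args: [4000, 360000] }, // 356000 ms remain
  { name: "concat", args: [60000] },      // 416000 ms in total
  { name: "split", args: [100000] },      // two parts
]).then((durations) => console.log(durations)); // [ 100000, 316000 ]
```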

Example with multiple operations

let mbo = new MediaBlobOperation(new MediaBlob(blob));
mbo.trim(4000, 360000);
mbo.concat(new MediaBlob(blob2));
mbo.finalize().then(function(mediaBlobs) {
    // mediaBlobs[0] will be the trimmed blob (from 4 seconds to 6 minutes) concatenated with blob2
});

Error Handling in finalize

When finalize() is called, the User Agent will perform these basic checks for the operations that are batched. This error checking should be done before executing any of the operations.

For trim()

Let O represent the blob to be trimmed
If startTime is less than 0 OR endTime is greater than O.duration OR startTime is greater than endTime:
- Reject promise with an "InvalidStateError" DOMException

For split()

Let O represent the blob to be split
If time is less than 0 OR time is greater than O.duration OR split is not the last operation before finalize() was called:
- Reject promise with an "InvalidStateError" DOMException

For concat()

Let m1 represent the first MediaBlob, that is, the MediaBlob of the MediaBlobOperation on which concat is called
Let m2 represent the MediaBlob that is passed in to the concat method to be concatenated with m1
If the mimeType of m1 does not equal the mimeType of m2:
- Reject promise with an "InvalidStateError" DOMException

The DOMException.message must contain:

Operation name
The sequence number indicating the position of the operation

Example:

let mbo = new MediaBlobOperation(new MediaBlob(blob));
mbo.trim(0, 5000);  // Trim from 0 to 5 secs
mbo.split(7000);  // Split the MediaBlob at 7 secs
mbo.finalize().then(function(mediaBlobs) { })
.catch((error) => {
    // sample error.message: "Split called on sequence 2: The time provided is greater than the duration of the MediaBlob."
});
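
The checks above, run before any operation executes, might look like the following sketch. It is an assumption-laden illustration: `validateBatch` is a hypothetical name, media is modelled as `{ type, duration }` objects, concat is taken to receive a single MediaBlob as in the examples, and the exact message wording is only modelled on the sample message.

```javascript
// Hypothetical pre-flight check. Returns null when the batch is valid,
// otherwise an error naming the operation and its 1-based position.
function validateBatch(source, operations) {
  let duration = source.duration; // duration the next operation will see
  for (let i = 0; i < operations.length; i++) {
    const { name, args } = operations[i];
    let problem = null;
    if (name === "trim") {
      const [startTime, endTime] = args;
      if (startTime < 0 || endTime > duration || startTime > endTime) {
        problem = "The time range provided is not valid for the MediaBlob.";
      } else {
        duration = endTime - startTime;
      }
    } else if (name === "split") {
      const [time] = args;
      if (time < 0) {
        problem = "The time provided is less than 0.";
      } else if (time > duration) {
        problem = "The time provided is greater than the duration of the MediaBlob.";
      } else if (i !== operations.length - 1) {
        problem = "split must be the last operation before finalize().";
      }
    } else if (name === "concat") {
      const [other] = args;
      if (other.type !== source.type) {
        problem = "The MIME types of the MediaBlobs do not match.";
      } else {
        duration += other.duration;
      }
    }
    if (problem !== null) {
      const error = new Error(`${name} called on sequence ${i + 1}: ${problem}`);
      error.name = "InvalidStateError"; // a User Agent would use a DOMException
      return error;
    }
  }
  return null; // nothing to report, the batch may run
}

const source = { type: "video/webm", duration: 10000 };
const error = validateBatch(source, [
  { name: "trim", args: [0, 5000] },
  { name: "split", args: [7000] },
]);
console.log(error.message);
// split called on sequence 2: The time provided is greater than the duration of the MediaBlob.
```

Tracking the running duration is what lets the split at 7 seconds fail here: it is checked against the 5 seconds left after the trim, not against the original 10 seconds.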

Handling MimeTypes

The finalize method can take a DOMString naming the MIME type that the author would like the returned MediaBlobs to have. To determine whether the MIME type is supported, do the following:

Determine the MIME type of the blob by using MIME sniffing
If the MIME type is not a valid MIME type
OR the MIME type contains a media type or media subtype that the User Agent cannot render:
- return a "NotSupportedError" DOMException
else
- return true

mimeType specifies the media type and container format for the recording via a type/subtype combination, with the codecs and/or profiles parameters [RFC6381] specified where ambiguity might arise. Individual codecs might have further optional specific parameters.
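
The support check could be approximated in script as shown below. This is a sketch under stated assumptions: `checkMimeType` is a hypothetical name, the validity test is a loose pattern rather than a full MIME type parser, and `canRender` is an injected predicate that, in a browser, could wrap an existing capability check such as `HTMLMediaElement.canPlayType()` or `MediaSource.isTypeSupported()`.

```javascript
// Hypothetical version of the steps above. Returns true, or an error object
// standing in for the "NotSupportedError" DOMException.
function checkMimeType(mimeType, canRender) {
  // Loose check for "type/subtype" followed by optional ";parameters".
  const valid = /^[a-z0-9!#$&^_.+-]+\/[a-z0-9!#$&^_.+-]+\s*(;.*)?$/i.test(mimeType);
  if (!valid || !canRender(mimeType)) {
    const error = new Error(`Unsupported MIME type: "${mimeType}"`);
    error.name = "NotSupportedError";
    return error;
  }
  return true;
}

// Stand-in for a real capability check, used here so the sketch runs anywhere.
const canRender = (type) => type.startsWith("video/webm");
console.log(checkMimeType("video/webm; codecs=vp8,opus", canRender)); // true
console.log(checkMimeType("video/x-unknown", canRender).name); // NotSupportedError
console.log(checkMimeType("not a mime type", canRender).name); // NotSupportedError
```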
