Dash-Industry-Forum/Ingest

SCTE-35 in emsg

Closed this issue · 13 comments

This issue is created based on the MPEG exploration on this topic.

It was proposed to carry SCTE-35 in emsg. This is already supported in https://dashif-documents.azurewebsites.net/Ingest/master/DASH-IF-Ingest.html#splicing, using the emsg box and the scheme URN urn:scte:scte35:2013:bin.
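For readers less familiar with the box layout, here is a minimal sketch of how a version 1 emsg box carrying an SCTE-35 payload could be serialized. The function name `build_emsg_v1` and the example values are hypothetical, and the payload bytes are a placeholder rather than a valid splice_info_section; only the scheme URN comes from the text above.

```python
import struct

# Scheme URN quoted in this issue for binary SCTE-35 carried in emsg.
SCTE35_SCHEME = b"urn:scte:scte35:2013:bin"


def build_emsg_v1(timescale, presentation_time, duration, event_id,
                  payload, value=b""):
    """Serialize a version 1 DASHEventMessageBox ('emsg').

    Hypothetical helper for illustration only. Version 1 field order:
    version/flags, timescale, presentation_time (64 bit), event_duration,
    id, scheme_id_uri, value, message_data.
    """
    # FullBox header: version=1 in one byte, then 3 zero bytes of flags.
    body = struct.pack(">B3xIQII", 1, timescale, presentation_time,
                       duration, event_id)
    # Both strings are null-terminated; message_data runs to end of box.
    body += SCTE35_SCHEME + b"\x00" + value + b"\x00" + payload
    # Box header: 32-bit size (including the 8 header bytes) and type.
    return struct.pack(">I4s", 8 + len(body), b"emsg") + body


# Placeholder payload, NOT a real SCTE-35 splice_info_section.
placeholder_payload = b"\xfc\x30"
box = build_emsg_v1(timescale=90000, presentation_time=900000,
                    duration=2700000, event_id=1,
                    payload=placeholder_payload)
```

In an ingest workflow the resulting bytes would sit in front of the moof of the CMAF chunk they apply to; that placement is outside the scope of this sketch.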

A separate track containing the emsg was considered more efficient for seeking and is also being standardized in MPEG.

The issue with a separate track is a heavier implementation load and an otherwise unsupported spec.
Another option is use of SCTE 250 (ESAM) to trigger SCTE 35 event creation using a well-supported REST API. This also lets us use SCTE 35 from sources other than the transcoder.

We know several encoders and packagers that support this, so it is not an unsupported spec. It is actually well aligned with current best practices, and it is also an emerging standard in MPEG.

Using a REST API is fine, but does this need anything from the ingest specification? Is this not just a separate API that can be used together with the CMAF ingest protocol? We can add a statement that out-of-band signalling of timed metadata is not precluded.

Agree with Rufael here - an ESAM API to trigger the splice conditioning and insertion of the SCTE35 is still going to work with the ingest protocol. If you did not want to do the SCTE35 ingest on the transcoder itself, which would be sending the emsg to interface 1, you could also have the packager support a REST API to get out-of-band or ESAM-generated events.

Personally, I think it would be preferable to get the SCTE35 in the emsg box and not in a track, and it can be sent in with the CMAF chunk that is spliced by the upstream transcoder/splice conditioning system.
However for other custom metadata (non SCTE35) that does not require splice conditioning, like ID3, JSON, whatever, you could send that as a continuous metadata track.

These are things we are also supporting today in Azure for non-OTT workflows. For example, live video coming from a surveillance camera would need both "events" like motion detection, which are sparse and a great fit for emsg, and possibly a metadata track as well for other telemetry, such as people detection, vehicle detection, object detection, etc., which may be a lot more constant, or arrive with every chunk of video.

We are looking at the DASH ingest interface 1 protocol for things beyond just ad signaling in OTT of course.

I would, though, allow carrying ID3v2 in emsg -- it makes little sense to separate Nielsen into a separate metadata track just to reunite it with the same segment later.

@ZmGorynych - hopefully you are also good with non ID3v2 as well... Trying not to limit anything to just that binary payload of course!

Every custom metadata scheme that clearly defines a standardized carriage in emsg should be allowed. MISB ST1910 for realtime KLV transport is a good example of it.

I think a reasonable resolution should be:
(a) Any inband events can be used as long as there is a signaling provided in the MPD;
(b) SCTE 35 shall be handled using SCTE 35 in emsg unless handled out of band (we should then indicate use of SCTE 250 in the MPD)?
(c) Nielsen and ID3v2 shall be passed as emsg
(d) The metadata track method shall not be used in this interface for carriage of DASH events (implementation complexity on the encoder side, especially in conjunction with a multi-encoder setup)
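As an illustration of point (a), the MPD signaling for an inband event stream is an InbandEventStream element whose schemeIdUri matches the scheme in the emsg boxes. The sketch below generates such an element; the helper name `inband_event_stream` is hypothetical, and whether a `value` attribute is needed depends on the scheme, which this thread does not specify.

```python
import xml.etree.ElementTree as ET


def inband_event_stream(scheme_id_uri, value=None):
    """Return an InbandEventStream element as an XML string.

    Hypothetical helper for illustration; namespace handling and the
    surrounding AdaptationSet/Representation are left out.
    """
    element = ET.Element("InbandEventStream",
                         {"schemeIdUri": scheme_id_uri})
    if value is not None:
        # Only emitted when the caller supplies one.
        element.set("value", value)
    return ET.tostring(element, encoding="unicode")


xml_text = inband_event_stream("urn:scte:scte35:2013:bin")
print(xml_text)
```

A packager would place this element in the AdaptationSet or Representation whose segments carry the matching emsg boxes.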

@ZmGorynych Points (a)/(b)/(c) make sense.

On point (d), is the Timed Metadata Track standardization finished at MPEG? I think I remember some ISO-BMFF level specification was still required. There is a real benefit to it, in terms of bandwidth optimization, with intensive messaging use cases (like KLV in ST1910), but it is a more questionable benefit with sporadic messaging use cases like SCTE-35 or ID3. Duplicating the information in the media tracks is certainly not ideal, but it's also not a huge penalty in terms of overhead.

My main worry is decoupling time-sensitive data from the actual frames while in transit in the case of distributed encoders. It will still work, but it will be more fragile than straightforward embedding. I agree with the waste-of-bandwidth point, although I would expect future implementations (once the Event Metadata Track is standardized and supported in clients) to move events out of the segments into their own separate track.

I am less concerned with bandwidth between encoder and packager, and to a degree chatty events can be compressed with gzip transfer coding.
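To get a rough feel for how well chatty events compress, here is a small sketch using Python's gzip module on a batch of repetitive messages. The JSON-like message format and the helper name `compression_sizes` are made up for illustration; in practice the HTTP layer would apply the transfer coding, not application code like this.

```python
import gzip


def compression_sizes(messages):
    """Return (raw_size, gzip_size) for a batch of event payloads.

    Hypothetical helper: it only estimates the saving and does not
    implement HTTP transfer coding itself.
    """
    raw = b"".join(messages)
    packed = gzip.compress(raw)
    return len(raw), len(packed)


# Made-up telemetry messages that differ only in a sequence number.
messages = [b'{"type":"motion","camera":7,"seq":%d}' % i
            for i in range(500)]
raw_size, gzip_size = compression_sizes(messages)
print(raw_size, gzip_size)
```

Highly repetitive payloads like these shrink a lot; sparse binary SCTE-35 messages would benefit far less, which matches the point that bandwidth matters mostly for the chatty cases.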

I think there are still some advantages in keeping SCTE-35 in separate tracks:

  • quick processing without scanning the media file
  • separation of metadata and media generation pipelines
  • random access (the track can for example use sidx to rapidly access timed metadata)
  • avoids losing metadata when a segment is missed: in a timed metadata track the metadata is repeated, whereas with emsg, if you miss it, you are screwed
  • a separate DVR window can be used for timed metadata; our experience showed that sometimes a different DVR window for timed metadata must be used, whereas with emsg, deleting segments couples the metadata DVR window to the media segment DVR window
  • de-duplication, and no changes to the media segments
  • allow optional third party sources to generate timed metadata and scte-35
  • allow other isobmff based processing (timeshift, timeline conversion, edit list etc)
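To make the first bullet concrete, the sketch below shows the kind of scan a consumer has to run over every media segment to find inband emsg boxes, which a separate track avoids. It walks top-level ISOBMFF boxes only; the function names are hypothetical and the test segment in the usage lines is synthetic.

```python
import struct


def iter_top_level_boxes(data):
    """Yield (type, payload) for each top-level ISOBMFF box in data.

    Hypothetical helper for illustration. Handles the 64-bit largesize
    form (size == 1) and the box-runs-to-end form (size == 0).
    """
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            if offset + 16 > len(data):
                break  # truncated largesize header
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:
            size = len(data) - offset
        if size < header or offset + size > len(data):
            break  # malformed or truncated box, stop scanning
        yield box_type, data[offset + header:offset + size]
        offset += size


def find_emsg_payloads(segment):
    """Return the payloads of all top-level emsg boxes in a segment."""
    return [payload for box_type, payload in iter_top_level_boxes(segment)
            if box_type == b"emsg"]


def _box(box_type, payload):
    # Build a simple box with a 32-bit size, for the synthetic example.
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


segment = (_box(b"styp", b"cmfs") + _box(b"emsg", b"event-1") +
           _box(b"moof", b"") + _box(b"mdat", b"frames"))
print(find_emsg_payloads(segment))
```

With a timed metadata track the same information would be found by reading one dedicated track, without touching the media segments at all.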

Jan 29.
Agreement to stick to ISOBMFF/CMAF and SCTE 214, hence inband event message or timed metadata track for SCTE-35.
We need to carefully document the advantages and disadvantages of each, and probably also reference the new MPEG specification of the event message track (create another issue for that).

AP to propose updated text for this

Call 19th February
AP -> reference MPEG timed metadata track work (CD state),
involve SCTE,
text on conditioning video boundaries needs checking (it is there),
ESAM to timed metadata track

included in PR: #143