bwmarrin/dca

DCA2 Specification

DV8FromTheWorld opened this issue · 2 comments

Topics for discussion:

  • Duration metadata
  • Creation of a seek table for easy seeking in the file
  • Creation of a system that will allow for easy backwards compatibility for possible future versions of DCA

A general suggestion about the units used in the specification: it would be good to measure time in one consistent unit throughout. I propose that, where possible, everything be expressed as a number of sample groups (the actual sample count is sample group count * channel count). In DCA1, both frame size and frequency are already counted in sample groups, and no other field uses a conflicting unit.

Based on that suggestion:

  • Duration metadata
    Number of sample groups. We need to decide whether the last frame must contain the same number of sample groups as all the others (that is, whether it must match frame_size). This determines whether duration must be a multiple of frame_size. As for where to put it in the JSON, I am not entirely sure.

  • Creation of a seek table for easy seeking in the file
    A seek table makes sense as a binary section, since each entry has a fixed length and is accessed by index. Either the JSON or the binary section itself would contain the seek table interval, counted in sample groups (a multiple of frame_size), and there would be exactly ceil(duration / interval) entries. Each entry would contain the offset of the start of an opus frame/packet within the raw opus section. Seeking by time would be simple: seek_position / interval gives the index of the entry to read, and the player jumps to that offset (opus_data_offset + entry_value) in the file. For more accurate seeking, it could then discard the next remainder_samples / frame_size packets, where remainder_samples is seek_position modulo interval. The interval is up to the encoder; I would suggest that a reference implementation choose the total length divided by a fixed entry count, capped at an interval of, for example, 5 or 10 seconds.

Additionally, depending on bitrate, a file could easily reach 2 GB with less than 20 hours of audio, at which point offsets no longer fit in a signed 4-byte integer. The first payload byte of the seek table section could therefore indicate the size of each entry in bytes (with only 4 and 8 being valid options).

  • Creation of a system that will allow for easy backwards compatibility for possible future versions of DCA
    The most common approach for binary sections in media files is to put section type and length fields at the beginning of every section. The type is commonly a short magic number: for example, MP4 uses four ASCII characters (fourCC) and MKV uses binary element IDs of up to 4 bytes. An unknown section can be skipped, since its size is known. I would also suggest storing the total length of all binary sections in the initial header, so that the start position of the raw data would be just 12 + meta_size + binary_section_size.
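The seek lookup described in the seek table item above can be sketched in Go. This is only an illustration of the arithmetic under the assumptions stated there: all positions are counted in sample groups, the interval is a multiple of frame_size, and every packet holds exactly frame_size sample groups. The names `SeekTarget`, `entryCount` and `locate` are hypothetical and not part of DCA.

```go
package main

import (
	"errors"
	"fmt"
)

// SeekTarget says which seek table entry to read and how many whole
// packets to discard after jumping to it. Hypothetical type, for illustration.
type SeekTarget struct {
	EntryIndex  int64 // index into the seek table
	SkipPackets int64 // packets to skip after the jump
}

// entryCount returns ceil(duration / interval) using integer arithmetic.
func entryCount(duration, interval int64) int64 {
	return (duration + interval - 1) / interval
}

// locate maps a position (in sample groups) to a seek table entry plus
// the number of packets to skip for a more accurate seek.
func locate(position, interval, frameSize int64) (SeekTarget, error) {
	if position < 0 || interval <= 0 || frameSize <= 0 || interval%frameSize != 0 {
		return SeekTarget{}, errors.New("invalid seek parameters")
	}
	remainder := position % interval // sample groups past the seek point
	return SeekTarget{
		EntryIndex:  position / interval,
		SkipPackets: remainder / frameSize,
	}, nil
}

func main() {
	// Assumed example values: 960 sample groups per frame (20 ms at 48 kHz)
	// and one seek point every 240000 sample groups (5 s at 48 kHz).
	const frameSize, interval = 960, 240000
	target, err := locate(500000, interval, frameSize)
	if err != nil {
		panic(err)
	}
	fmt.Println(target.EntryIndex, target.SkipPackets, entryCount(1000000, interval))
}
```

The player would then read entry `EntryIndex`, jump to opus_data_offset plus that entry's value, and drop `SkipPackets` packets before resuming playback.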

Open questions:

  1. Do we want CRC anywhere?
  2. Do we want to prepend the packet size to opus packets to make it simple to skip packets without decoding?

After some discussion with other people involved, this is the state of current ideas:

  • File starts with DCA2 as the magic bytes.
  • This is followed by binary sections until the end of file. Each binary section has a 16-byte header:
    • 4 bytes: section identifier fourCC
    • 4 bytes unsigned LE: CRC for this section, exact algorithm undecided
    • 8 bytes signed LE: section size excluding the header, never negative
    • Followed by payload depending on the section type, length specified in the size field.
  • The first section in the file is always META. The entire payload is one single UTF-8 string, which is valid JSON. Encoders are encouraged to pad the JSON with trailing spaces (at least 4 KB? exact amount undecided) so that small text fields can be changed without rewriting the whole file. The format is otherwise mostly identical to DCA1, except:
    • The opus block gains an additional required field, total_sample_count, which is the total number of sample groups in the entire file.
  • Mandatory section: SEEK. The payload starts with an 8-byte signed LE point_interval value (always greater than 0 and a multiple of frame_size), which is the number of sample groups per seek point. Each seek point is a signed 8-byte LE integer: the offset of the start of an opus packet (before the opus_size field) relative to the end of the OPUS section header. The nth point therefore points to the packet that starts at sample group n * point_interval. The number of seek points is total_sample_count / point_interval.
  • The last section in the file must be OPUS. The contents are identical to what is described in DCA1.
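To make the section layout above concrete, here is a minimal Go sketch that walks a file and skips over every payload using only the size field, which is what would let a decoder ignore sections it does not know. It assumes the 16-byte header exactly as listed (fourCC, 4-byte CRC, 8-byte signed LE size). The CRC is read but not checked because the algorithm is undecided, and the `XTRA` section and all function names are made up for the example.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// sectionHeader mirrors the proposed 16-byte section header.
type sectionHeader struct {
	ID   [4]byte // section identifier fourCC
	CRC  uint32  // not verified here: algorithm undecided
	Size int64   // payload size, excluding the header
}

// readSectionHeader decodes one little-endian header from r.
func readSectionHeader(r io.Reader) (sectionHeader, error) {
	var h sectionHeader
	if err := binary.Read(r, binary.LittleEndian, &h); err != nil {
		return h, err
	}
	if h.Size < 0 {
		return h, errors.New("negative section size")
	}
	return h, nil
}

// listSections checks the magic bytes, then returns the identifier of
// every section, skipping each payload without interpreting it.
func listSections(r io.Reader) ([]string, error) {
	magic := make([]byte, 4)
	if _, err := io.ReadFull(r, magic); err != nil {
		return nil, err
	}
	if string(magic) != "DCA2" {
		return nil, errors.New("missing DCA2 magic")
	}
	var ids []string
	for {
		h, err := readSectionHeader(r)
		if err == io.EOF {
			return ids, nil // clean end of file
		}
		if err != nil {
			return nil, err
		}
		ids = append(ids, string(h.ID[:]))
		if _, err := io.CopyN(io.Discard, r, h.Size); err != nil {
			return nil, err
		}
	}
}

// appendSection writes a header (CRC left as zero) followed by the payload.
// id is assumed to be exactly 4 bytes long.
func appendSection(buf *bytes.Buffer, id string, payload []byte) {
	buf.WriteString(id)
	binary.Write(buf, binary.LittleEndian, uint32(0))
	binary.Write(buf, binary.LittleEndian, int64(len(payload)))
	buf.Write(payload)
}

// buildSample assembles a tiny in-memory file for the demonstration.
func buildSample() *bytes.Buffer {
	var buf bytes.Buffer
	buf.WriteString("DCA2")
	appendSection(&buf, "META", []byte(`{"opus":{}}`))
	appendSection(&buf, "XTRA", []byte{1, 2, 3}) // hypothetical unknown section
	appendSection(&buf, "OPUS", nil)
	return &buf
}

func main() {
	ids, err := listSections(buildSample())
	fmt.Println(ids, err)
}
```

A real decoder would dispatch on the identifier instead of discarding every payload, and would seek rather than copy when the input supports it.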

The term sample group count is used instead of sample count because sample count depends on the number of channels - one sample group contains channel_count number of samples.
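As a small illustration of the unit, the following Go sketch converts a sample group count into an individual sample count and into playback time. It assumes the sample rate is given in sample groups per second, as the frequency field is described earlier in the thread; the function names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// sampleCount converts sample groups to individual samples:
// one group holds one sample per channel.
func sampleCount(groups int64, channels int) int64 {
	return groups * int64(channels)
}

// groupsToDuration converts sample groups to playback time at the given
// rate (sample groups per second). Whole seconds and the remainder are
// handled separately to avoid overflow on long files.
func groupsToDuration(groups int64, sampleRate int) time.Duration {
	rate := int64(sampleRate)
	sec := groups / rate
	rem := groups % rate
	return time.Duration(sec)*time.Second +
		time.Duration(rem)*time.Second/time.Duration(rate)
}

func main() {
	// Assumed example: one 960-group stereo frame at 48000 groups per second.
	fmt.Println(sampleCount(960, 2), groupsToDuration(960, 48000))
}
```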

Open questions:

  1. Should the letter case of the section identifier indicate whether a decoder must throw an error when it does not recognize the section?
  2. An optional field for packet loss and the like?