Metadata
meew0 opened this issue ยท 30 comments
We've discussed on Discord that DCA files should contain metadata as the first packet - the very first byte will specify the metadata length and the rest of the first packet will contain a JSON of the metadata. Then, starting with the second packet, opus will resume as normal. Here are some metadata fields I propose:
version
- a number that gets incremented every time something changes in the actual DCA file format, starts at1
rev
- a specifier that determines which revision of dca was used to generate this (for examplebwmarrin/dca#32361ee92fcbd0e404b2be18adf497a45fef4a5f
)tool
- if a separate tool was used to generate this DCA file, this field contains information about what tool and the revision in the same format asrev
source
- where this DCA file was generated from (file
,stdin
or other things in the future)abr
- the audio bitrate in kbit/surl
- if the track was downloaded from somewhere, the URL where it can be foundtitle
(self-explanatory)author
(self-explanatory)
Furthermore, they-
prefix can be reserved for information copied over from youtube-dl's JSON file, and thex-
prefix can be reserved for other fields the particular tool/library needs.
Added in response to 188480082 and 189713886:
sample_rate
- Opus audio sample rate in kbit/sapplication
- Opus audio application -voip
/music
/lowdelay
. I chose application over mode because it's more commonly used everywhere elsevbr
- Opus variable bitrate settingframe_size
- Opus frame size in bitschannels
- Number of audio channelsalbum
/genre
/track_num
(self-explanatory)
As for ffmpeg metadata, maybe anf-
prefix for those specific fields?
All of these are just suggestions and most are probably useless, please suggest others and comment on existing ones!
Great ideas. Probably want to add fields for the opus encoder options used too, most importantly will be the frame size since this effect timing.
- opus audio sampling rate
- opus application setting (voip, music, lowdelay)
- opus vbr setting
- frame size (960, 1920, 2880)
Also some way to store cover image as URL or maybe inline
http://wiki.multimedia.cx/index.php?title=FFmpeg_Metadata
That might be relevant here. I could also have dca
pull the metadata of the input file with ffmpeg so it can be moved into the output metadata.
I think the idea of this metadata is fantastic but I believe there should also be a Magic Byte header at the very beginning of the file so that it is easy to tell that we are in fact dealing with DCA audio. An example of this would be the WAV file magic byte header which is WAVERIFF as 8 bytes.
I propose DISCORDAUDIO as a 12 byte magic byte header.
Additionally, a single byte can only specify a length of 255, so I would suggest using 2 bytes (as a short) to specify the amount of bytes in the header. Please use a signed short, not an unsigned short. A signed short still allows for a max of 32,767...which should be plenty. I ask for a signed short because some languages don't have unsigned data types coughcoughjavacoughcough.
So, ideally:
[D][I][S][C][O][R][D][A][U][D][I][O]
[Length_1][Length_2]
[..Json String as byte array ..]
[.. DCA opus audio packets ..]
For the JSON, I propose the following
{
"dca": {
"version": 1,
"tool": {
"name": "dca-encoder",
"version": "1.0.0",
"rev": "bwmarrin/dca#32361ee92fcbd0e404b2be18adf497a45fef4a5f",
"url": "https://github.com/bwmarrin/dca/",
"author": "bwmarrin"
}
},
"info": {
"title": "Out of Control",
"artist": "Nothing's Carved in Stone",
"album": "Revolt",
"genre": "jrock",
"comments": "Second Opening for the anime Psycho Pass"
},
"origin": {
"source": "file",
"abr": 192000,
"channels": 2,
"encoding": "MP3/MPEG-2L3",
"url": "https://www.dropbox.com/s/bwc73zb44o3tj3m/Out%20of%20Control.mp3?dl=0"
},
"opus": {
"sample_rate": 48000,
"mode": "voip",
"frame_size": 960,
"channels": 2
},
"modified_date": 1456602080731,
"creation_date": 1456602080731
}
modified_date and creation_date are milliseconds past epoch.
I wouldn't use nested JSON for the metadata so parsing and processing it is simpler, rather have it all in one level and if necessary use prefixes. I agree with magic bytes, but I wouldn't put it before any other data - rather put it into the first packet with the other metadata so the parsing doesn't get more complicated than skipping the first packet (or not doing so and getting 20ms of garbage). DISCORDAUDIO
seems excessive, how about DCA
?
Also, in regards to packet headers, they're already two bytes each and already unsigned... why not just use ints to store them?
I'll update my first post with some new metadata fields. I don't think we need modified and creation date as the file system already does this.
Looks good, however we wouldn't be able to get origin.url
correct?
DISCORDAUDIO
is a bit excessive.
I believe that nesting the JSON doesn't really change it's processibility but with the nested encapsulation of data it makes more sense. Using json prefixes in json instead of nesting isn't smart. Less intuitive.
I will admit that I have no clue as to how DCA works or stores its audio currently, so I have no concept of the packet storage system. However, with other file types like WAV, the first 8 bytes of the file are the Magic Header. This is useful because it means that you can read the minimum amount of the file to determine while kind of file it is. This is why I would suggest making the very first few bytes be the Magic Header.
I chose DISCORDAUDIO
over DCA
because the likeliness of a different random file possibly having 3 bytes matching DCA
is quite likely, but likeliness of a file matching DISCORDAUDIO
and it not being a DCA file is incredibly low. 9 more bytes really isn't that wasteful IMO.
JSON looks good as it was posted.
Possibly just DISCORD
?
DISCORD
is an acceptable alternative.
The packets are stored using a simple two-byte header specifying the length of the opus packet. It doesn't really matter whether those magic bytes are at the very beginning, because the fixed length header means you can always read bytes 2 to (N+2) where N is the length of the magic string.
DISCORD
or DISCORDAUDIO
don't really matter because the probability of three random bytes being DCA
is 256โปยณ = 1/16777216, so negligibly small.
That isn't what a magic header is supposed to do. It specifies at the very beginning so you can read the least amount of bytes possible and also allows you to assume nothing about the file until you've confirmed that it IS the file type you want.
Also, it is just as easy to read 2+ (N + 2) as it is to read the first N bytes and if it is the correct file type, skip MAGIC_HEADER.length bytes. Honestly, IMO, it is easier. Additionally, if we chose your system of putting the magic header inside the packet it would mess up the json. Or, we would need to know to skip the magic header bytes inside the packet instead of just relying on the header length value defined in the first 2 bytes of each packet when loading the json.
I don't see a real reasoning not to put the magic header bytes at the beginning. If they want to check if it is actually a DCA file, they scan the first MAGIC_HEADER.length bytes, otherwise they just skip MAGIC_HEADER.length bytes and assume that the next 2 bytes they read will be the 2 bytes defining the length of the opus packet.
A comment about the DCA
vs DISCORD
vs DISCORDAUDIO
, I've seen short magic headers be confused before. It is just a few extra bytes, I don't think it is really that much of a problem to go with DISCORD
. It is a good median between DCA
and DISCORDAUDIO
The problem with putting magic bytes at the very beginning is that you have to introduce special code to read it, and existing code will break on it because it will read D
amount of data which will of course be garbage. If breaking the JSON is a problem, it is of course a possibility to make two header packets - one with the magic bytes and one with the metadata. DISCORD
doesn't fit the format in my opinion because it's mainly audio data not Discord data. If you take a look at existing magic bytes you'll see that they're usually 2-4 bytes long, with longer ones being the exception.
Just wanted to jump in here and say.. I am reading all of this and I really appreciate the conversation and all the helpful ideas. I'm a bit torn on some of what's being debated so I need to think about it a bit :). Please keep throwing in ideas though!
I'll drop the discussion about DCA
vs DISCORD
or DISCORDAUDIO
. Your argument is pretty sound so I'll be okay with whatever is picked for that.
About the need for new special code: currently do people already skip the first packet of audio? If they Dont then just the implementation of metadata will require changes to be made to code in order to skip the first packet otherwise it will send the metadata as if it were an audio packet.
That being said.. after I thought about it for a while, if we stored it inside the packet it would make a streaming system easier because the metadata packet could be sent as the first packet, regardless of the stream' current read location.
Having to skip the first packet doesn't really break existing files/code because 1. the first couple packets of audio files are generally zero anyway and 2. if existing code reads the metadata packet as an audio packet, either opus will fail to decode it or it will be decoded to garbage PCM - 20 ms of noise at the beginning doesn't really matter.
Okay, I see your point. 20ms of garbage data vs all packets being garbage because it would be reading the wrong bytes for packet size.
So, in conclusion. we are looking at storing a 3 byte magic header in the first packet of the DCA audio file. The magic header will be directly following the initial 2 bytes which specify packet size. This means that you will skip the first 2 bytes, read the next 3 to determine if it is the DCA audio and consequently the metadata packet, then using value stored in the initial 2 bytes, you will load byte 5
through packet.length
when loading the json.
I still stand by my statement about the json needing to be nesting. It feels cleaner and is easier to work with IMO.
So, I've been thinking about this and looking at how it's handled elsewhere...and...
Right now, I think the focus should be what's the best way to do this without regard to how the file works currently. There's only 2-3 people that use the dca format so if I make a major change that requires their code to be updated I'm okay with that :) I, however, don't want to decide not to make the best change now and then need to do so a year from now when 20 or 100 people are using it.
Personally, I think the magic byte header should be the very first bytes, and I propose using DCA1
and in case there is ever a huge and very backwards incompatible change we can use DCA2
and so forth. I don't expect that to happen ever, let alone frequently but at least it sets us up for the future possibility and as a four byte header it gives a decent level of protection from confusing file formats.
Next, I propose an int32 value that represents the length of the JSON data. This allows a very large chunk of JSON but most importantly it gives just enough extra space over int16 that we could include 300x300 base64 encoded or so sized thumbnail art which is something I specifically want for my own purposes and it's fairly expected of any audio file format at this point.
After that, I say we continue the existing format, except use an int16 length header to be more compatible with all languages cough java cough and an int16 is still plenty large enough for the max length an opus packet would ever be.
Also, I think the json format should be nested. I think it's easier to read in that format for one, but also it creates cleaner code in Go for me so, there's my selfish personal reasons again :) It also gives more freedom for future format changes and even custom fields individuals choose to add into the files. The specific format I'm not settled on and I think I need to think about that a bit longer..
I know this means changes for everyone but I think it gives us a better format for the long term future. The initial DCA1 and JSON metadata can be stripped before being passed to your libraries and then they wouldn't even need to deal with them. I can provide a -raw
option on the dca program itself to do this as well.
The problem is that not only will existing DCA parsers break, existing files will break too, which might be an even bigger problem.
I understand that, but.. How many of both do we have right now? There's what, 2-3 parsers? Beyond that, who is already storing a large collection of .dca
files on disk? I don't have any at all but if someone is doing that I could absolutely write a small script that would convert them to include the new header.
Only libs that use it at the moment is DiscordPHP and discordrb, it's a couple lines to ignore the header and JSON. Not a huge issue.
I don't know anyone that stores DCA files. Doubt anyone actually does, and @bwmarrin can convert them if needed.
I've started working on the JSON metadata over here: https://github.com/uniquoooo/dca/tree/magic-bytes
@bwmarrin possibly create a seperate branch so I can merge in and you can fix all my mistakes?
After further discussion on Discord, this initial draft of the specification was created:
Please comment on it with specific things you'd like to have changes, or that should be discussed.
Some problems that were mentioned and that should be discussed specifically:
- The
opus.abr
setting doesn't make much sense with variable bitrate encoding. TheRemoved in revision 3dca.tool.rev
field may be too hard to provide because it would require a git repository parser.
The metadata will need a field for additional arbitrary data. AnAdded in revision 2extra
object was suggested that will never be used internally by DCA.