sindresorhus/file-type

Support for WebVTT files (text/vtt)

AleksandrHovhannisyan opened this issue · 2 comments

First, thanks so much for this package! We've been using it at work to validate files uploaded by users and it works as expected for the majority of our use cases. There is one edge case where it doesn't currently validate WebVTT files (MIME type text/vtt, for captions shown in a video element's <track>).

According to the W3C specification WebVTT: The Web Video Text Tracks Format, the magic numbers for VTT files are as follows:

WebVTT files all begin with one of the following byte sequences (where "EOF" means the end of the file):

EF BB BF 57 45 42 56 54 54 0A
EF BB BF 57 45 42 56 54 54 0D
EF BB BF 57 45 42 56 54 54 20
EF BB BF 57 45 42 56 54 54 09
EF BB BF 57 45 42 56 54 54 EOF
57 45 42 56 54 54 0A
57 45 42 56 54 54 0D
57 45 42 56 54 54 20
57 45 42 56 54 54 09
57 45 42 56 54 54 EOF
(An optional UTF-8 BOM, the ASCII string "WEBVTT", and finally a space, tab, line break, or the end of the file.)
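
For illustration, the signature rule quoted above can be sketched as a standalone check. This is only a minimal sketch: the helper name looksLikeWebVtt is hypothetical and not part of file-type's API, and it assumes the input is a Uint8Array (or plain array) holding the first bytes of the file.

```javascript
// Hypothetical helper (not part of file-type): checks the WebVTT signature
// as quoted from the spec: optional UTF-8 BOM, "WEBVTT", then a space, tab,
// line break, or the end of the file.
function looksLikeWebVtt(bytes) {
	const bom = [0xEF, 0xBB, 0xBF];
	const magic = [0x57, 0x45, 0x42, 0x56, 0x54, 0x54]; // ASCII "WEBVTT"
	const terminators = [0x0A, 0x0D, 0x20, 0x09]; // LF, CR, space, tab

	// Skip the BOM if present.
	const hasBom = bom.every((byte, index) => bytes[index] === byte);
	const offset = hasBom ? bom.length : 0;

	// The magic string must follow immediately.
	if (!magic.every((byte, index) => bytes[offset + index] === byte)) {
		return false;
	}

	// After "WEBVTT": either the end of the file or one of the terminators.
	const end = offset + magic.length;
	return bytes.length === end || terminators.includes(bytes[end]);
}
```

Note that the end-of-file case only holds if the caller passes the whole file, or at least everything up to its end; a buffer truncated exactly after "WEBVTT" would be indistinguishable from a six-byte file.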

Would it be possible to support this? If so, I'd be happy to help or put in a PR.

In my opinion, that is in scope.

Please note that we already handle the BOM in a generic way:

file-type/core.js

Lines 251 to 255 in 988bf4b

if (this.check([0xEF, 0xBB, 0xBF])) { // UTF-8-BOM
	// Strip off UTF-8-BOM
	this.tokenizer.ignore(3);
	return this.parse(tokenizer);
}

So you can ignore the magic numbers that start with the BOM (EF BB BF); those will be covered automatically.

I suggest triggering on WEBVTT, and possibly also matching the character that follows it.
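
As a rough sketch of that layering (strip the BOM once, generically, then match only the BOM-free signature), something like the following could work. The function names stripUtf8Bom and detectWebVtt are hypothetical and this is not file-type's actual code. It assumes a Uint8Array input, and the 'vtt' extension in the result is an assumption; only the text/vtt MIME type comes from this issue.

```javascript
// Hypothetical names; illustrates the suggested layering only.

// Generic step: drop a leading UTF-8 BOM, whatever the file type turns out to be.
function stripUtf8Bom(bytes) {
	const hasBom = bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF;
	return hasBom ? bytes.subarray(3) : bytes;
}

// Format-specific step: trigger on "WEBVTT", then check the next character.
function detectWebVtt(bytes) {
	const body = stripUtf8Bom(bytes);
	const magic = 'WEBVTT';

	for (let index = 0; index < magic.length; index++) {
		if (body[index] !== magic.charCodeAt(index)) {
			return undefined;
		}
	}

	// `next` is undefined when the input ends right after "WEBVTT" (the EOF case).
	const next = body[magic.length];
	const isTerminator = next === undefined || [0x0A, 0x0D, 0x20, 0x09].includes(next);

	// The 'vtt' extension is an assumption made for this sketch.
	return isTerminator ? {ext: 'vtt', mime: 'text/vtt'} : undefined;
}
```

With the BOM handled up front, only the five BOM-free signatures need to be matched by the format-specific check.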

Thanks! That makes sense. I'll work on this and put up a PR.