MPEGGroup/FileFormatConformance

RegEx to validate 4CC too strict/possible characters missing

stschr opened this issue · 3 comments

stschr commented

Issue

Identified by reviewing the "check" for #83:
The "test_validate_files" check fails because a 4CC contains a "-", which is not an allowed character per the RegEx ^file(?>\\.[0-9a-zA-Z]{4})*$
See https://github.com/MPEGGroup/FileFormatConformance/actions/runs/6117640706/job/16604576692, "Validate Output", line 33.
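The failure can be reproduced outside CI with a minimal sketch like the one below. It makes some assumptions: the doubled backslash in the pattern is JSON escaping for a single `\.`, the atomic group `(?>...)` is swapped for a plain non-capturing group `(?:...)` so the snippet also runs on Python versions older than 3.11, and the path `file.ac-3` is a hypothetical example rather than the exact path from the failing job.

```python
import re

# Current pattern from the issue, with the JSON escaping removed.
# Assumption: (?:...) stands in for the atomic group (?>...); for this
# fixed-width pattern both accept the same strings.
CURRENT_PATTERN = re.compile(r"^file(?:\.[0-9a-zA-Z]{4})*$")


def matches_current(path: str) -> bool:
    """Return True if the path passes the current, alphanumeric-only check."""
    return CURRENT_PATTERN.match(path) is not None


if __name__ == "__main__":
    # Hypothetical sample paths; "ac-3" is a registered 4CC containing "-".
    for sample in ("file.moov.trak", "file.ac-3"):
        print(sample, matches_current(sample))
```

Any 4CC containing a character outside `[0-9a-zA-Z]` is rejected by this pattern, which matches the behaviour reported in the CI log.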

Proposed solution

A quick scroll through mp4ra.org shows that the following characters are used in registered 4CCs and should be added to the RegEx:

  • -
  • +
  • $20 (space, 0x20)
  • anything else?

@podborski can you verify this as well?

From ISOBMFF:

To permit ease of identification, the 32-bit compact type can be expressed as four characters from the range 0020 to 007E, inclusive, of ISO/IEC 10646 (technically identical to the Unicode standard [28]) or ISO/IEC 8859-1 [34]. Each character is hence expressible in a single byte. The four individual byte values of the field are placed in order in the file. Other fields may also use this 32-bit representation, referred to as a ‘four-character code’ (4CC). The maintenance of four-character codes used in the format is defined in Annex D.

The RegEx should check for exactly four characters, each in the range [0x0020, 0x007E].
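One way to express that rule is sketched below in Python. This is an illustration, not the fix that was merged: the name `RELAXED_PATTERN`, the helper `is_valid_path`, and the sample paths are hypothetical, and the pattern keeps the `file` prefix and `.` separator from the existing RegEx.

```python
import re

# Hypothetical relaxed pattern: "file" followed by zero or more segments,
# each a "." plus exactly four characters in the range 0x20..0x7E.
RELAXED_PATTERN = re.compile(r"file(?:\.[\x20-\x7E]{4})*")


def is_valid_path(path: str) -> bool:
    """Return True if every 4CC segment is four printable ASCII characters."""
    # fullmatch anchors both ends and, unlike "$", does not tolerate a
    # trailing newline after the last segment.
    return RELAXED_PATTERN.fullmatch(path) is not None


if __name__ == "__main__":
    samples = [
        "file.moov.trak",  # alphanumeric only
        "file.ac-3",       # contains "-"
        "file.url ",       # contains a trailing space (0x20)
        "file.abc",        # only three characters
        "file.ab\x7fd",    # 0x7F lies outside the allowed range
    ]
    for sample in samples:
        print(repr(sample), is_valid_path(sample))
```

Because each segment has a fixed width of one separator plus four characters, a `.` inside a 4CC (0x2E is within the allowed range) does not make the match ambiguous.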

stschr commented

Thank you, @DenizUgur and @podborski !