RegEx to validate 4CC too strict/possible characters missing
stschr opened this issue · 3 comments
Issue
Identified by reviewing the "check" for #83:
The "test_validate_files" check fails because a 4CC contains a "-",
which is not an allowed character per the RegEx ^file(?>\\.[0-9a-zA-Z]{4})*$
See https://github.com/MPEGGroup/FileFormatConformance/actions/runs/6117640706/job/16604576692 ("Validate Output", line 33).
Proposed solution
A quick scroll through mp4ra.org shows that the following characters are used in registered 4CCs and should be added to the RegEx:
-
+
$20 (space, i.e. 0x20)
- anything else?
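As a rough sketch of this minimal fix, the character class could be extended with just the three characters listed above. This uses Python's re module and assumes "$20" stands for the space character. The name EXTENDED_RE is hypothetical, and a non-capturing group (?:...) replaces the atomic group (?>...) of the current RegEx so that the pattern also compiles on Python versions before 3.11.

```python
import re

# Assumption: same structure as the current RegEx, with "+", space and "-"
# added to the character class. The "-" is placed last so it is a literal.
EXTENDED_RE = re.compile(r"^file(?:\.[0-9a-zA-Z+ -]{4})*$")

def matches_extended(path: str) -> bool:
    """Return True if every dot-separated segment after "file" is a 4CC
    built from alphanumerics, "+", space or "-"."""
    return EXTENDED_RE.match(path) is not None
```

This would still reject any other printable character that turns up in a registered 4CC, which is why the "anything else?" question matters.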
@podborski can you verify this as well?
From ISOBMFF:
To permit ease of identification, the 32-bit compact type can be expressed as four characters from the range 0020 to 007E, inclusive, of ISO/IEC 10646 (technically identical to the Unicode standard[28]) or ISO/IEC 8859-1[34]. Each character is hence expressible in a single byte. The four individual byte values of the field are placed in order in the file. Other fields may also use this 32-bit representation, referred to as a ‘four-character code’ (4CC). The maintenance of four-character codes used in the format is defined in Annex D.
The regex should check for exactly four characters, each in the range [0x0020, 0x007E].
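One way this could look, as a sketch only: the pattern below keeps the "file" prefix and the dot separator from the current RegEx and swaps the alphanumeric class for the full range 0x20 to 0x7E. It is written against Python's re module. FOURCC_PATH_RE and is_valid_path are hypothetical names, not taken from the repository. A non-capturing group (?:...) stands in for the atomic group (?>...), which Python only supports from 3.11 on.

```python
import re

# Assumption: "file" followed by zero or more ".XXXX" segments, where each
# X is any character in the range 0x20..0x7E (printable ASCII incl. space).
FOURCC_PATH_RE = re.compile(r"file(?:\.[\x20-\x7E]{4})*")

def is_valid_path(path: str) -> bool:
    """Return True if the whole string is "file" plus valid 4CC segments."""
    # fullmatch anchors at both ends and, unlike "$", does not tolerate
    # a trailing newline.
    return FOURCC_PATH_RE.fullmatch(path) is not None
```

Because every segment is exactly four characters long, a "." inside a 4CC does not make the match ambiguous: the separator is always the character right after the previous four.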
Thank you, @DenizUgur and @podborski !