Handling of null-terminated strings
Satyam opened this issue · 5 comments
Though string values in tags should be padded with spaces, I have found some files in my library with C-style null-terminated strings.
v1 headers use `whiteRe`. The regexp replaces blanks or nulls at the end of the string, but in C a null might be followed by whatever trash remained in the buffer, not necessarily spaces. Thus, this regexp might concatenate whatever non-null and non-blank trash lies after the first null with the actual data. The string should be truncated at the first null, with whatever remains ignored.
I had no trouble with `whiteRe` in my mp3s so I wasn't able to test alternatives, but I would change the regexp to:
/(^\s+|\s+$|\0.*$)/g
Literally, one or more blanks at either end or a single null followed by anything up to the end.
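A quick way to compare the two behaviours is sketched below. `trailingOnlyRe` is only a hypothetical stand-in for the current pattern (the real `whiteRe` definition is not quoted in this thread), and the input strings are made up for illustration.

```javascript
// Hypothetical stand-in for the current pattern: strips blanks or nulls at either end only.
const trailingOnlyRe = /(^[\s\0]+|[\s\0]+$)/g
// Pattern proposed in this issue.
const proposedRe = /(^\s+|\s+$|\0.*$)/g

// A C-style string: real data, a null terminator, then leftover buffer contents.
const raw = 'Artist\0junk left in buffer   '

// The null and the junk after it survive, because only the end of the string is cleaned.
const keptJunk = raw.replace(trailingOnlyRe, '')
// Everything from the first null onwards is dropped.
const truncated = raw.replace(proposedRe, '')

// Caveat: `.` does not match line terminators, so junk containing a newline
// defeats the `\0.*$` branch and the string comes back unchanged.
const withNewline = 'A\0x\ny'.replace(proposedRe, '')

console.log(JSON.stringify(keptJunk))
console.log(JSON.stringify(truncated))
console.log(JSON.stringify(withNewline))
```

If leftover bytes can include line terminators, replacing `.*` with `[\s\S]*` would probably be the safer variant.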
v2 headers don't truncate strings at nulls at all. `stringUtils.js` provides `readNullTerminatedString`. It would be good to use that. Or, possibly faster, use `indexOf('\x00')` and, if it is not `-1`, use `substr(0, index)` to truncate it. Or the same `whiteRe` as used for the v1 headers should work. Right now I am using the `indexOf` version and it works for me.
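The `indexOf` approach could look roughly like this. The function name `truncateAtNull` is made up for this sketch and is not part of the library; `slice` is used here in place of `substr`, which behaves the same for this call.

```javascript
// Hypothetical helper: cut an already-decoded string at its first null character.
function truncateAtNull (str) {
  const index = str.indexOf('\x00')
  // No null present: return the string untouched.
  return index === -1 ? str : str.slice(0, index)
}

console.log(JSON.stringify(truncateAtNull('Title\x00')))
console.log(JSON.stringify(truncateAtNull('Title\x00garbage')))
console.log(JSON.stringify(truncateAtNull('Title')))
```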
Looks good. I will have a look soon.
Here is the ID3v1 protocol. As it says, an ID3v1 tag is laid out like this:
Property | Length |
---|---|
Song Title | 30 characters |
Artist | 30 characters |
Album | 30 characters |
Year | 4 characters |
Comment | 30 characters |
Genre | 1 byte |
A property like `Artist` may not need all 30 bytes, so it is padded with binary `0`.
However, the lib's `readUTF8String` util stops reading when it hits `0x00`. So I don't think we have to use `/(^\s+|\s+$|\0.*$)/g`.
It would be great if you could offer a counterexample.
Use

```javascript
const fs = require('fs')

// The ID3v1 tag is the last 128 bytes of the file
let buf = fs.readFileSync('xxxxx.mp3')
buf = buf.slice(buf.length - 128)
```

to extract the id3v1 part.
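To check whether a file is such a counterexample, one option is to look for non-null bytes after the first null in each field. The sketch below assumes the field lengths from the table above, preceded by the 3-byte `TAG` marker that makes up the rest of the 128 bytes. The function name and the synthetic tag are invented for illustration.

```javascript
// [name, offset, length] for each text field; offsets start after the 3-byte "TAG" marker.
const FIELDS = [
  ['title', 3, 30],
  ['artist', 33, 30],
  ['album', 63, 30],
  ['year', 93, 4],
  ['comment', 97, 30]
]

// Returns the names of fields that hold non-null bytes after their first null,
// or null if the buffer does not look like an ID3v1 tag.
function fieldsWithTrailingJunk (tag) {
  if (tag.length !== 128 || tag.toString('latin1', 0, 3) !== 'TAG') return null
  return FIELDS.filter(([, offset, length]) => {
    const field = tag.subarray(offset, offset + length)
    const firstNull = field.indexOf(0)
    return firstNull !== -1 && field.subarray(firstNull).some(byte => byte !== 0)
  }).map(([name]) => name)
}

// Synthetic tag for illustration only: a clean title and an artist with leftover bytes.
const tag = Buffer.alloc(128)
tag.write('TAG', 0, 'latin1')
tag.write('Song', 3, 'latin1')
tag.write('Band\0leftover', 33, 'latin1')

console.log(fieldsWithTrailingJunk(tag))
```

With a real file, the 128-byte `buf` from the snippet above could be passed in place of `tag`. Note that ID3v1.1 tags keep a track number in the last byte of the comment field after a null, so `comment` may be reported for those even when nothing is wrong.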
It might be so for the v1 part; I had no issue with that one. I had a set of MP3s created by clearly non-standard-compliant software (of which many exist), which happened to be in ID3v2, encoded with ISO8859 and null-terminated. This module included the null in the resulting string. Though it looked OK when printed through `console.log`, when that string was stored in a SQLite database (not an unusual thing, MediaMonkey does that) with its automatic typecasting, it was stored not as type TEXT but as BLOB, simply because of the non-visible null at the end of the string. I fixed it by stripping that null, but I wouldn't recommend my patch for this library module. A more generic fix could be to use the same regular expression you use for v1 tags for both v1 and v2, but that would require moving `whiteRe` out of the function it is declared in and giving it the ability to strip both blanks and nulls.
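The trailing null is easy to miss, since it usually prints as nothing, but it shows up in the string length and in `JSON.stringify`. Below is a minimal sketch of a shared cleanup that both the v1 and v2 readers could call. `cleanTagString` is a hypothetical name, not something the library exports, and it truncates first and trims afterwards instead of relying on a single regular expression, so blanks just before the null are also removed.

```javascript
// Hypothetical shared helper: drop everything from the first null, then trim blanks.
function cleanTagString (str) {
  const index = str.indexOf('\0')
  const truncated = index === -1 ? str : str.slice(0, index)
  return truncated.trim()
}

// A decoded value that still carries its C-style terminator.
const decoded = 'Title\0'

console.log(decoded.length)            // one more than the visible text
console.log(JSON.stringify(decoded))   // the null appears as an escape sequence
console.log(JSON.stringify(cleanTagString(decoded)))
console.log(JSON.stringify(cleanTagString('Title  \0junk')))
```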
Got it. I'll assess whether to change `whiteRe`.