cul-it/hfs2dfxml

Script quits when \n is in a filename

dd388 opened this issue · 3 comments

dd388 commented

Most linebreaks in filenames on testing disk images have been \r, but I have spotted one that ends in \n. Since the parser splits on \n when pulling in output from hls this causes a line that doesn't match either regular expression and the script fails.

This hopefully is a really weird edge case, but I should account for it.

Possible solution--
Add two more regexes to quickly identify the start of a file or directory line. If the file line does not end in a double quote (and or if the next line consists of just double quotes), assume this is because a line split broke before the end of the quoting of the filename. Add a double quote to the end of the string and re-parse. Skip the next line that consists of one pair of double quotes. Do the same for directories.

This is, of course, the best I can factor at the end of the day on a Friday. Maybe I'll have a better solution later.

dd388 commented

If a line does not end in a quote AND the next line is a quote, we can assume that there is an \n in that filename. I think.

dd388 commented

Cleaning up after the stray linebreak afterwards -- as implied by my suggestion to myself in May -- is not trivial.

dd388 commented

Possible solution here: https://github.com/cul-it/hfs2dfxml/tree/betterparse

Needs testing.