fizwit/filesystem-reporting-tools

Unicode in file names

fizwit opened this issue · 1 comment

pwalk previously discarded files that have Unicode (UTF-8) characters in their names. I have changed this filter so that all Unicode characters in file names pass through to the output. WARNING: if your downstream use of pwalk includes a database, this change could break bulk loading of data.

The old behavior of removing Unicode was implemented to allow bulk loading into a database that only supported ASCII characters. Our database backends have improved and now fully support UTF-8.

Unicode characters in file names are now supported. The only characters still treated as invalid in file names are control characters and the NUL character.