t3nsor/quora-backup

Error saving files with a long filename when there are multi-byte characters

riceissa opened this issue · 3 comments

When calculating the maximum filename length, Python's len counts characters, but the OS limit is measured in bytes. A filename that contains non-ASCII characters can therefore pass the length check and still be too long in bytes, in which case saving it fails.

For reference, the file I tried to save was Which-is-more-likely-China-emerges-as-a-xenophobic-chauvinistic-force-bitter-and-hostile-to-the-West-because-it-tried-to-slow-down-or-abort-its-development-”-or-“educated-and-involved-in-the-ways-of-the-world-more-cosmopolitan-more-internationalized-.html (note the curly quotes), which is 255 characters long, but is over 255 bytes.
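The mismatch is easy to reproduce with a made-up name. This is a minimal sketch, assuming Python 3 strings and UTF-8 encoding; the filename below is invented for illustration and is not the one from the backup.

```python
# Hypothetical filename: 248 ASCII letters, two curly quotes, and ".html".
stem = "a" * 248 + "\u201c\u201d"
filename = stem + ".html"

# len() on a str counts code points, not bytes.
char_count = len(filename)

# Each curly quote takes 3 bytes in UTF-8, so the encoded name is longer.
byte_count = len(filename.encode("utf-8"))

print(char_count, byte_count)
```

Here the character count is exactly 255 while the encoded length is larger, which is the situation described above.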

I have a preliminary fix on my fork, but this only works with UTF-8.
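For anyone reading along, here is a rough sketch of one way a UTF-8-only fix could work. It is not necessarily what the fork does. The function name truncate_filename, its parameters, and the 255-byte default are assumptions made for illustration.

```python
def truncate_filename(stem, suffix=".html", max_bytes=255):
    """Shorten stem so that stem + suffix fits in max_bytes of UTF-8.

    Hypothetical helper; max_bytes=255 is an assumed limit.
    """
    budget = max_bytes - len(suffix.encode("utf-8"))
    if budget < 0:
        raise ValueError("suffix alone exceeds the byte limit")
    # Cut at the byte level, then decode. errors="ignore" drops a
    # multi-byte character that the cut left incomplete at the end.
    truncated = stem.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    return truncated + suffix


# The cut lands inside the first curly quote, so that quote is dropped.
result = truncate_filename("a" * 248 + "\u201c\u201d")
print(len(result.encode("utf-8")))
```

Cutting the encoded bytes and decoding with errors="ignore" avoids splitting a character in half, but it relies on UTF-8 specifically, which matches the limitation mentioned above.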

Can you send a pull request? I think it is fine to support just UTF-8. If someone wants to add support for other encodings, they can contribute a more complicated patch. (Should be easy, right? Just add a command line flag...)

Ping.

thanks