cdown/srt

Automatic detection of encoding

alex-left opened this issue · 1 comment

Although the current code accepts the subtitle's encoding as an argument, I think it would be useful if the program could detect it automatically. I think this could be done easily using the chardet library. Doing so would also mean depending on an external library, so we would need to include a requirements file (or declare the dependency in setup.py).

I could try to find some free time to do it, but before opening a PR I would like to discuss some specific implementation details. For example, I was thinking of a function that reads the raw input with chardet to detect the encoding and returns it, falling back to UTF-8, implemented around line #155 in utils.py.
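A minimal sketch of the helper proposed above, assuming chardet's `chardet.detect()`, which returns a dict whose `"encoding"` key may be `None` when it cannot make a guess. The name `detect_encoding` and its `fallback` parameter are hypothetical, not existing srt or srt_tools code. The import is guarded so the module still loads when chardet is absent, in which case the fallback is returned.

```python
import codecs

try:
    import chardet  # optional third-party dependency, not required by srt
except ImportError:
    chardet = None


def detect_encoding(raw: bytes, fallback: str = "utf-8") -> str:
    """Guess the encoding of ``raw``; return ``fallback`` if no guess is usable.

    Hypothetical helper for illustration only.
    """
    if chardet is None:
        # chardet is not installed: behave as if detection were unavailable.
        return fallback
    guess = chardet.detect(raw).get("encoding")
    if not guess:
        # chardet reports None for empty or undecidable input.
        return fallback
    try:
        codecs.lookup(guess)  # make sure Python knows this codec name
    except LookupError:
        return fallback
    return guess


if __name__ == "__main__":
    data = "1\n00:00:01,000 --> 00:00:02,000\nhello\n".encode("utf-8")
    print(data.decode(detect_encoding(data)))
```

Guesses on very short input can be wrong, so a caller would probably still want to handle `UnicodeDecodeError` when decoding with the returned name.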

What do you think?

cdown commented

Yeah, I've thought about this a few times over the past few years, but this would mean srt (well, srt_tools) goes from having no dependencies at all to having one, and that irks me a little for a feature most people will never use.

If it can be implemented in a way which is -- tastefully -- an optional dependency, doesn't require reading/reopening the file twice (so probably just reinterprets a bytestream on demand), and is documented well, I'm not against it. It must not require any changes for current srt/srt_tools users, and they must not receive chardet without taking some explicit action.
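One way those constraints might be met, sketched under assumptions: the caller reads the bytes exactly once and this function only reinterprets them, chardet is imported lazily and only when the user explicitly passes a hypothetical `encoding="auto"` value, and the no-argument path assumes a UTF-8 default. `decode_subtitle_bytes` and `AUTO` are illustrative names, not existing srt_tools code.

```python
from typing import Optional

AUTO = "auto"  # hypothetical opt-in value; not an existing srt_tools option


def decode_subtitle_bytes(raw: bytes, encoding: Optional[str] = None) -> str:
    """Decode subtitle bytes that have already been read once.

    - encoding=None: assumed default, decode as UTF-8 (no chardet involved).
    - any other codec name: used exactly as given.
    - encoding=AUTO: explicit opt-in; only then is chardet imported.
    """
    if encoding is None:
        return raw.decode("utf-8")
    if encoding != AUTO:
        return raw.decode(encoding)
    try:
        import chardet  # lazy import: never needed unless the user opts in
    except ImportError as exc:
        raise RuntimeError(
            "encoding='auto' requires the optional 'chardet' package"
        ) from exc
    # Fall back to UTF-8 when chardet cannot make a guess.
    guess = chardet.detect(raw).get("encoding") or "utf-8"
    return raw.decode(guess)
```

Users who never pass `"auto"` never import chardet, so it can stay out of the required dependencies and be offered as an explicit, separately installed extra.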

The code you've highlighted is roughly the right place, but is probably too early, since the file isn't open yet. This would probably require quite a decent rework of how the encoding logic works.
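A sketch of what the reworked flow might look like, assuming the input is opened in binary mode (for example `open(path, "rb")` or `sys.stdin.buffer`) so the encoding is decided only after the file is open. As a dependency-free stand-in for chardet, this version merely sniffs a byte-order mark; `read_subtitle_text` is a hypothetical name, not srt_tools code.

```python
import codecs
import io
from typing import BinaryIO, Optional

# Checked in order; UTF-32-LE's BOM (FF FE 00 00) starts with UTF-16-LE's
# (FF FE), so the 4-byte BOMs must be tested first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def read_subtitle_text(stream: BinaryIO, encoding: Optional[str] = None) -> str:
    """Read ``stream`` once and decode it, sniffing a BOM if no encoding is given."""
    raw = stream.read()  # the only read; everything below works on these bytes
    if encoding is not None:
        return raw.decode(encoding)
    for bom, codec in _BOMS:
        if raw.startswith(bom):
            # These codecs consume the BOM while decoding.
            return raw.decode(codec)
    return raw.decode("utf-8")


if __name__ == "__main__":
    sample = io.BytesIO(codecs.BOM_UTF8 + b"1\n00:00:01,000 --> 00:00:02,000\nhi\n")
    print(read_subtitle_text(sample))
```

Because `raw` is already in memory, an opt-in chardet step could slot in after the BOM check without reopening or rereading the file.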