Parsing SRT data with leading whitespace
senpos opened this issue · 6 comments
Hi,
Platform: Windows 7 64-bit
Python version: 3.7.3 64-bit
Library version: 1.11.0
I am trying to parse this file (./demo.srt) with the following code:
import srt
with open(r"demo.srt") as fd:
    subs = srt.parse(fd)
    for line in subs:
        print(line)
I receive the following error:
D:\test_srt>d:/test_srt/venv/Scripts/activate.bat
(venv) D:\test_srt>d:/test_srt/venv/Scripts/python.exe d:/test_srt/demo.py
Traceback (most recent call last):
  File "d:/test_srt/demo.py", line 5, in <module>
    for line in subs:
  File "d:\test_srt\venv\lib\site-packages\srt.py", line 341, in parse
    _raise_if_not_contiguous(srt, expected_start, actual_start)
  File "d:\test_srt\venv\lib\site-packages\srt.py", line 377, in _raise_if_not_contiguous
    raise SRTParseError(expected_start, actual_start, unmatched_content)
srt.SRTParseError: Expected contiguous start of match or end of input at char 0, but started at char 2 (unmatched content: '\n\n')
pysrt handles this file without any problems, and POEdit works with it as well.
So I guess the SRT file itself is valid.
I would be thankful if you could take a look at this. Thanks for your work.
It looks like everything works if I remove those newlines at the beginning of the file.
Shouldn't they be handled automatically?
There's no formal SRT spec, so there's no real definition of what's valid or not. I've never seen this particular case, so srt never learned to handle it :-)
However, it should be easy enough to add functionality to deal with it. Thanks for the report!
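Until the parser handles this itself, the workaround mentioned above (removing the leading newlines) can be applied in code instead of by editing the file. This is only a minimal sketch: strip_leading_whitespace is a hypothetical helper, not part of srt, and the subtitle text is made-up sample data standing in for demo.srt.

```python
import io


def strip_leading_whitespace(fd):
    """Hypothetical helper: return the contents of fd without leading
    blank lines or spaces. Trailing whitespace is left untouched."""
    return fd.read().lstrip()


# Made-up sample data that starts with two blank lines, like the report.
raw = "\n\n1\n00:00:01,000 --> 00:00:02,000\nHello\n\n"
cleaned = strip_leading_whitespace(io.StringIO(raw))

# The first subtitle index is now at position 0.
print(cleaned.startswith("1\n"))

# The cleaned string could then be handed to the parser, e.g.:
#     subs = srt.parse(cleaned)
```

With a real file, the same helper would be called on the object returned by open("demo.srt").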
Platform: Windows 7 32bit
Python Version: 3.8.2
Library Version: srt 3.4.1
It is still showing this error:
raise SRTParseError(expected_start, actual_start, unmatched_content)
srt.SRTParseError: Expected contiguous start of match or end of input at char 0, but started at char 3 (unmatched content: '')
Can you help me with it?
@JafarAbbas33 If you have a new issue, please open a new issue instead of commandeering an old one. However, ï»¿ (the three characters at the start of your input) is a UTF-8 BOM decoded as ISO-8859-1. You need to read the file with the right encoding.
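The point about the BOM can be checked directly in Python. The sketch below uses only the standard library and shows why the parser reports three unexpected characters: the three BOM bytes each become one character when decoded as ISO-8859-1.

```python
import codecs

# The UTF-8 byte order mark is the three bytes EF BB BF.
bom_bytes = codecs.BOM_UTF8

# Decoded with the wrong codec, each byte turns into its own character.
mis_decoded = bom_bytes.decode("iso-8859-1")
print(len(mis_decoded))

# Decoded as plain UTF-8, the BOM is a single U+FEFF character.
as_utf8 = bom_bytes.decode("utf-8")
print(len(as_utf8))

# The "utf-8-sig" codec drops the BOM entirely.
as_utf8_sig = bom_bytes.decode("utf-8-sig")
print(len(as_utf8_sig))
```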
Yes, you are right. Sorry. By the way, anyone with the same problem can use something like:
with open(fname, encoding='utf-8-sig') as f:
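For completeness, here is a slightly fuller sketch of that approach. The function name read_text_skipping_bom and the temporary file are assumptions made for illustration; only open() with encoding='utf-8-sig' comes from the comment above.

```python
import codecs
import os
import tempfile


def read_text_skipping_bom(fname):
    """Hypothetical helper: read a UTF-8 file, dropping a leading BOM
    if one is present. Files without a BOM are read unchanged."""
    with open(fname, encoding="utf-8-sig") as f:
        return f.read()


# Write made-up subtitle data prefixed with a UTF-8 BOM to a temp file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".srt") as tmp:
    tmp.write(codecs.BOM_UTF8 + b"1\n00:00:01,000 --> 00:00:02,000\nHello\n")
    path = tmp.name

try:
    text = read_text_skipping_bom(path)
finally:
    os.remove(path)

# The text now starts with the subtitle index, not with BOM characters.
print(text.startswith("1\n"))

# The result could then be passed to the parser, e.g. srt.parse(text).
```

Because utf-8-sig also accepts files that have no BOM, it is a reasonable default whenever the input is expected to be UTF-8.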