Speed up PyAVReaderIndexed toc building
horsto opened this issue · 0 comments
I am using pims.PyAVReaderIndexed to create a toc of my video file so that I can eventually index frames precisely.
The problem is that decoding takes far too long. See also a previous mention of this in #425.
I can see two possible ways to gain some speed:
- threading (pyAV references this here: https://pyav.org/docs/stable/cookbook/basics.html#threading)
- build a toc and save it for later, as already mentioned here: #438, so at least it does not have to be calculated twice on the same file
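For the second option, a minimal sketch of caching the toc on disk could look like the following. It assumes the toc is the plain dict with 'lengths' and 'ts' keys that the reader builds; the helper names save_toc and load_toc and the file names in the usage comments are hypothetical and not part of PIMS.

```python
import json
from pathlib import Path


def save_toc(toc, path):
    """Write a toc dict ({'lengths': [...], 'ts': [...]}) to a JSON file."""
    # Cast to plain ints so NumPy integers (if any) serialize cleanly.
    # A pts of None (frame without a timestamp) is kept as None.
    data = {
        "lengths": [int(n) for n in toc["lengths"]],
        "ts": [None if t is None else int(t) for t in toc["ts"]],
    }
    Path(path).write_text(json.dumps(data))


def load_toc(path):
    """Return the saved toc, or None if no cache file exists yet."""
    path = Path(path)
    if not path.exists():
        return None
    return json.loads(path.read_text())


# Intended usage (hypothetical file names, requires pims and a video file):
#   toc = load_toc("video.toc.json")
#   reader = PyAVReaderIndexed("video.mp4", toc=toc)
#   if toc is None:
#       save_toc(reader.toc, "video.toc.json")
```

This does not detect a video file that changed after the toc was saved, so a real cache would probably also want to store something like the file size or modification time.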
For the threading part, I noticed that the last frames are dropped. I have documented the issue in PyAV-Org/PyAV#1098. I am not sure whether this is a PIMS or a PyAV issue, which is why I am also reporting it here.
The only change I introduced was the stream.thread_type = "AUTO" flag.
import av
import numpy as np
from pims import PyAVReaderIndexed


class PyAVReaderIndexed_Multi(PyAVReaderIndexed):
    """
    Slightly changed version of "PyAVReaderIndexed" in pims
    Adds multi thread support to toc building as described here
    https://pyav.org/docs/stable/cookbook/basics.html#threading
    """
    def __init__(self, file, toc=None, format=None):
        if not hasattr(file, 'read'):
            file = str(file)
        self.file = file
        self.format = format
        self._container = None

        with av.open(self.file, format=self.format) as container:
            stream = [s for s in container.streams if s.type == 'video'][0]
            stream.thread_type = "AUTO"  # This is what enables multithreading

            # Build a toc
            if toc is None:
                packet_lengths = []
                packet_ts = []
                for packet in container.demux(stream):
                    if packet.stream.type == 'video':
                        decoded = packet.decode()
                        if len(decoded) > 0:
                            packet_lengths.append(len(decoded))
                            packet_ts.append(decoded[0].pts)
                self._toc = {
                    'lengths': packet_lengths,
                    'ts': packet_ts,
                }
            else:
                self._toc = toc

            self._toc_cumsum = np.cumsum(self.toc['lengths'])
            self._len = self._toc_cumsum[-1]

            # PyAV always returns frames in color, and we make that
            # assumption in get_frame() later below, so 3 is hardcoded here:
            self._im_sz = stream.height, stream.width, 3
            self._time_base = stream.time_base

        self._load_fresh_file()
Are there any ideas why the last frames are dropped in the multithreading case, and does anybody have suggestions on how to speed up toc building / decoding?
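To narrow down which frames go missing, one option is to build the toc once without and once with threading and compare the two. The sketch below only assumes the toc dict layout shown above; compare_tocs is a hypothetical helper, and the example values are made up for illustration, not measured output.

```python
def compare_tocs(reference, candidate):
    """Summarize how `candidate` differs from `reference`.

    Both arguments are toc dicts of the form {'lengths': [...], 'ts': [...]}.
    """
    # Ignore entries without a timestamp so the results can be sorted.
    ref_ts = {t for t in reference["ts"] if t is not None}
    cand_ts = {t for t in candidate["ts"] if t is not None}
    return {
        "frames_reference": sum(reference["lengths"]),
        "frames_candidate": sum(candidate["lengths"]),
        "missing_ts": sorted(ref_ts - cand_ts),
        "extra_ts": sorted(cand_ts - ref_ts),
    }


# Made-up tocs: the threaded one lacks the last frame.
single = {"lengths": [1, 1, 1, 1], "ts": [0, 512, 1024, 1536]}
multi = {"lengths": [1, 1, 1], "ts": [0, 512, 1024]}
report = compare_tocs(single, multi)
print(report)
```

Note that 'ts' only holds the pts of the first frame returned by each non-empty decode call, so frames that are merely grouped differently between the two runs would also show up as differences here.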