VideoClips Assertion Error
Hello,
I'm trying to load a big video. Following #1446 I used a VideoClips object, but it's crashing when trying to get clips with certain ids with this error:
AssertionError Traceback (most recent call last)
in ()
----> 1 x = video_clips.get_clip(1)

/usr/local/lib/python3.6/dist-packages/torchvision/datasets/video_utils.py in get_clip(self, idx)
    324 video = video[resampling_idx]
    325 info["video_fps"] = self.frame_rate
--> 326 assert len(video) == self.num_frames, "{} x {}".format(video.shape, self.num_frames)
    327 return video, audio, info, video_idx

AssertionError: torch.Size([0, 1, 1, 3]) x 32
The code I use is just this:
from torchvision.datasets.video_utils import VideoClips
video_clips = VideoClips(["test_video.mp4"], clip_length_in_frames=32, frames_between_clips=32)
for i in range(video_clips.num_clips()):
    x = video_clips.get_clip(i)
video_clips.num_clips() is much bigger than the ids that are failing. Changing clip_length_in_frames or frames_between_clips doesn't help.
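To see every failing id at once instead of stopping at the first crash, the loop above could catch the AssertionError and record its message. This is a minimal sketch; `find_bad_clips` is a hypothetical helper, and it only assumes that `get_clip(idx)` raises AssertionError on a size mismatch, as in the traceback above.

```python
from typing import Callable, List, Tuple


def find_bad_clips(get_clip: Callable[[int], object], num_clips: int) -> List[Tuple[int, str]]:
    """Call get_clip(i) for every clip id and record the ones that fail.

    Instead of stopping at the first AssertionError, collect every failing
    id together with its message (e.g. "torch.Size([0, 1, 1, 3]) x 32").
    """
    bad = []
    for i in range(num_clips):
        try:
            get_clip(i)
        except AssertionError as err:
            bad.append((i, str(err)))
    return bad


if __name__ == "__main__":
    # Stand-in for VideoClips.get_clip: ids 3 and 7 fail like in the report.
    def fake_get_clip(idx):
        if idx in (3, 7):
            raise AssertionError("torch.Size([0, 1, 1, 3]) x 32")
        return idx

    print(find_bad_clips(fake_get_clip, 10))
```

With a real object this would be called as `find_bad_clips(video_clips.get_clip, video_clips.num_clips())`, which makes it easier to check whether the failing ids cluster in one part of the video.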
Checking the code, I see that [0,1,1,3] is returned by read_video when no vframes are read:
vision/torchvision/io/video.py
Lines 251 to 254 in 85b8fbf
But for other clip ids and clip lengths the clip is not empty; the sizes simply don't match, and the assertion error looks like this:
AssertionError: torch.Size([19, 360, 640, 3]) x 128
I followed the issue to _read_from_stream and checked that no AV exceptions were raised. Then I ran this part of the function:
vision/torchvision/io/video.py
Lines 144 to 150 in 85b8fbf
I saw that for start_pts=32032, end_pts=63063 it returned just one frame in frames, with pts=237237, which is later discarded because it is far beyond end_pts.
Also, stream.time_base is Fraction(1, 24000), which doesn't match the start and end pts provided by VideoClips.
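One way to check which unit the offsets are in is to convert them with the stream's time base. The sketch below uses only the numbers reported here (time_base = 1/24000) plus one assumption: a constant frame rate of 24000/1001 fps (the "23.98 fps" ffmpeg reports below), i.e. 1001 ticks per frame. The helper names are hypothetical.

```python
from fractions import Fraction

# Values from this report; FPS assumes a constant 24000/1001 (~23.98) fps stream.
TIME_BASE = Fraction(1, 24000)
FPS = Fraction(24000, 1001)


def pts_to_seconds(pts: int, time_base: Fraction = TIME_BASE) -> Fraction:
    """Convert a presentation timestamp (in time_base units) to seconds."""
    return pts * time_base


def seconds_to_pts(seconds: Fraction, time_base: Fraction = TIME_BASE) -> int:
    """Convert seconds back to time_base units, rounding down."""
    return int(seconds / time_base)


def pts_to_frame_index(pts: int, time_base: Fraction = TIME_BASE, fps: Fraction = FPS) -> Fraction:
    """Frame index implied by a pts under the constant-frame-rate assumption."""
    return pts_to_seconds(pts, time_base) * fps


for pts in (32032, 63063, 237237):
    # The last column is 32, 63 and 237 respectively.
    print(pts, float(pts_to_seconds(pts)), pts_to_frame_index(pts))
```

Under that assumption, start_pts and end_pts land exactly on frames 32 and 63 (a 32-frame window), while the single frame returned (pts=237237) is frame 237, far outside the requested range.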
So it seems there is a problem with seeking in my video, even though it uses a standard H.264 encoding and I have no problem reading it sequentially with PyAV.
I'm wondering whether I'm doing something wrong or whether there is an issue with read_video seeking (since the warning says it should be using seconds?).
This is the video info according to ffmpeg:
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: mp42isom
creation_time : 2016-10-10T15:36:46.000000Z
Duration: 00:21:24.37, start: 0.000000, bitrate: 1002 kb/s
Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 640x360 [SAR 1:1 DAR 16:9], 900 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default)
Metadata:
handler_name : Telestream Inc. Telestream Media Framework - Release TXGP 2016.42.192059
encoder : AVC
Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 93 kb/s (default)
Metadata:
handler_name : Telestream Inc. Telestream Media Framework - Release TXGP 2016.42.192059
Thanks!
Hello and thank you for the thorough analysis.
This looks like a corrupted file, but as you say, the FFmpeg info looks OK.
Have you tried using a different backend ('video_reader' vs 'pyav')? That saved my ass in one case at least.
Best,
Bruno
I'm having a similar problem:
Traceback (most recent call last):
File "/Users/fernando/git/sudep/scripts/infer_video_kinetics.py", line 99, in <module>
sample = dataset[i]
File "/Users/fernando/git/sudep/scripts/infer_video_kinetics.py", line 74, in __getitem__
video, audio, info, video_idx = self.video_clips.get_clip(idx)
File "/usr/local/Caskroom/miniconda/base/envs/sudep/lib/python3.6/site-packages/torchvision/datasets/video_utils.py", line 367, in get_clip
video.shape, self.num_frames
AssertionError: torch.Size([6, 128, 228, 3]) x 8
I'm iterating over a Dataset built using VideoClips. The error happens while retrieving sample number 156 out of 174, so it's not at the end of the video. For now, I just commented out the assertion, but this way I can't use a DataLoader because the samples will have different sizes.
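If commenting out the assertion is the only option for now, one stopgap is to force every clip to the expected length before batching, so the default collation sees equal sizes. This is a hypothetical workaround sketched on plain Python lists, not a fix for the decoding problem: it drops extra frames and pads short clips by repeating the last frame, and refuses to pad an empty clip. The same idea applies to a (T, H, W, C) video tensor.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def fix_clip_length(frames: Sequence[T], num_frames: int) -> List[T]:
    """Hypothetical workaround: force a clip to exactly num_frames frames.

    Extra frames are dropped; short clips are padded by repeating the last
    frame. Empty clips cannot be padded and raise ValueError instead.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if not frames:
        raise ValueError("clip is empty; skip it instead of padding")
    clip = list(frames[:num_frames])
    # Repeat the last frame until the clip reaches num_frames.
    clip.extend([clip[-1]] * (num_frames - len(clip)))
    return clip


# A 6-frame clip where 8 frames were expected (as in "torch.Size([6, ...]) x 8").
print(fix_clip_length(["f0", "f1", "f2", "f3", "f4", "f5"], 8))
```

Padding silently changes the temporal content of a sample, so it is probably worth logging which ids needed it.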
I haven't been able to try with video_reader:
/usr/local/Caskroom/miniconda/base/envs/sudep/lib/python3.6/site-packages/torchvision/__init__.py:64: UserWarning: video_reader video backend is not available
warnings.warn("video_reader video backend is not available")
@fmassa this seems similar to a problem I had on my dev machine, which I attributed to the overall messiness of my conda installation: I've had several issues where a standard install would not build the video_reader backend, and I'd have to
a) manually install the dependencies, and
b) build torchvision (TV) from source
(often a few iterations of a) and b) were needed before everything worked properly).
@fepegar can you confirm that this is what's happening?
If so, I'll try to get a clean repro and see if I can tackle the build system for this.
Thanks and best wishes,
Bruno
@fepegar can you confirm that this is what's happening?
I'm not sure exactly what you'd like me to confirm.
I'm on macOS, ran this:
$ conda create -n tv python -y && conda activate tv && pip install torch torchvision
$ python -c "import torchvision; torchvision.set_video_backend('video_reader')"
And got the above message. I'll investigate further. But I feel like this discussion should maybe move to a new issue.
My value of ext_specs is None here, in case it helps:
vision/torchvision/io/_video_opt.py
Line 24 in 7a36388
I just tried building from source, but I'm still not able to set the video_reader backend.
@fmassa do you have an idea about the issue?
Hello,
We dug a bit more into this and found that setting should_buffer to True fixes the issue:
vision/torchvision/io/video.py
Line 110 in 85b8fbf
The problem is in this section that reads the frames:
vision/torchvision/io/video.py
Lines 144 to 150 in 85b8fbf
Frames might not be decoded in PTS order, and this causes the break to happen before all the relevant frames have been read.
For example, in our case end_offset is 15, but first a frame with PTS 15 is received and then one with PTS 14. So we hit the break without reading frame 14, and we crash later on the size assertion.
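The effect can be reproduced without any video by feeding a list of pts values in decode order to a loop of the kind described above. This is a pure-Python simulation, loosely modeled on the break-on-`pts >= end_offset` logic and the should_buffer flag discussed in this thread; it is not torchvision's code, and the max_buffer_size of 5 is an assumption for illustration.

```python
from typing import Iterable, List


def collect_frames(decoded_pts: Iterable[int], start_offset: int, end_offset: int,
                   should_buffer: bool = False, max_buffer_size: int = 5) -> List[int]:
    """Collect pts values in [start_offset, end_offset] from a decode-order stream.

    Without buffering, the loop stops at the first pts >= end_offset, so a
    frame that arrives after it (e.g. pts 14 after 15) is never read.
    With buffering, up to max_buffer_size extra frames are decoded first.
    """
    seen = set()
    buffer_count = 0
    for pts in decoded_pts:
        seen.add(pts)
        if pts >= end_offset:
            if should_buffer and buffer_count < max_buffer_size:
                buffer_count += 1
                continue
            break
    # Sort by pts and keep only the requested window.
    return sorted(p for p in seen if start_offset <= p <= end_offset)


# Decode order where pts 15 arrives before pts 14, as described above.
order = [10, 11, 12, 13, 15, 14, 16, 17]
print(collect_frames(order, 10, 15))                      # [10, 11, 12, 13, 15]
print(collect_frames(order, 10, 15, should_buffer=True))  # [10, 11, 12, 13, 14, 15]
```

The unbuffered run returns five frames instead of six, which is exactly the kind of short clip that later trips the size assertion.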
It seems this can happen with AVI videos; I found a relevant PyAV discussion in PyAV-Org/PyAV#534. We confirmed we are in a similar situation: our AVI video has frames without PTS, since PTS is not strictly required.
Setting should_buffer to True seems like a good solution; is there any reason why it is set to False or not exposed as a parameter? Another solution could be a hard comparison, frame.pts == end_offset. I'm not fully sure this holds in every case, but if end_offset is chosen as in VideoClips (selecting keyframes) it should work too.
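The hard-comparison idea can be sketched the same way, again as a simulation on pts values rather than torchvision code. It assumes end_offset is the pts of a frame that actually exists in the stream; if it never appears, the loop decodes until the stream ends. The example shows a case where a frame beyond end_offset is decoded early, which makes a `>=` check stop after only [10, 11].

```python
from typing import Iterable, List


def collect_frames_exact(decoded_pts: Iterable[int], start_offset: int,
                         end_offset: int) -> List[int]:
    """Variant that stops only when a frame with pts == end_offset is seen.

    Assumes end_offset is the pts of a real frame; otherwise it reads
    every remaining frame before returning.
    """
    seen = set()
    for pts in decoded_pts:
        seen.add(pts)
        if pts == end_offset:  # exact match instead of pts >= end_offset
            break
    return sorted(p for p in seen if start_offset <= p <= end_offset)


# A frame beyond end_offset (16) arrives before 12..15.
order = [10, 11, 16, 12, 13, 14, 15, 17]
print(collect_frames_exact(order, 10, 15))  # [10, 11, 12, 13, 14, 15]
```

The trade-off is the worst case: when the exact pts is missing, this reads to the end of the file instead of stopping after a bounded number of extra frames.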
Hi @mjunyent
Thanks for the investigation!
We could make should_buffer True by default. This would have a small impact on runtime speed, but it might be better to do this in order to avoid those corner-case issues.
The issue I found with empty pts was due to packed B-frames in DivX, but that was the only case I found for this type of video. I agree that the handling for this is very fragile, though.
If you could run some performance benchmarks comparing the runtime with and without should_buffer set to True, and the results are not much slower, could you send a PR setting should_buffer to True?
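A benchmark like the one requested could be organized with a small timing harness. This is a sketch using only the standard library; the variants passed in are placeholders (for example, hypothetical wrappers that read the same clip with should_buffer=False and should_buffer=True), not an existing torchvision API.

```python
import statistics
import timeit
from typing import Callable, Dict


def compare_runtime(variants: Dict[str, Callable[[], object]], repeat: int = 5,
                    number: int = 10) -> Dict[str, float]:
    """Return the median per-call time in seconds for each variant."""
    results = {}
    for name, fn in variants.items():
        # timeit.repeat accepts a zero-argument callable as stmt.
        times = timeit.repeat(fn, repeat=repeat, number=number)
        results[name] = statistics.median(times) / number
    return results


if __name__ == "__main__":
    # Placeholders: replace with callables that decode the same clip
    # with and without buffering (hypothetical wrappers).
    timings = compare_runtime({
        "unbuffered": lambda: sum(range(10_000)),
        "buffered": lambda: sum(range(10_500)),
    })
    base = timings["unbuffered"]
    for name, t in timings.items():
        print(f"{name}: {t * 1e3:.3f} ms/call ({t / base:.2f}x)")
```

Using the median of several repeats reduces the influence of one-off disk or cache effects, which matter when the work being timed is video decoding.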
Thanks!
Shall I create an issue about video_reader not being available, or do you think it's been fixed in #2183?
I'm still having this issue (on Linux). I'm using version 0.6.0 and I set should_buffer = True.
My video:
$ ffprobe 006_01_L.mp4
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '006_01_L.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
encoder : Lavf57.83.100
Duration: 00:01:52.13, start: 0.000000, bitrate: 691 kb/s
Stream #0:0(und): Video: hevc (Rext) (hev1 / 0x31766568), yuv444p(tv, progressive), 640x360, 558 kb/s, 15 fps, 15 tbr, 15360 tbn, 15 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : SoundHandler
Should I open a new issue for this?
I still run into this issue
Still seeing this error.
Still seeing this error.
@jramapuram Could you please confirm if #5489 fixes your error?
I am running into a similar issue, where the VideoClips instance returns exactly one more frame than expected (tested with several values).
I am using PyAV as the backend with torch=1.12.1 and torchvision=0.12.0. The dataset is Kinetics, downloaded from the S3 bucket referenced in the Kinetics dataset class.
I have no idea how to solve this, or whether it's even a problem. I could just drop the last frame, but that doesn't seem like what I should do.