egnor/pivid

Missing (not dropped) frames

TriviumGames opened this issue · 11 comments

I think somehow pivid is failing to find real frames that are in some video files. Earlier I made a spinning star video from some .png files with ffmpeg, and pivid was missing a frame halfway through. I have another example, again with a loop (maybe because that's where I notice the problems).

If I load this video into DaVinci Resolve and play it looped (or stick a bunch of copies one after the other on the timeline) it smoothly shows a cartoon car jauntily idling. Each loop is 22 frames long, and it's 24fps.

If I use pivid_scan_media --media test_loop.mp4, it says it's 0.917s long. 22/24 rounds up to 0.917s, so we're looking good so far. But when I write a json that loops it that long I get black screens every loop.

Digging deeper, pivid_scan_media --list_frames says

$ build/pivid_scan_media --list_frames --media ~/trivium/test_loop.mp4 
"test_loop.mp4" mov,mp4,m4a,3gp,3g2,mj2:hevc:yuv420p 1920x1080 24.00fps 0.917s 1.720Mbps
18:29:29.651  39987u I [libav] hevc[0x33ed950] Hwaccel V4L2 HEVC stateless V2; devices: /dev/media0,/dev/video19
  0.000~0.042s I  1920x1080 NV12:BCOM:66004 f18=3.0M:1.9K/136K,1.9K/2.9M KEY
  0.042~0.083s B  1920x1080 NV12:BCOM:66004 f21=3.0M:1.9K/136K,1.9K/2.9M
  0.083~0.125s B  1920x1080 NV12:BCOM:66004 f20=3.0M:1.9K/136K,1.9K/2.9M
  0.125~0.167s B  1920x1080 NV12:BCOM:66004 f22=3.0M:1.9K/136K,1.9K/2.9M
  0.167~0.208s P  1920x1080 NV12:BCOM:66004 f19=3.0M:1.9K/136K,1.9K/2.9M
  0.208~0.250s B  1920x1080 NV12:BCOM:66004 f25=3.0M:1.9K/136K,1.9K/2.9M
  0.250~0.292s B  1920x1080 NV12:BCOM:66004 f24=3.0M:1.9K/136K,1.9K/2.9M
  0.292~0.333s B  1920x1080 NV12:BCOM:66004 f26=3.0M:1.9K/136K,1.9K/2.9M
  0.333~0.375s P  1920x1080 NV12:BCOM:66004 f23=3.0M:1.9K/136K,1.9K/2.9M
  0.375~0.417s B  1920x1080 NV12:BCOM:66004 f29=3.0M:1.9K/136K,1.9K/2.9M
  0.417~0.458s B  1920x1080 NV12:BCOM:66004 f28=3.0M:1.9K/136K,1.9K/2.9M
  0.458~0.500s B  1920x1080 NV12:BCOM:66004 f30=3.0M:1.9K/136K,1.9K/2.9M
  0.500~0.542s P  1920x1080 NV12:BCOM:66004 f27=3.0M:1.9K/136K,1.9K/2.9M
  0.542~0.583s B  1920x1080 NV12:BCOM:66004 f21=3.0M:1.9K/136K,1.9K/2.9M
  0.583~0.625s B  1920x1080 NV12:BCOM:66004 f32=3.0M:1.9K/136K,1.9K/2.9M
  0.625~0.667s B  1920x1080 NV12:BCOM:66004 f20=3.0M:1.9K/136K,1.9K/2.9M
  0.667~0.708s P  1920x1080 NV12:BCOM:66004 f31=3.0M:1.9K/136K,1.9K/2.9M
  0.708~0.750s B  1920x1080 NV12:BCOM:66004 f25=3.0M:1.9K/136K,1.9K/2.9M
  0.750~0.792s B  1920x1080 NV12:BCOM:66004 f18=3.0M:1.9K/136K,1.9K/2.9M
  0.792~0.833s B  1920x1080 NV12:BCOM:66004 f24=3.0M:1.9K/136K,1.9K/2.9M
  0.833~0.875s P  1920x1080 NV12:BCOM:66004 f22=3.0M:1.9K/136K,1.9K/2.9M
  EOF

which is only 21 frames and the video does not go until 0.917s as pivid_scan_media claims, which is why there's a blank frame every cycle. If I do tell my script to loop after 0.875s, there is no black frame, but the motion has a hitch.

Attached are the movie and the trivial json with the 0.917s duration. That missing frame exists in the file, and is somehow available to DaVinci Resolve. ffprobe -show_frames also finds 22 frames where media_type=video

test_loop.zip

egnor commented

To keep ffmpeg utilities like ffprobe:

  • comment out 'programs" in the ffmpeg_without list of conanfile.py
  • comment out run_conan("remove", ...) in dev_setup.py
  • re-run dev_setup.py
  • find programs under build/.conan/data/...

Doing this, ffprobe -list_frames is still able to decode that last frame. Debug messages indicate that ffprobe is running the same hevc codec. So, it does not seem to be a bug specific to the fork of ffmpeg being used in pivid, or in the way we build it.

Next question is, what is ffprobe doing differently than we are. One obvious difference is that it's decoding all the streams, rather than skipping anything but the main video stream like pivid does, but that's probably not it since the codecs have pretty independent state...

egnor commented

Commenting out this line in media_decoder.cpp seems to fix the problem!

codec_context->thread_count = 4;

egnor commented

This may be related to this note in the documentation (FF_THREAD_FRAME is the default):
image

In fact, using FF_THREAD_SLICE but keeping 4 threads also works. But after some experimentation it seems that decoding is actually faster on the Pi (with this HEVC data, anyway) without using threads, so I just took it all out. Since we're using hardware resources for decoding, maybe that makes sense...

Anyway I believe this problem is resolved now, reopen as needed!

This is still happening. I've attached a video (which is a still turned into a 1 second long video) and the trivial json to loop it.

ffprobe -show_frames still finds 30 video [FRAME] elements, but pivid_scan_media --list_frames shows 29 frames. And they again don't add up to the reported duration of the file.

test_loop.zip

Our animator doesn't have a very consistent process, so his output is a bunch of resolutions, frame rates, and codecs. So far we have 113 files (some may be obsolete or unused), and I ran them all through pivid_scan_media --list_frames, and piped that through a script that looked for holes, and outputted CSV.

The only frames I ever noticed missing were the last or second to last frame of the file. Files with the last frame missing are almost exclusively short movies that are just a still made into an approximately ~1s long 30fps movie. Files with the penultimate frame missing fit roughly the same distribution of frame rates and lengths as the movies that were fine. This seems invariant across codec, too.

I also notice that of the videos with any missing the second to last frame, a heck of a lot of them have an odd number of frames. Like 90% of them. Of videos with the last frame missing, only 4% of them had an odd number of frames. NB these frame counts are the nominal frame count from multiplying the duration by the FPS, not the erroneous count from --list_frames

I've looked over the ffprobe --show_frames info for a few of these movies. I don't see anything special about the frames that were dropped compared to other frames in the file or between files, but I don't really understand a lot of the fields.

The data and my analysis of it are in this spreadsheet. I can extract and add more columns if you'd like.

egnor commented

Whoa that's some detailed analysis!

Let me know how high priority this is and I'll dive in accordingly. (Currently packing for a trip, but this is something I can debug remotely, assuming like the initial problem it reproduces readily.)

It's currently just a nuisance, though as you might imagine we'd very much like a fix or a work-around before we launch.

egnor commented

Oh, quite understood! Just making sure it's OK if I don't make much progress before getting back from my trip (return Oct 23).

egnor commented

Not too much progress, but:

  • Reproduced the bug given the new test_loop.zip
  • pivid_inspect_avformat can see all the packets, but pivid_scan_media doesn't see the last frame
  • Confirmed in debugging output that the last packet is sent to the codec, but the last frame is not received

Codec debugging from pivid_scan_media

21:42:20.066     12u D [libav] hevc[0x564fc92e3b00] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
21:42:20.066     20u D [libav] hevc[0x564fc92e3b00] Output frame with POC 26.
21:42:20.068   2749u D [libav] hevc[0x564fc92e3b00] Decoded frame with POC 28.
21:42:20.068      4u D [libav] hevc[0x564fc92e3b00] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
21:42:20.068     21u D [libav] hevc[0x564fc92e3b00] Output frame with POC 27.
21:42:20.072   3324u D [libav] hevc[0x564fc92e3b00] Decoded frame with POC 27.
21:42:20.072     11u D [libav] hevc[0x564fc92e3b00] Output frame with POC 28.
21:42:20.074   1919u D [libav] AVIOContext[0x564fc92cdd00] Statistics: 193324 bytes read, 0 seeks

Equivalent debugging output from ffprobe:

[hevc @ 0x55c38c962840] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id:
 0
[hevc @ 0x55c38c962840] Output frame with POC 26.
[hevc @ 0x55c38c962840] Decoded frame with POC 28.
[hevc @ 0x55c38c962840] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id:
 0
[hevc @ 0x55c38c962840] Output frame with POC 27.
[hevc @ 0x55c38c962840] Decoded frame with POC 27.
[hevc @ 0x55c38c962840] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x55c38c962840] Output frame with POC 28.
[hevc @ 0x55c38c962840] Decoded frame with POC 29.
[hevc @ 0x55c38c962840] Output frame with POC 29.
[AVIOContext @ 0x55c38c949dc0] Statistics: 193324 bytes read, 0 seeks

So for some reason those final "Decoded frame with POC 29" / "Output frame with POC 29" aren't happening.

egnor commented

With the change above, at least on that one test file it now works:

% build/pivid_scan_media --media trivium.tmp/test_loop.mp4 --list_frames                                   ~/source/pivid ostrich
"test_loop.mp4" mov,mp4,m4a,3gp,3g2,mj2:hevc:yuv420p 1960x1080 30.00fps 1.000s 0.922Mbps
21:12:44.856 124652u W [libav] hevc[0x55e36c42bfc0] Failed to find a V4L2 device for H265
21:12:44.856     40u E [libav] hevc[0x55e36c42bfc0] Failed setup for format drm_prime: hwaccel initialisation returned error.
  0.000~0.033s I  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K KEY
  0.033~0.067s B  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.067~0.100s P  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.100~0.133s B  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.133~0.167s P  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.167~0.200s B  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.200~0.233s P  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.233~0.267s B  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.267~0.300s P  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.300~0.333s B  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.333~0.367s P  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.367~0.400s B  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.400~0.433s P  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.433~0.467s B  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.467~0.500s P  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.500~0.533s B  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.533~0.567s P  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.567~0.600s B  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.600~0.633s P  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.633~0.667s B  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.667~0.700s P  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.700~0.733s B  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.733~0.767s P  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.767~0.800s B  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.800~0.833s P  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.833~0.867s B  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.867~0.900s P  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.900~0.933s B  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.933~0.967s P  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  0.967~1.000s P  1960x1080 I420 2.1M/2.0K|540K/1.0K|540K/1.0K
  EOF

Reopen as needed with new failures!

(By the way, you'll probably need to re-run ./dev_setup.py after pulling.)

I haven't watched it all but rerunning my script that checks the output of pivid_scan_media --list_frames on all my video now shows every file has all the expected frames. Hooray! Thank you!
🎉