Incomplete files of certain sizes play while others do not

Question

Closed this issue 8 years ago · 2 comments

I am working on a project that uses the vpx library to play WebM content, and I was testing how the library deals with truncated files. The results were strange. I started with a 1.8M file. Truncating it to 38394 bytes 'works', in that the file plays normally until it runs out of data. Truncating it to 38395 bytes causes nestegg_init to fail, so I get no video at all, while truncating it to 39396 bytes also 'works'. Here is what I see with debugging messages turned on:

Opening file: /tmp/test1.webm
0x7ff9b4114520 debug: ctx 0x7ff9b4114520
0x7ff9b4114520 debug: single master element 1a45dfa3 (ID_EBML)
0x7ff9b4114520 debug: -> using data 0x7ff9b4114550 (48)
0x7ff9b4114520 debug: element 4286 (ID_EBML_VERSION) -> 0x7ff9b4114550 (0)
0x7ff9b4114520 debug: element 42f7 (ID_EBML_READ_VERSION) -> 0x7ff9b4114568 (24)
0x7ff9b4114520 debug: element 42f2 (ID_EBML_MAX_ID_LENGTH) -> 0x7ff9b4114580 (48)
0x7ff9b4114520 debug: element 42f3 (ID_EBML_MAX_SIZE_LENGTH) -> 0x7ff9b4114598 (72)
0x7ff9b4114520 debug: element 4282 (ID_DOCTYPE) -> 0x7ff9b41145b0 (96)
0x7ff9b4114520 debug: element 4287 (ID_DOCTYPE_VERSION) -> 0x7ff9b41145c8 (120)
0x7ff9b4114520 debug: element 4285 (ID_DOCTYPE_READ_VERSION) -> 0x7ff9b41145e0 (144)
0x7ff9b4114520 debug: parent element 18538067
nestegg_init status was: -1

I started looking at the code, but that is way over my head. Is there anything that can be done to more gracefully handle these files?

I'm on 64-bit Debian Linux and have tried tags 1.0.0 and 1.1.0. I'm hesitant to include a test file because it is encrypted, and I don't know if I can disclose the encryption key...

thanks

Answer 1 · 2013-01-22T00:56:24.000Z

Thanks for the report. Can you help me understand your use case for truncated files?

There's no specification that describes the expected behaviour of a WebM parser encountering invalid or truncated data, so each library is likely to behave differently. nestegg was designed to fail as quickly as possible, which in practice means it'll fail when the read reaches the invalid/truncated data. If the file is truncated at a valid parser sync point (e.g. at the end of a WebM element), you should be able to use nestegg without issues (caveats apply).
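
To make "the end of a WebM element" concrete, here is a rough sketch of how an element's extent can be computed from its header, assuming the standard EBML layout (a variable-length ID of up to 4 bytes followed by a variable-length size of up to 8 bytes). This is not nestegg's code; `vint_length` and `element_total_length` are names made up for illustration, and the sketch only looks at a single element without descending into master elements.

```c
#include <stddef.h>
#include <stdint.h>

/* Length in bytes (1..8) of an EBML variable-length integer, derived from
 * the number of leading zero bits in its first byte; 0 if the byte is 0. */
static int vint_length(uint8_t first)
{
  int len = 1;
  uint8_t mask = 0x80;

  if (first == 0)
    return 0;
  while (!(first & mask)) {
    mask >>= 1;
    len++;
  }
  return len;
}

/* Hypothetical helper: parse one element header (ID + size) at buf[0..avail).
 * On success, store the total element length (header + payload) in *total
 * and return 0. Return -1 if the header is malformed, incomplete, or uses
 * the "unknown size" encoding (all value bits set). */
static int element_total_length(const uint8_t *buf, size_t avail, uint64_t *total)
{
  int id_len, size_len, i;
  uint64_t size, unknown;

  if (avail < 1)
    return -1;
  id_len = vint_length(buf[0]);
  if (id_len == 0 || id_len > 4 || avail < (size_t)id_len + 1)
    return -1;

  size_len = vint_length(buf[id_len]);
  if (size_len == 0 || avail < (size_t)id_len + (size_t)size_len)
    return -1;

  /* Drop the length-marker bit from the first size byte, then append
   * the remaining bytes in big-endian order. */
  size = (uint64_t)(buf[id_len] & (0xFF >> size_len));
  for (i = 1; i < size_len; i++)
    size = (size << 8) | buf[id_len + i];

  unknown = (UINT64_C(1) << (7 * size_len)) - 1;
  if (size == unknown)
    return -1;

  *total = (uint64_t)id_len + (uint64_t)size_len + size;
  return 0;
}
```

For a header starting with the bytes 1a 45 df a3 9f (ID_EBML with a 1-byte size field of 31), this reports a total length of 36 bytes, so offset 36 would be one element boundary. A file cut in the middle of an element header or payload has no such boundary at its end, which would be consistent with truncation points one byte apart behaving differently.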

If it's a question of data not being available until later, the read callback supplied to nestegg is expected to either return the requested number of bytes, block until it's possible to service the request in its entirety, or return an error. There's no support provided for returning a "short" read that can later be resumed.
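
As a minimal sketch of a read callback that follows this contract, assuming a POSIX file descriptor as the data source: the function below keeps reading until the whole request is satisfied and never hands back a short read. The return convention (1 on success, 0 on end of stream, -1 on error) is my understanding of what nestegg.h documents for the read callback, so check it against your copy of the header. `struct fd_source` and `blocking_read` are hypothetical names, and treating EOF in the middle of a request as an error is a design choice of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical userdata for the callback: just a file descriptor. */
struct fd_source {
  int fd;
};

/* Same shape as a nestegg read callback. Blocks until `length` bytes have
 * been read. Returns 1 on success, 0 on a clean end of stream (no bytes of
 * this request were read), and -1 on error or on EOF mid-request. */
static int blocking_read(void *buffer, size_t length, void *userdata)
{
  struct fd_source *src = userdata;
  unsigned char *dst = buffer;
  size_t done = 0;

  while (done < length) {
    ssize_t n = read(src->fd, dst + done, length - done);
    if (n > 0) {
      done += (size_t)n;          /* partial read: keep going */
    } else if (n == 0) {
      return done == 0 ? 0 : -1;  /* EOF: clean only at a request boundary */
    } else if (errno != EINTR) {
      return -1;                  /* real I/O error */
    }
    /* EINTR: interrupted by a signal, retry the read */
  }
  return 1;
}
```

On a pipe or socket, read() blocks until more data arrives, so a callback like this waits for late data instead of failing. On a regular file that has been truncated, it reaches EOF and reports either end of stream or an error, depending on whether the cut falls between requests or inside one.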

Answer 2 · 2016-09-25T21:30:21.000Z

Closing as incomplete.