buzz/mediainfo.js

Strange Behaviour with Large Files

Closed this issue · 6 comments

Issue Summary


I was testing with large videos (6.8 GB / 11 GB) over HTTP (using the code example from #79 (comment)) and noticed some odd behaviour regarding how much data mediainfo needs to download in order to provide a result.

Steps to Reproduce

When setting chunkSize: 256 * 1024, mediainfo downloads a staggering 1.2 GB of the 6.8 GB file (precisely 1264598157 of 6874082664 bytes).

When setting chunkSize: 70 * 256 * 1024, mediainfo gives the same result, but downloads only 18 MB of the same 6.8 GB file (precisely 18353801 of 6874082664 bytes).
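One way such totals could be measured is to wrap the `readChunk(size, offset)` callback that is passed to `analyzeData` and count what it returns. This is a minimal sketch, not the code from #79: `withByteCounter` and the in-memory reader are hypothetical helpers used only for illustration, and a real test would read over HTTP range requests instead.

```javascript
// Wraps a readChunk(size, offset) callback and records how much data it returned.
// Handles both synchronous and Promise-returning callbacks.
function withByteCounter(readChunk) {
  const stats = { calls: 0, bytes: 0 }
  const record = (chunk) => {
    stats.calls += 1
    stats.bytes += chunk.length
    return chunk
  }
  const counted = (size, offset) => {
    const result = readChunk(size, offset)
    return result instanceof Promise ? result.then(record) : record(result)
  }
  return { counted, stats }
}

// Stand-in data source (assumption): a 1 KiB in-memory buffer instead of a remote file.
const data = new Uint8Array(1024)
const readFromMemory = (size, offset) => data.subarray(offset, offset + size)

const { counted, stats } = withByteCounter(readFromMemory)
counted(256, 0)   // full 256-byte chunk
counted(256, 900) // short read at the end of the buffer
console.log(stats)
```

With mediainfo.js, `counted` would be passed in place of the original callback, e.g. `mediainfo.analyzeData(() => fileSize, counted)`, and `stats.bytes` inspected once the analysis finishes.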

Unfortunately, I cannot provide such a large test video, but this behaviour seems consistent with any very large video file.

If this is the expected behaviour, then how are we expected to math the chunk size setting? What I currently do to work around this issue (tested with a few very large files):

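import axios from "axios"
import MediaInfoFactory from "mediainfo.js"

// `url` is the HTTP(S) location of the video file
const response = await axios.get(url, { responseType: "stream" })
const contentLength = Number(response.headers["content-length"])

// Scale the chunk size with the file size, but never go below 256 KiB
const chunkSize = Math.max(Math.floor(contentLength * 0.00267), 256 * 1024)

const mediainfo = await MediaInfoFactory({ chunkSize, format: "object" })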

But 0.00267 is a magic number: it has no reasonable logic behind it, and I arrived at it through trial and error only.
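For reference, the byte counts reported above can be put in proportion with a few lines of arithmetic. This is only a sketch over the numbers quoted in this issue; the variable and function names are made up for illustration.

```javascript
// Byte counts reported in this issue for the 6.8 GB file
const fileSize = 6874082664
const runs = [
  { chunkSize: 256 * 1024, downloaded: 1264598157 },
  { chunkSize: 70 * 256 * 1024, downloaded: 18353801 },
]

// Fraction of the file transferred, and the same amount expressed in chunks
function summarize({ chunkSize, downloaded }) {
  return {
    chunkSize,
    fraction: downloaded / fileSize,
    chunks: downloaded / chunkSize,
  }
}

const summaries = runs.map(summarize)
for (const s of summaries) {
  console.log(
    `chunkSize=${s.chunkSize}: ${(s.fraction * 100).toFixed(2)}% of the file, ` +
      `~${s.chunks.toFixed(2)} chunks`
  )
}
```

The first run works out to roughly 18.4% of the file (a little over 4824 chunks), while the second is roughly 0.267% of the file, i.e. just over a single chunk. That second fraction is about 0.00267, which suggests the trial-and-error factor mostly reproduces the ratio observed with the 70 * 256 * 1024 chunk size rather than anything inherent to the file.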

Thanks in advance!

buzz commented

mediainfo.js is just a wrapper around MediaInfoLib. Did you test this with the MediaInfoLib? If MediaInfoLib behaves the same way, this might not originate in mediainfo.js.

MediaInfoLib should only read the file header to extract information and generally does not need to read the entire file. I think this also depends on the container/codec though.

Maybe @JeromeMartinez can chime in and shed some light.

> MediaInfoLib should only read the file header to extract information

Actually, for some formats (AVC, HEVC, ...) it often also reads ~10 seconds of video in order to detect the GOP size.

For OP:

> If this is the expected behaviour

No.

> then how are we expected to math the chunk size setting?

Chunk size should not affect how much of the file is parsed, or only slightly, so there is a bug somewhere.

> Unfortunately, I cannot provide such a large test video, but this behaviour seems consistent with any very large video file.

In our own tests we use a buffer of 64 * 1024, and we cannot reproduce the issue.
If you can share a link to a test file, we (the MediaInfo library developers) are happy to try to reproduce the issue and fix it for free; without a file, we invoice the time spent trying to reproduce it with our own files.

For what it's worth, I thought I had something related to this bug but instead it was a problem with the moov atom in the mp4 file that I had. I'll put this here just in case it helps the next person:

If your moov atom is located at the end of the mp4 file then it will need to download the entire file to be able to parse the media information from it. To resolve this, you can use FFmpeg to relocate the moov atom to the start of the file. For a ~300 MB file, I typically see around a 30 MB download to determine its media info.
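To check whether a given file is affected, the top-level MP4 boxes can be listed to see whether `moov` comes before `mdat`. The following is a rough sketch that assumes the standard box layout (32-bit big-endian size followed by a four-character type); `listTopLevelBoxes`, `moovBeforeMdat`, and the synthetic test data are hypothetical names for illustration and are not part of mediainfo.js.

```javascript
// Lists the top-level MP4 boxes (type and byte offset) found in a Uint8Array.
// Also works on a prefix of a file: it stops once the next header is out of range.
function listTopLevelBoxes(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)
  const boxes = []
  let offset = 0
  while (offset + 8 <= bytes.byteLength) {
    let size = view.getUint32(offset) // 32-bit big-endian box size
    const type = String.fromCharCode(...bytes.subarray(offset + 4, offset + 8))
    if (size === 1) {
      // A 64-bit "largesize" follows the type field
      if (offset + 16 > bytes.byteLength) break
      size = Number(view.getBigUint64(offset + 8))
    } else if (size === 0) {
      // The box extends to the end of the data
      size = bytes.byteLength - offset
    }
    boxes.push({ type, offset })
    if (size < 8) break // malformed header: stop rather than loop forever
    offset += size
  }
  return boxes
}

// True if a moov box is seen and no mdat box precedes it.
function moovBeforeMdat(bytes) {
  const types = listTopLevelBoxes(bytes).map((box) => box.type)
  const moov = types.indexOf("moov")
  const mdat = types.indexOf("mdat")
  return moov !== -1 && (mdat === -1 || moov < mdat)
}

// Synthetic boxes for demonstration only (header plus zero-filled payload).
function makeBox(type, payloadLength) {
  const box = new Uint8Array(8 + payloadLength)
  new DataView(box.buffer).setUint32(0, box.length)
  for (let i = 0; i < 4; i++) box[4 + i] = type.charCodeAt(i)
  return box
}
function concat(...parts) {
  const out = new Uint8Array(parts.reduce((n, p) => n + p.length, 0))
  let pos = 0
  for (const p of parts) {
    out.set(p, pos)
    pos += p.length
  }
  return out
}

const moovFirst = concat(makeBox("ftyp", 8), makeBox("moov", 32), makeBox("mdat", 64))
const moovLast = concat(makeBox("ftyp", 8), makeBox("mdat", 64), makeBox("moov", 32))
console.log(moovBeforeMdat(moovFirst), moovBeforeMdat(moovLast))
```

If `moov` turns out to be at the end, remuxing with FFmpeg's `-movflags +faststart` option (for example together with `-c copy`) is the usual way to move it to the front.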

I didn't encounter the chunking bug listed here but my next step is to try applying the same fix you've done to see if I see any changes in behaviour.

Finally, if it might help narrow down your bug, here's the code I'm using to read a remote video file over HTTP to analyse it. Warning, I'm not too up to speed on generators in JavaScript so this might be able to be written a bit more cleanly:

https://github.com/Rodeoclash/vodon-player/blob/main/player/src/services/videos/mediainfo.ts#L55

> If your moov atom is located at the end of the mp4 file then it will need to download the entire file to be able to parse the media information from it.

For info, downloading the entire file is not required with the MediaInfo library (the library used by mediainfo.js). I don't know whether this is possible with mediainfo.js, but at least it is not a limitation of the underlying library.

This issue is stale because it has been open for 30 days with no activity.

This issue was closed because it has been inactive for 30 days since being marked as stale.