miketeo/pysmb

mediainfo and pysmb

3lixy opened this issue · 4 comments

3lixy commented

Hi, so far I have mostly had success copying the first 1000 bytes of the file and passing that to pymediainfo.

However, this does not work for all files. Do you have any suggestions for how I can pass pymediainfo a file object it can read, without having to copy part of the file to the local filesystem?

mediainfo reads the metadata almost instantly when the SMB share is mounted via the OS.

I am essentially looking for a function that returns a file object without writing anything to it, and lets me seek within it as I wish.

Below is what I do now.

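from tempfile import NamedTemporaryFile
from pymediainfo import MediaInfo
from zerrphix.util.filesystem import make_dir, smbfs
# smbcon and path are set up elsewhere in my project
file_metadata = NamedTemporaryFile()
file_metadata_path = file_metadata.name
# copy the start of the remote file into the local temp file
file_attributes, bytes_written = smbcon.retrieveFileFromOffset(path, file_metadata, offset=0,
                                                                   max_length=10000, timeout=10)
file_metadata.flush()  # make sure the bytes are on disk before mediainfo opens the path
media_info = MediaInfo.parse(file_metadata_path)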

@3lixy: I'm afraid you can't avoid saving the file to the local filesystem.
pymediainfo wraps the native mediainfo library and calls the library's open/parse/close functions to discover the media information. The open function used by pymediainfo requires a valid filename. From the mediainfo source code, it looks like there are other open functions that could be used, but that would mean bypassing pymediainfo and writing the wrapper in your own code.

I think this is something beyond pysmb.

3lixy commented

Sure, that is the way I understand pymediainfo to work.
I agree that how pymediainfo works is not a pysmb problem.
If it ultimately becomes necessary to seek through the file over SMB as needed (i.e. without transferring it to the local filesystem first), is this something pysmb supports? If not, is it a roadmap feature? (It is possible when the SMB share is mounted by the OS.)

Having had a quick look at the MediaInfo SDK (https://mediaarea.net/en/MediaInfo/Support/SDK/Buffers), I think what is needed is for pysmb to expose a seekable object. I am not a C++ programmer, so I will leave this for future development (to be done in my project) if and when pysmb ends up supporting seekable file objects.

Looking at the Buffer SDK, it appears that the data parameter of Open_Buffer_Continue accepts a byte array.
pysmb does support seeking, via the offset parameter of retrieveFileFromOffset(). Furthermore, the file_obj parameter that you pass in does not have to be a genuine file object; it can be any object that supports the write() method, like BytesIO or StringIO. Once the BytesIO object contains the data from the remote SMB server, your Python code can easily extract its contents and send them to the Open_Buffer_Continue function. If mediainfo requires more data, call retrieveFileFromOffset() again with an incremented offset value.

In short, if you know how to do your wrapping around the Buffer SDK, it should be possible to feed data to mediainfo without writing to the local filesystem.
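
The fetch-and-feed loop described above could be sketched roughly as follows. This is only an illustration under some assumptions: `feed_in_chunks` and its `feed` callback are hypothetical names (the callback stands in for whatever wrapper you write around Open_Buffer_Continue and returns True once the parser has enough data), and the call passes the share name as the first argument, which is how I understand pysmb's `SMBConnection.retrieveFileFromOffset()` to be declared.

```python
from io import BytesIO


def feed_in_chunks(conn, service_name, path, feed,
                   chunk_size=1024 * 1024, max_bytes=64 * 1024 * 1024,
                   timeout=30):
    """Fetch a remote file chunk by chunk and pass each chunk to feed().

    feed(data) is a hypothetical callback (for example a wrapper around
    Open_Buffer_Continue); it returns True once it has seen enough data.
    Returns the total number of bytes fetched.
    """
    offset = 0
    while offset < max_bytes:
        buf = BytesIO()
        # pysmb writes the requested byte range into any object with write().
        conn.retrieveFileFromOffset(
            service_name, path, buf,
            offset=offset,
            max_length=min(chunk_size, max_bytes - offset),
            timeout=timeout,
        )
        data = buf.getvalue()
        if not data:
            # Nothing was written, so the end of the remote file was reached.
            break
        offset += len(data)
        if feed(data):
            # The parser is satisfied; stop downloading.
            break
    return offset
```

Nothing touches the local filesystem here; each chunk lives in memory only until it has been handed to `feed`. The `max_bytes` cap keeps a file that never satisfies the parser from being downloaded in full.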

3lixy commented

That sounds fair.

I use StringIO to render images in memory at the moment and serve them over HTTP.
Using dd to simulate the number of bytes that need to be copied for one problematic video file, it took 50 megabytes before mediainfo would return a video tag (I will do some more accurate tests).

The copy with dd took some time (but I think that was because of the settings I was using, bs=1).
The data acquired by mediainfo is not essential for my program; it is mostly used to represent the resolution graphically and to decide which video file should be the active one when more than one video file has the same unique identifier.

I think I will go with your suggestion (keep fetching more data until I get the result I want) with a limiting maximum value, but appending to a file for now so I don't have to do C++ to write my own wrapper.

Thanks for your assistance.
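
For reference, the capped "keep fetching until the parse succeeds" approach mentioned above might look something like this. It is a rough sketch, not tested against a real share: `parse_with_growing_prefix`, `parse` and `has_result` are hypothetical names (in practice `parse` could be `MediaInfo.parse` and `has_result` a check for a video track), and the share name is assumed to be the first argument of `retrieveFileFromOffset()`. Reopening a `NamedTemporaryFile` by name while it is still open works on POSIX systems but generally not on Windows.

```python
from tempfile import NamedTemporaryFile


def parse_with_growing_prefix(conn, service_name, path, parse, has_result,
                              step=1024 * 1024, max_bytes=64 * 1024 * 1024,
                              timeout=30):
    """Append successive chunks of a remote file to a temp file and re-parse
    it until has_result(parsed) is true or max_bytes have been downloaded.

    parse(local_path) and has_result(parsed) are hypothetical callables.
    Returns (parsed, bytes_fetched); parsed is None if nothing usable was found.
    """
    fetched = 0
    with NamedTemporaryFile() as tmp:
        while fetched < max_bytes:
            before = tmp.tell()
            # Continue from where the previous chunk ended.
            conn.retrieveFileFromOffset(
                service_name, path, tmp,
                offset=fetched,
                max_length=min(step, max_bytes - fetched),
                timeout=timeout,
            )
            tmp.flush()  # make the new bytes visible to readers of tmp.name
            written = tmp.tell() - before
            if written == 0:
                # The remote file is exhausted.
                break
            fetched += written
            parsed = parse(tmp.name)
            if has_result(parsed):
                return parsed, fetched
    return None, fetched
```

Each round re-parses the whole prefix downloaded so far, so a larger `step` means fewer parse calls at the cost of possibly fetching more than strictly needed.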