CLI-equivalent field names
eugenesvk opened this issue · 8 comments
I have a Python script that parses the output of the command-line mediainfo utility:
import json
import subprocess

def getMediaInfo(mediafile):
    # Pass the arguments as a list so the file name needs no shell quoting
    cmd = ["mediainfo", "-f", "--Output=JSON", mediafile]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = proc.communicate()
    data = json.loads(stdout)  # Deserialize the JSON output into a Python dict
    return data
I thought I'd convert it to use the MediaInfo library instead, as that seems like the proper way to do it.
However, I've noticed that using your library:
from pymediainfo import MediaInfo

def getMediaInfo2(mediafile):
    MIJSON = MediaInfo.parse(mediafile).to_json()
    print(MIJSON)
gives me different field names, e.g. instead of Encoded_Application or UniqueID I get writing_application or unique_id.
A cursory reading of your code suggests it gets the information as XML (I've tried the --Output=XML command-line option, but it gives the same field names as --Output=JSON), so I don't understand where these different field names are coming from.
The library code also has only the Encoded_Application text, but nothing for writing_application
I've also tried passing an extra option like so: MIJSON = MediaInfo.parse(mediafile, mediainfo_options={"Output": "XML"}).to_json(), but this gave me no output at all.
Is there a way to get from the library the same field names as the ones I get when I invoke MediaInfo directly from the command line with the XML/JSON formatting options?
I understand that my issue may have nothing to do with your wrapper and there is just some fundamental difference between the command line utility and the library that I don't get, so apologies in advance
Hi,
The library relies on the OLDXML output format (see pymediainfo/pymediainfo/__init__.py, lines 279 and 293 in 8578863).
You will get more or less the same output as if you were running mediainfo -f <file>, except that field names are converted to lower case, spaces are replaced with underscores, and repeated fields are put in an other_<something> attribute. The code is in pymediainfo/pymediainfo/__init__.py, lines 67 to 78 in 8578863.
This explains the UniqueID → unique_id change.
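To make the renaming rule concrete, here is a minimal sketch of that kind of normalization. It is not pymediainfo's actual code: normalize_fields is a hypothetical helper, the sample names and values are invented, and it assumes the input uses human-readable names such as "Unique ID".

```python
def normalize_fields(pairs):
    """Map (name, value) pairs to attribute-style keys.

    Hypothetical helper for illustration only, not pymediainfo's code.
    """
    fields = {}
    for name, value in pairs:
        # Lower-case the name and replace spaces with underscores
        key = name.strip().lower().replace(" ", "_")
        if key not in fields:
            fields[key] = value
        else:
            # Repeated fields are collected under other_<name>
            fields.setdefault(f"other_{key}", []).append(value)
    return fields


# Invented sample data, shaped like human-readable field names
sample = [
    ("Unique ID", "123"),
    ("Writing application", "ExampleMuxer"),
    ("Duration", "1000"),
    ("Duration", "1 s"),
]
normalized = normalize_fields(sample)
```

With this input, the first Duration value stays under duration and the second one ends up in the other_duration list.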
As for Encoded_Application becoming writing_application, it looks like the former is the internal attribute name whereas the latter is the human-readable name. See that file for details.
Apparently, the Inform formats XML and JSON return the internal names, whereas OLDXML and the default text format (empty value for Inform) return the human-readable names.
For your use case, I'd do something like this, using text=True to disable XML parsing:
json.loads(pymediainfo.MediaInfo.parse("tests/data/sample.mkv", text=True, mediainfo_options={"Inform": "JSON"}))
Maybe I should mention this in the documentation or add a better format option to the parse method that would directly pass the format to MediaInfo's Inform parameter. What do you think?
Thanks a lot for your prompt and detailed response!
> For your use case, I'd do something like that, using text=True to disable XML parsing:
> json.loads(pymediainfo.MediaInfo.parse("tests/data/sample.mkv", text=True, mediainfo_options={"Inform": "JSON"}))
This works exactly as I want: the output is identical to that of the command line, and it starts with 'media': {'@ref': instead of a track, so I don't need to modify any of my parsing code and can use it as a drop-in replacement!
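For readers who want to see what working with that structure looks like, here is a small sketch. The sample JSON is invented, and the "track" list with "@type" keys is an assumption about the shape of MediaInfo's JSON output; only the 'media' and '@ref' keys and the field names come from this thread. tracks_of_type is a hypothetical helper.

```python
import json

# Invented sample, shaped like the CLI-style JSON discussed above (assumption)
SAMPLE_JSON = """
{"media": {"@ref": "sample.mkv",
           "track": [{"@type": "General",
                      "UniqueID": "123",
                      "Encoded_Application": "ExampleMuxer"},
                     {"@type": "Video", "Format": "AVC"}]}}
"""


def tracks_of_type(data, track_type):
    """Return all tracks whose "@type" matches track_type (hypothetical helper)."""
    return [t for t in data["media"]["track"] if t.get("@type") == track_type]


data = json.loads(SAMPLE_JSON)
general = tracks_of_type(data, "General")[0]
encoded_application = general["Encoded_Application"]
```

Because the keys keep the internal names (Encoded_Application, UniqueID), code written against the command-line JSON output should not need renaming.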
> This will explain the UniqueID → unique_id change.
I figured out a bit later the source of the underscores, but I had no idea about the following, thanks for clarifying:
> As for Encoded_Application becoming writing_application, it looks like the former is the internal attribute name whereas the latter is the human-readable name.
> Apparently, Inform formats XML and JSON return the internal names and OLDXML and the default text format (empty value for Inform) return the human-readable names.
Documentation
What would've helped me is having a few examples of commands and the corresponding full output, so that I have a full view of the data structure. Then I would've just copied and pasted the command that corresponds to the data output I'd like to work with (in my case, identical to what I already have).
Extra options
- I'd suggest naming this option Output in addition to Inform, as this adds familiarity for command-line users. This also seems to be the way of the library itself: this commit states that "--Output is synonym of --Inform option".
- It would also be great if this option automatically enabled the text bool, as there seems to be no case where you'd need to specify Output but leave text at its default (False), right?
> What would've helped me is having a few examples of commands and the corresponding full output, so that I have a full view of the data structure. Then I would've just copied and pasted the command that corresponds to the data output I'd like to work with (in my case, identical to what I already have).
To be honest, I had no idea there was a JSON output :)
> I'd suggest to name this option as Output in addition to Inform
My idea is to have an option, maybe named output, that would deprecate the text option. Setting it to anything non-default would disable XML processing. What do you think? I would then add examples of parse's return value with different values for output.
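The proposal above could be sketched roughly as follows. This is only an illustration of the idea, not the library's implementation: resolve_output is a hypothetical helper, and treating text=True as equivalent to output="" (MediaInfo's default text format) is an assumption.

```python
import warnings


def resolve_output(output=None, text=False):
    """Return (inform_value, parse_xml) for a hypothetical parse() call.

    Illustration of the proposed option only, not pymediainfo's code.
    """
    if text:
        # The old flag keeps working but points users to the new option
        warnings.warn(
            "'text' is deprecated, use 'output' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if output is None:
            output = ""  # assumption: "" selects MediaInfo's default text format
    if output is None:
        # Default behaviour: request OLDXML and parse it into objects
        return "OLDXML", True
    # Any explicit output value disables XML processing
    return output, False
```

For example, resolve_output(output="JSON") returns ("JSON", False), so the caller would pass "JSON" to Inform and hand the raw string back to the user.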
Yes, your idea sounds great, the new output option does look like the brighter future for text!
@JeromeMartinez Hi Jérôme, is there a value for Output / Inform that corresponds to the default output format, or should I simply ask users to set it to ""? I noticed that mediainfo --Inform=Text works too, but then again so does --Inform=randomtexthere :)
> is there a value for Output / Inform that corresponds to the default output format or should I simply ask users to set it to ""?
"" is the more or less official way to reset to the default.
Random text (including "Text") is discarded and the default is used too, but there could be an error message in the future.
@eugenesvk can you please try the new output branch and let me know if the new documentation is fine as well?