MediaArea/MediaConch

Check only files matching regex in target dir?

Opened this issue · 11 comments

Is there a way to create a rule in a policy to only check files based on, e.g., filename parameters?

When I receive files from a digitization vendor, all files are in 1 directory. I only want to run a specific policy against files ending in "_pm" or beginning with "555". Is there a way in the GUI or with the CLI to do this?

NB: I used path/to/dir/*_pm.mov in the command but MediaConch ignored the regex after the final / and ran the policy against all the files.

There's no regex handling in mediaconch, but you could do some scripting around it, such as:

find . -type f \( -name "555*" -o -name "*_pm.*" \) | while IFS= read -r file ; do mediaconch -p whatever.xml "$file" ; done

files ending in "_pm" or beginning with "555"

I would expect that to work: this kind of wildcard matching is pretty standard (and often the shell expands it before MediaConch is even called). I'll check why it did not work here.
But complex regex is not expected to be supported (there is no regex engine in MediaConch); for anything more complex I would rely on external scripts.
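
For anyone following along, here is a minimal sketch of the shell expansion mentioned above. It assumes a POSIX shell; `count_args` and the file names are hypothetical and only there for illustration. It shows that an unquoted wildcard is replaced by the matching paths before the program starts, while a quoted one is passed through as a single literal argument.

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
    echo "$#"
}

# Build a throwaway directory with two "_pm" files and one other file.
demo_dir=$(mktemp -d)
touch "$demo_dir/a_pm.mov" "$demo_dir/b_pm.mov" "$demo_dir/c_access.mov"

# Unquoted: the shell replaces the pattern with the two matching paths.
unquoted_count=$(count_args "$demo_dir"/*_pm.mov)

# Quoted: the pattern is handed to the program literally, as one argument.
quoted_count=$(count_args "$demo_dir/*_pm.mov")

echo "unquoted: $unquoted_count, quoted: $quoted_count"
rm -rf "$demo_dir"
```

So with an unquoted pattern, a program typically only ever sees the already-expanded list of files.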

I was unable to recreate the original issue. The *_pm in the command matches files in a target directory as expected on the CLI. Thanks to @dericed for script help in the interim.

Is it possible, or desired from your perspectives, to add this 'filter' behavior to the GUI?

@kgrons I have a similar situation, only our *_pm's are bagged. We're working on a script as well but I second the interest in GUI support!
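
A rough sketch for the bagged case, assuming the standard BagIt layout where payload files sit under each bag's `data/` directory. `list_bagged_pm` is a hypothetical helper name, and the `*_pm.*` pattern assumes the "_pm" suffix comes right before the file extension.

```shell
# list_bagged_pm is a hypothetical helper: it prints every payload file
# under a bag's data/ directory whose name ends in "_pm" before the extension.
list_bagged_pm() {
    bags_root="$1"
    # -path restricts matches to files inside a data/ directory,
    # so manifests and tag files at the bag root are skipped.
    find "$bags_root" -type f -path '*/data/*' -name '*_pm.*'
}
```

The output could then be fed to the same kind of loop shown earlier in the thread, one mediaconch call per file.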

ablwr commented

This sounds like a feature worthy of sponsorship after the PREFORMA-funded phase of the project is complete.

Overall, though, this kind of nuanced file parsing seems best handled by wrapping mediaconch in a simple looping script, as Dave mentioned and wrote up above.

@ablwr Yes! That'd be great to have it as an improvement on the roadmap. And do you mean wrap the GUI in a script? Sorry, am a little confused by the second part of your comment.

It works as expected in the CLI (that was my mistake in the original issue): "The *_pm in the command matches files in a target directory as expected on the CLI."

ablwr commented

No, I mean using the CLI version of mediaconch and wrapping it in a script is the best way to go for integration into workflows.
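
As an illustration of that kind of wrapper, here is a minimal sketch assuming a POSIX shell and the `mediaconch -p policy.xml file` invocation used earlier in this thread. `check_matching` and `MEDIACONCH_BIN` are hypothetical names introduced here (the variable is just a convenience for pointing at a different binary, not a MediaConch feature), and the `*_pm.*` pattern assumes "_pm" comes right before the extension.

```shell
# MEDIACONCH_BIN is an assumed override hook for this sketch only;
# it defaults to the mediaconch command found on PATH.
MEDIACONCH_BIN="${MEDIACONCH_BIN:-mediaconch}"

# check_matching POLICY DIR
# Runs the policy against files in DIR (recursively) whose names begin
# with "555" or end in "_pm" before the extension.
check_matching() {
    policy="$1"
    target_dir="$2"
    # -exec hands each path to the command as a single argument,
    # so names containing spaces are handled safely.
    find "$target_dir" -type f \( -name '555*' -o -name '*_pm.*' \) \
        -exec "$MEDIACONCH_BIN" -p "$policy" {} \;
}
```

Usage would look something like `check_matching whatever.xml path/to/dir`; the name patterns can be adjusted to whatever the vendor's naming scheme is.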

So if I summarize correctly, you would like a regex filter in the GUI, right?

That'd be great to have it as an improvement on the roadmap

This is not part of the PREFORMA funding, so putting it on the roadmap will depend on a choice by the sponsors involved after PREFORMA (you? ;-) )

ablwr commented

Handling thousands of files is always going to be much better to do using the CLI though for performance reasons. @kgrons and @genfhk if you wrote a script, maybe you can share it??

Handling thousands of files is always going to be much better to do using the CLI though for performance reasons.

I kindly disagree; there are some possibilities with the GUI too, but this is more complex and not available out of the box, since a UI is a complex thing to adapt for each project. So for now, right, the MediaConch CLI is better for that (with batch you can select the files you want), but this could also be implemented in the GUI if there is a need (with the limits of a GUI: it is never as hackable as a CLI, it just requires less knowledge).