Crawl, process, and index your files the way you like, applying the functions you choose on top of them and storing the results in Elastic. Fast and in parallel.
Take a look at the samples folder. I'll do my best to enrich the documentation in the future (feel free to help if you want!).
The idea is simple (a minimal sketch follows the list):
- you define file matching rules as easily as writing a regex;
- whatever function you like (a subclass of Extractor) is applied to each matching file;
- the results are stored in an Elastic index.
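Here is a rough sketch of what a rule and an extractor could look like. The names below (`extract`, the regex-to-extractor mapping) are assumptions for illustration only, not the project's actual interface; see the samples folder for the real API.

```python
# Hypothetical sketch: the real base class and rule wiring live in this repo
# (see the samples folder); `extract` and the mapping below are assumptions.
import os
import re


class SizeExtractor:  # in the real project this would subclass Extractor
    """Toy extractor that records the size of each matching file."""

    def extract(self, path):
        return {"path": path, "size_bytes": os.path.getsize(path)}


# A rule is conceptually just a regex paired with the extractor to run
# on every file whose path matches it.
rules = {
    re.compile(r".*\.log$"): SizeExtractor(),
}
```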
Example 1: let's say you want to scan the disk just for JPEG files, extract the EXIF data from each one, and store it in Elastic.
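One possible body for such an extractor, using Pillow (the library choice is an assumption; the project does not mandate a specific EXIF reader):

```python
# Read EXIF tags from a JPEG and return them as a plain dict (tag id -> value),
# ready to be indexed as a document.
from PIL import Image


def extract_exif(path):
    with Image.open(path) as img:
        exif = img.getexif()
        return {tag_id: str(value) for tag_id, value in exif.items()}
```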
Example 2: you want to scan all .exe or .dll files, extract the PE header from each of them, and store it.
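A possible extractor body for this case, using the pefile library (again, the library is an assumption; any PE parser would do):

```python
# Parse a PE file and return a few header fields as a dict ready for indexing.
import pefile


def extract_pe_header(path):
    pe = pefile.PE(path, fast_load=True)
    return {
        "machine": hex(pe.FILE_HEADER.Machine),
        "number_of_sections": pe.FILE_HEADER.NumberOfSections,
        "timestamp": pe.FILE_HEADER.TimeDateStamp,
        "entry_point": hex(pe.OPTIONAL_HEADER.AddressOfEntryPoint),
    }
```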
Example 3: you have several distributed machines and want to centralize information about their files in a single location.
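Centralization boils down to every machine indexing its extracted documents into the same Elastic cluster. A minimal sketch with the official Python client (8.x-style API; the host, index name, and document fields are placeholders):

```python
# Ship one extracted document to a central Elasticsearch instance.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

doc = {"hostname": "worker-01", "path": "/data/photo.jpg", "size_bytes": 123456}
es.index(index="crawled-files", document=doc)
```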
Take a look at extractors for further details on extractors.
- Add a GitHub Action to automatically publish the package on PyPI
- Add APScheduler to allow programmable rescans
- Improve (write) the documentation