This workflow extracts all attachments from a collection of .eml files.
find /path/to/folders -type f -name '*.eml' -execdir helper.sh {} ";"
helper.sh is part of this repository.
find /path/to/folders -type f -name '*.eml' -exec cp -i {} /path/to/destination \;
# Quote "$file" so that file names containing spaces are handled correctly.
for file in *.eml; do
    mkdir -p "$file-Attachments"
    ripmime -i "$file" -v -d "$file-Attachments"
done
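The loop above can also be wrapped in a small function so that it runs against any directory and does nothing when no .eml files are present. This is a minimal sketch, assuming ripmime is on the PATH; the function name extract_attachments and the exact directory suffix are illustrative choices and not part of this repository.

```shell
# Hypothetical helper: extract attachments from every .eml file in a directory.
# Assumes ripmime is installed; uses only the -i, -v and -d options shown above.
extract_attachments() {
    dir="${1:-.}"
    for file in "$dir"/*.eml; do
        # Skip the unexpanded pattern when the directory has no .eml files.
        [ -e "$file" ] || continue
        out="$file-Attachments"
        mkdir -p "$out"
        ripmime -i "$file" -v -d "$out"
    done
}

# Usage (path is a placeholder): extract_attachments /path/to/destination
```

Each mail gets its own output directory next to the .eml file, which keeps attachments with identical names from overwriting each other.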
find . -name 'textfile*' -exec rm {} \;
find . -empty -type d -delete
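Both cleanup commands delete without asking, so it may be worth previewing the matches first. This is a minimal sketch that reuses the same find tests with -print instead of a delete action; the function name preview_cleanup is hypothetical.

```shell
# Hypothetical helper: list what the two cleanup commands would match.
preview_cleanup() {
    dir="${1:-.}"
    # Files matching the textfile* pattern used by the first cleanup command
    find "$dir" -type f -name 'textfile*' -print
    # Directories that are empty right now (more may become empty
    # once the textfile* files have been removed)
    find "$dir" -type d -empty -print
}

# Usage: preview_cleanup .
```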
After that, the extracted PDF files, for example, can be OCR'd in place:
find . -name '*.pdf' | parallel --tag -j 2 ocrmypdf '{}' '{}'