Request: Make download summary data public
TheChristophe opened this issue · 1 comment
TheChristophe commented
Hi,
I have a setup where my system automatically runs a script that runs PFERD to download ILIAS contents, and then runs rclone to upload everything to the cloud. Since all of this is headless, I have configured my pferd config to email me a list of new and changed files when it's done.
The script itself is simple:
summary = pferd._download_summary
# Convert List[Path] to List[str], with paths relative to cwd
new_files = [str(f.relative_to(cwd)) for f in summary.new_files]
updated_files = [str(f.relative_to(cwd)) for f in summary.modified_files]
mail.mail_update(new_files, updated_files)
Unfortunately, for this I have to use pferd._download_summary, which is private. Is it possible to add some native method to receive a 'changelog' of sorts?
Garmelon commented
Upon successful completion, crawlers will leave a JSON file called .report in their output directory. This file includes information about file additions, deletions and changes and shouldn't be hard to parse/use.
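As a rough sketch of how the report could be consumed from a script like the one above: the thread does not give the layout of .report, so the key names below ("added", "changed", "deleted") are assumptions, and the helper name load_report_changes is hypothetical. Check a real .report file and adjust the keys before relying on this.

```python
import json
from pathlib import Path
from typing import Dict, List, Sequence

# Assumption: these top-level key names are illustrative only.
# Inspect an actual .report file to confirm what the crawler writes.
ASSUMED_KEYS = ("added", "changed", "deleted")


def load_report_changes(
    output_dir: Path, keys: Sequence[str] = ASSUMED_KEYS
) -> Dict[str, List[str]]:
    """Read <output_dir>/.report and return the path lists for the given keys.

    Hypothetical helper, not part of PFERD. Keys missing from the
    report are returned as empty lists.
    """
    report_path = Path(output_dir) / ".report"
    with report_path.open(encoding="utf-8") as f:
        data = json.load(f)
    return {key: [str(p) for p in data.get(key, [])] for key in keys}
```

The result could then be handed to the mail step from the original script, for example mail.mail_update(changes["added"], changes["changed"]), without touching any private attribute.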