Don't download PDFs
Closed this issue · 2 comments
richard-jones commented
They are useless, so it would be best if we could detect them and ignore them.
emanuil-tolev commented
Implementation notes:
- check for .pdf and .PDF at the end of the URL
- check for the content type 'application/pdf'
- download the first 10 KiB and check whether it is a PDF with python-magic
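
A minimal sketch of the three checks above, not the code from the linked pull request. All function names here are hypothetical. It assumes python-magic's `magic.from_buffer(..., mime=True)` is available and falls back to looking for the `%PDF-` header if it is not. The URL check is case-insensitive and ignores the query string, which is slightly broader than the note about `.pdf` and `.PDF`.

```python
import urllib.request
from urllib.parse import urlparse

try:
    import magic  # python-magic; optional in this sketch
except ImportError:
    magic = None

SNIFF_BYTES = 10 * 1024  # "first 10 KiB" from the notes above


def url_looks_like_pdf(url):
    """Check 1: the URL path ends in .pdf (any letter case)."""
    return urlparse(url).path.lower().endswith(".pdf")


def content_type_is_pdf(content_type):
    """Check 2: the Content-Type header is application/pdf."""
    # Drop parameters such as "; charset=binary" before comparing.
    return content_type.split(";")[0].strip().lower() == "application/pdf"


def bytes_look_like_pdf(head):
    """Check 3: sniff the first bytes of the body."""
    sniff = getattr(magic, "from_buffer", None)
    if sniff is not None:
        return sniff(head, mime=True) == "application/pdf"
    # Fallback (assumption): PDF files start with the "%PDF-" marker.
    return head.startswith(b"%PDF-")


def should_skip(url):
    """Hypothetical helper: True if the resource looks like a PDF."""
    if url_looks_like_pdf(url):
        return True
    # Ask for only the first 10 KiB; servers may ignore the Range header,
    # so the read below is capped as well.
    req = urllib.request.Request(
        url, headers={"Range": "bytes=0-%d" % (SNIFF_BYTES - 1)}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        if content_type_is_pdf(resp.headers.get("Content-Type", "")):
            return True
        return bytes_look_like_pdf(resp.read(SNIFF_BYTES))
```

The checks run cheapest first, so a URL that already ends in `.pdf` never triggers a request.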
emanuil-tolev commented
All of this was done in October; see the linked pull request above this comment.