CottageLabs/OpenArticleGauge

Don't download PDFs

Closed this issue 10 years ago · 2 comments

richard-jones commented 10 years ago

PDFs are useless to us, so it would be best to detect them and ignore them.

emanuil-tolev commented 10 years ago

Implementation notes:

Check for .pdf and .PDF at the end of the URL.
Check for the content type 'application/pdf'.
Download the first 10 KiB and check whether it is a PDF with python-magic.
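
For anyone reading this later, the three checks above could be sketched roughly as follows. This is only an illustration, not the code from the pull request: the function names (`url_looks_like_pdf`, `content_type_is_pdf`, `bytes_look_like_pdf`, `fetch_head`, `should_skip`) and the `SNIFF_BYTES` constant are hypothetical, the URL check is case-insensitive (slightly broader than just .pdf and .PDF), and the fallback to the `%PDF-` file signature when python-magic is not installed is my own assumption.

```python
from urllib.parse import urlparse
from urllib.request import urlopen

try:
    import magic  # python-magic; treated as optional in this sketch
except ImportError:
    magic = None

SNIFF_BYTES = 10 * 1024  # "first 10 KiB" from the notes above


def url_looks_like_pdf(url):
    """Check 1: the URL path ends in .pdf (any letter case)."""
    return urlparse(url).path.lower().endswith(".pdf")


def content_type_is_pdf(content_type):
    """Check 2: the Content-Type header is application/pdf."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=binary" before comparing.
    return content_type.split(";", 1)[0].strip().lower() == "application/pdf"


def bytes_look_like_pdf(head):
    """Check 3: sniff the first bytes of the body."""
    if magic is not None and hasattr(magic, "from_buffer"):
        return magic.from_buffer(head, mime=True) == "application/pdf"
    # Assumed fallback: PDF files start with the "%PDF-" signature.
    return head.startswith(b"%PDF-")


def fetch_head(url, timeout=10):
    """Return (Content-Type header, first SNIFF_BYTES of the body)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.headers.get("Content-Type"), resp.read(SNIFF_BYTES)


def should_skip(url, content_type=None, head=b""):
    """True if any of the three checks says the resource is a PDF."""
    if url_looks_like_pdf(url) or content_type_is_pdf(content_type):
        return True
    return bool(head) and bytes_look_like_pdf(head)
```

The checks are ordered from cheapest to most expensive, so a caller would only need `fetch_head` when the URL alone is inconclusive, for example `should_skip(url, *fetch_head(url))`.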

emanuil-tolev commented 10 years ago

All of this was done in October; see the linked pull request above this comment.