CottageLabs/OpenArticleGauge

Don't download PDFs

Closed this issue · 2 comments

PDFs are useless to us, so ideally we would detect them and skip the download.

implementation notes:

  1. check for .pdf and .PDF at end of URL
  2. check content type 'application/pdf'
  3. download first 10 KiB, check if pdf with python-magic
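
A rough sketch of how the three checks above could fit together, cheapest first. This is not the code from the pull request. The function names and the `SNIFF_BYTES` constant are made up for illustration, and fetching the headers and the first 10 KiB is left to the caller. To keep the sketch stdlib-only, step 3 looks for the `%PDF-` signature at the start of the body instead of calling python-magic; with python-magic installed, `magic.from_buffer(head, mime=True) == "application/pdf"` could be used in its place.

```python
from urllib.parse import urlparse

# Step 3 in the notes: only the first 10 KiB of the body is inspected.
SNIFF_BYTES = 10 * 1024


def url_looks_like_pdf(url):
    """Step 1: the URL path ends in .pdf or .PDF.

    Assumption: compare case-insensitively and ignore any query string,
    which is slightly broader than the literal check in the notes.
    """
    return urlparse(url).path.lower().endswith(".pdf")


def content_type_is_pdf(content_type):
    """Step 2: the Content-Type header is application/pdf."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=binary" before comparing.
    return content_type.split(";", 1)[0].strip().lower() == "application/pdf"


def body_looks_like_pdf(head):
    """Step 3 (stand-in for python-magic): body starts with the PDF signature."""
    return head[:SNIFF_BYTES].lstrip().startswith(b"%PDF-")


def should_skip_download(url, content_type=None, head=None):
    """Return True if any available check says the resource is a PDF."""
    if url_looks_like_pdf(url):
        return True
    if content_type_is_pdf(content_type):
        return True
    return head is not None and body_looks_like_pdf(head)
```

The caller would pass whatever it has so far: just the URL before making a request, then the `Content-Type` once headers arrive, then the first chunk of the body if the earlier checks were inconclusive.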

All of this was done in October; see the linked pull request above this comment.