/collect_phil_cdc

Collect images and metadata from phil.cdc.gov, the Public Health Image Library at the Centers for Disease Control and Prevention.

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

DEPRECATED.

This project has been replaced by usable_image_scraper: [https://github.com/gameguy43/usable_image_scraper]

The remainder of this source should be used for reference only; the project above is a better implementation of the same thing.

Collect Public Health Images

Requirements

These scripts require the following modules (the names shown are the Python 2 module names):

  • start.py
    • import Queue
    • import threading
    • import traceback
    • import urllib
    • import urllib2
  • scraper.py
    • import cookielib
    • import string
  • data_storer.py
    • import sqlalchemy
  • imginfo.py
    • import Image # requires PIL, the Python Imaging Library
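
Several of the modules listed above (Queue, urllib2, cookielib and the top-level Image) exist only under those names on Python 2. As a rough sketch, not part of the original scripts, the imports could be written so they resolve on either Python 2 or Python 3. The Python 3 names used here (queue, urllib.request, urllib.parse, http.cookiejar, PIL.Image) are the standard renamings; the `missing` list and the decision to report third-party modules rather than fail on them are assumptions made for illustration.

```python
# Compatibility sketch: try the Python 3 module names first,
# then fall back to the Python 2 names listed above.
import string
import threading
import traceback

try:
    import queue                        # Python 3 name for Queue
    import urllib.request as urllib2    # urlopen, build_opener, Request
    import urllib.parse as urllib       # quote, urlencode
    import http.cookiejar as cookielib  # CookieJar
except ImportError:
    import Queue as queue
    import urllib
    import urllib2
    import cookielib

# Third-party modules are reported instead of required,
# so this check runs even when they are not installed.
missing = []
try:
    import sqlalchemy
except ImportError:
    missing.append("sqlalchemy")
try:
    from PIL import Image   # Pillow and newer PIL layouts
except ImportError:
    try:
        import Image        # legacy PIL layout, as used by imginfo.py
    except ImportError:
        missing.append("PIL")

print("missing third-party modules: %s" % (", ".join(missing) or "none"))
```

Note that Python 2's urllib is split across urllib.request and urllib.parse on Python 3, so the aliases above only cover part of the old namespace (for example, urlretrieve lives in urllib.request).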

Files

  • start.py
    • command script that imports scraper.py, parser.py and data_storer.py
    • contains configuration GLOBALs for directory structure
    • cdc_phil_scrape_range(start, end) controls which ID range the script will collect
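
To illustrate how a range-driven collector built on Queue and threading might be organized, here is a minimal sketch. It is not the code in start.py: the function name `scrape_range_sketch`, the `fetch` callback, the `num_workers` parameter and the half-open interpretation of the range (start included, end excluded) are all assumptions made for this example, and `fake_fetch` stands in for the real download step so that nothing touches the network.

```python
import threading
import traceback

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2, as listed in the requirements


def scrape_range_sketch(start, end, fetch, num_workers=4):
    """Call fetch(image_id) for every ID in [start, end) using worker threads.

    Hypothetical stand-in for cdc_phil_scrape_range; `fetch` and
    `num_workers` are illustrative names, not taken from start.py.
    Returns (results, failed_ids).
    """
    ids = queue.Queue()
    for image_id in range(start, end):
        ids.put(image_id)

    results = {}
    failed = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                image_id = ids.get_nowait()
            except queue.Empty:
                return  # no IDs left for this worker
            try:
                value = fetch(image_id)
                with lock:
                    results[image_id] = value
            except Exception:
                # Log the failure and continue, so one bad ID
                # does not stop the rest of the range.
                traceback.print_exc()
                with lock:
                    failed.append(image_id)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, sorted(failed)


def fake_fetch(image_id):
    # Placeholder for the real download and parse step; no network access.
    if image_id == 3:
        raise ValueError("simulated missing image")
    return "metadata for image %d" % image_id


results, failed = scrape_range_sketch(1, 6, fake_fetch)
```

With the placeholder above, `results` holds entries for IDs 1, 2, 4 and 5, and `failed` is `[3]`. Filling the queue before starting the workers keeps the shutdown logic simple: an empty queue means the range is done.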