/scrapy-intel-ark

Web crawler for Intel ARK (ark.intel.com)

Primary LanguagePythonApache License 2.0Apache-2.0

scrapy-intel-ark

Web crawler for Intel ARK (ark.intel.com)

Requirements

CPU specs spider

Downloads all CPU specifications as JSON files.

scrapy crawl cpuspecs

Everything is downloaded to items/cpuspecs directory. Each CPU is in it's own socket subdirectory.

After crawling _legend.json is written which has explanations for the fields.

The prices will be written separately to cpuprices.json.

Use convert2csv.py to convert JSON data to flat CSV spreadsheet.

Notes

  • 30 day cache is used in settings.py
  • Some product information pages do not contain socket information, so they are written to _unknown/ directory