Scrape, record, and analyze television data from online retailers
This is mostly a collection of scripts for scraping television data, recording that data, and analyzing those records over time.
❯ git clone git@github.com:spejamchr/tv_prices.git
❯ cd tv_prices
❯ bundle
# Scrape current television data
❯ ruby scripts/scrape.rb
- Ruby ~> 2.6
There are several different scripts, which use the internal tools in lib/.
- Scrape
- Reformat
- Analyze
❯ ruby scripts/scrape.rb
This creates a cache of the HTML/JSON files so that developing the scripts is fast and repeatable. To scrape fresh data, delete the cache directory and rerun the scrape script:
❯ rm -rf cached_pages
❯ ruby scripts/scrape.rb
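The caching behavior described above can be pictured with a minimal sketch. This is not the project's actual implementation: the method name `cached_fetch`, the hashed file naming, and the block-based fetcher are all assumptions made for illustration. Only the `cached_pages` directory name comes from this README.

```ruby
require "digest"
require "fileutils"

# Directory name taken from this README; everything else here is hypothetical.
CACHE_DIR = "cached_pages"

# Return the cached body for url if one exists. Otherwise call the block
# (which is expected to perform the real request), store its result on disk,
# and return it.
def cached_fetch(url, cache_dir: CACHE_DIR)
  FileUtils.mkdir_p(cache_dir)
  # Hash the URL so it can be used safely as a file name (an assumption,
  # not necessarily how the real scripts name their cache files).
  path = File.join(cache_dir, Digest::SHA256.hexdigest(url))
  return File.read(path) if File.exist?(path)

  body = yield(url)
  File.write(path, body)
  body
end
```

With something like this, a second call for the same URL never reaches the network, which is why deleting the cache directory is what forces a fresh scrape.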
❯ ruby scripts/reformat_history.rb
If the format of the exported .csv files changes, this script can help convert
all historical data to the same format. It is not very battle-hardened, so
tweaking may be necessary depending on the nature of the change in format.
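As a rough illustration of what such a conversion can involve, here is a sketch that renames columns in old CSV data using Ruby's standard csv library. The column names (`cost`, `inches`, `price`, `size`) and the `reformat_csv` helper are hypothetical; the real script and the real export format may differ.

```ruby
require "csv"

# Hypothetical mapping from old column names to current ones.
RENAMED_HEADERS = { "cost" => "price", "inches" => "size" }.freeze

# Rewrite CSV text so that its header row uses the current column names.
# Columns without an entry in the mapping are left unchanged.
def reformat_csv(text, renames: RENAMED_HEADERS)
  table = CSV.parse(text, headers: true)
  headers = table.headers.map { |h| renames.fetch(h, h) }

  CSV.generate do |csv|
    csv << headers
    table.each { |row| csv << row.fields }
  end
end
```

A format change that adds, removes, or reorders columns would need more than a rename, which is the kind of tweaking mentioned above.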
❯ ruby scripts/analyze_history.rb
Run some simple analysis on the historical records.
Stores can be scraped either in HTML or JSON. Adding a new store to scrape is
done entirely within config/application.yml, and takes about a dozen
lines of code. Currently scraped stores are:
- Amazon (HTML)
- Best Buy (HTML)
- Costco (HTML)
- eBay (HTML)
- Fry's (HTML)
- Newegg (HTML)
- Overstock (JSON)
- Target (JSON)
- Walmart (JSON)
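To give a feel for how a config-driven store definition might work, here is a sketch for a JSON store. The YAML keys (`stores`, `name`, `format`, `url`, `items_path`), the example URL, and the `extract_json_items` helper are all assumptions for illustration. They are not the actual schema of config/application.yml, so check that file for the real keys.

```ruby
require "yaml"
require "json"

# Hypothetical config: these key names are illustrative only and are NOT
# the project's real schema.
EXAMPLE_CONFIG = <<~YAML
  stores:
    - name: Example Store
      format: json
      url: https://example.com/api/tvs
      items_path: [results, products]
YAML

# Pull the list of items out of a JSON response body by following the
# configured path of nested keys.
def extract_json_items(body, store)
  JSON.parse(body).dig(*store.fetch("items_path"))
end

# Load the first (and only) store definition from the example config.
store = YAML.safe_load(EXAMPLE_CONFIG).fetch("stores").first
```

The appeal of this style is that supporting another store means describing where its data lives, rather than writing a new scraper.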
All code and data are released under the terms of the MIT License.