Main Repo for Agile Machine Learning class.
- Clone this repo
- Optional: set up virtualenv to house this project
- Run `python tests.py` and install things, including all the following:
  - `pip install -r requirements.txt`
  - Get copy of CrossValidated database
  - Create stats database, and import CrossValidated dump
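
The database import step in the list above might look roughly like the sketch below. It rests on several assumptions that this README does not confirm: that the dump ships as XML files made of `<row .../>` elements (for example a `Posts.xml`), that SQLite is an acceptable stand-in for whatever database the class actually uses, and that only a few attributes (`Id`, `PostTypeId`, `Score`, `Body`) are needed. The function name `import_posts` and the table layout are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def import_posts(xml_path, db_path="stats.db"):
    """Load <row .../> elements from a dump file into a SQLite table.

    Assumption: each row carries Id, PostTypeId, Score and Body attributes.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts "
        "(id INTEGER PRIMARY KEY, post_type_id INTEGER, score INTEGER, body TEXT)"
    )
    count = 0
    # iterparse streams the file element by element, which keeps memory use low.
    for _event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag == "row":
            conn.execute(
                "INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?)",
                (
                    int(elem.get("Id")),
                    int(elem.get("PostTypeId", 0)),
                    int(elem.get("Score", 0)),
                    elem.get("Body", ""),
                ),
            )
            count += 1
            elem.clear()  # drop the attributes we no longer need
    conn.commit()
    conn.close()
    return count
```

Calling `import_posts("Posts.xml")` would then return the number of rows written to the (assumed) `stats.db` file.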
StackExchange is a network of crowd-curated Q&A sites with plenty of years behind them. That network includes everything from the very well-known StackOverflow to a continuously updated pile of specialty sites for specific skill niches.
As a programmer, I have found them to be the single biggest asset for finding expert-written solutions to problems as I faced them. What's more exciting is that they release all their data openly. It's available through an online database interface, as a large dump in a torrent (click "Download Data Dumps" for the most current version), or through an API.
We're going to be using data from their CrossValidated site, known in the data as `stats`. It's one of the more established non-StackOverflow sites, and it also has pretty on-topic questions/answers for this class. :)
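
For anyone who wants to peek at the data through the API rather than the dump, here is a minimal sketch of building a request URL for the `stats` site. The helper name `questions_url` and the chosen query parameters are illustrative assumptions; the class material itself works from the dump, and no request is actually sent here.

```python
from urllib.parse import urlencode

# Public Stack Exchange API root (version 2.3).
API_ROOT = "https://api.stackexchange.com/2.3"


def questions_url(site="stats", pagesize=5):
    """Build a URL that lists the top-voted questions for one site."""
    params = {"site": site, "pagesize": pagesize, "order": "desc", "sort": "votes"}
    return f"{API_ROOT}/questions?{urlencode(params)}"


print(questions_url())
```

The resulting URL can be opened in a browser or fetched with any HTTP client; the response is JSON.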
For the sake of adventure, we're going to try out Test Driven Development in our machine learning work. This will mean some combination of random data generation and finding a small set of data we know really well. You'll find a few tests within the `tests` folder. To run all tests, call `python test.py` from this project's root folder. Note that, as you add new files to the `tests` directory, you'll have to import them into that same in-root `test.py`.
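
To make the testing approach above concrete, here is a small sketch that combines a tiny, well-understood data set with seeded random data. Everything in it is hypothetical: the `mean` function stands in for whatever code is under test, and `TestMean` and `run_all` are illustrative names, not files from this repo.

```python
import random
import unittest


def mean(values):
    """Hypothetical function under test: arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


class TestMean(unittest.TestCase):
    def setUp(self):
        # A fixed seed keeps the "random" data identical on every run.
        self.rng = random.Random(42)

    def test_known_small_dataset(self):
        # A tiny data set we know really well.
        self.assertEqual(mean([1, 2, 3, 4]), 2.5)

    def test_random_data_stays_within_bounds(self):
        # A property that should hold for any generated data.
        data = [self.rng.uniform(-10, 10) for _ in range(1000)]
        self.assertGreaterEqual(mean(data), min(data))
        self.assertLessEqual(mean(data), max(data))


def run_all():
    """Collect and run the test cases, as an in-root runner might."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMean)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_all()
```

In a layout like the one described above, the test class would live in its own file under the tests folder and the in-root runner would import it explicitly, which is why each new test file needs a matching import there.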