Here you'll find demonstrations of scaling scikit-learn with dask for out-of-core computation on large, complex datasets. Dask builds task graphs (which can even be modified directly) to push computation out of core, streaming chunks through memory and spilling to disk. This lets both the computation and the volume of data scale well beyond RAM, which is a big win for machine learning.
It's becoming increasingly important to scale up machine learning and deep learning computation, either with the common approach of a cluster of GPUs or with out-of-core computation on a single machine that has enough local disk storage, which is rarely a problem these days. Dask is a Python library that executes task graphs out of core, so it can handle large datasets (GBs to TBs) and resource-hungry computation on a single PC or laptop, given enough disk.
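To get a concrete feel for this, here is a minimal sketch (assuming only that dask is installed) of an out-of-core reduction over an array larger than a typical laptop's RAM; the shapes and chunk sizes are illustrative:

```python
import dask.array as da

# A 100000 x 10000 float64 array is ~8 GB -- larger than many laptops'
# RAM -- but dask only ever materializes one chunk at a time.
x = da.random.random((100000, 10000), chunks=(10000, 10000))

# Nothing runs yet: dask just records a task graph of the work.
col_means = x.mean(axis=0)

# compute() walks the graph chunk by chunk, keeping memory use bounded.
print(col_means.compute())
```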
The walkthrough covers:

- skimage (scikit-image) to convert images to numeric arrays (first sketch after this list)
- Standard scaling of the data
- (Optional) noise cleanup
- Image classification with:
  - MLP setup (using sklearn 0.18.dev0)
  - dask and `partial_fit` to train the MLP out of core (second sketch below)
- Visualizing the task graph (third sketch below)
- Trying GridSearchCV for hyperparameter tuning (fourth sketch below)
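First, converting images to numeric feature rows and standard-scaling them. This is a sketch under assumptions: the file paths, the RGB input, and the 28x28 target shape are all hypothetical placeholders:

```python
import numpy as np
from skimage.io import imread
from skimage.color import rgb2gray
from skimage.transform import resize
from sklearn.preprocessing import StandardScaler

def image_to_features(path, shape=(28, 28)):
    """Load an RGB image, convert to grayscale floats, resize, flatten."""
    img = rgb2gray(imread(path))   # 2-D float array in [0, 1]
    img = resize(img, shape)       # fixed size so all rows align
    return img.ravel()

# Hypothetical list of image files
paths = ["img_000.png", "img_001.png"]
X = np.stack([image_to_features(p) for p in paths])

# Standard scaling: zero mean, unit variance per feature
X_scaled = StandardScaler().fit_transform(X)
```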
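Second, streaming chunks through the MLP with `partial_fit`. This sketch assumes `X_scaled` and labels `y` from the preprocessing step; it wraps an in-memory array for simplicity, whereas a real out-of-core run would load `X` from disk (e.g. an HDF5 file) instead:

```python
import numpy as np
import dask.array as da
from sklearn.neural_network import MLPClassifier  # added in sklearn 0.18

chunk = 1000
X_da = da.from_array(X_scaled, chunks=(chunk, X_scaled.shape[1]))
y_da = da.from_array(y, chunks=(chunk,))

clf = MLPClassifier(hidden_layer_sizes=(100,))
classes = np.unique(y)  # partial_fit needs all classes up front

# Stream one chunk at a time through the network; only a single
# chunk is ever materialized in memory.
for start in range(0, X_scaled.shape[0], chunk):
    X_chunk = X_da[start:start + chunk].compute()
    y_chunk = y_da[start:start + chunk].compute()
    clf.partial_fit(X_chunk, y_chunk, classes=classes)
```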
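Third, rendering the task graph dask has recorded. A tiny toy computation keeps the graph readable; `visualize` requires graphviz to be installed:

```python
import dask.array as da

x = da.ones((8, 8), chunks=(4, 4))
y = (x + x.T).sum(axis=0)

# Writes the recorded task graph to task_graph.png
y.visualize(filename="task_graph.png")
```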
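Finally, a hedged sketch of hyperparameter tuning with scikit-learn's GridSearchCV. The parameter grid and the subsample size are hypothetical; the idea is to search on an in-memory subsample, then reuse the best settings in the out-of-core `partial_fit` loop above:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Hypothetical grid; X_scaled and y are assumed from the steps above.
param_grid = {
    "hidden_layer_sizes": [(50,), (100,)],
    "alpha": [1e-4, 1e-3],
}

search = GridSearchCV(MLPClassifier(max_iter=200), param_grid, cv=3)

# Search on an in-memory subsample to keep the grid search tractable.
search.fit(X_scaled[:5000], y[:5000])
print(search.best_params_)
```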