This project compares big data technologies for cleaning, manipulating, and generally wrangling data for analysis and machine learning.
The analysis is run on a 100 GB taxi dataset.
Technologies:
- Pandas
- Vaex
- H2O
- Turicreate
- Dask
- Spark
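
A comparison across the tools listed above needs some consistent way to time the same task in each of them. The sketch below is one possible harness, not the project's actual benchmarking code: the `benchmark` and `compare` helpers and the two stand-in workloads are hypothetical names introduced for illustration, and it uses only the Python standard library so it runs without any of the listed frameworks installed. In practice each workload would be replaced by the equivalent cleaning or aggregation step written for one of the technologies.

```python
import time
from statistics import median
from typing import Callable, Dict, List


def benchmark(task: Callable[[], object], repeats: int = 3) -> float:
    """Run `task` `repeats` times and return the median wall-clock seconds."""
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return median(timings)


def compare(tasks: Dict[str, Callable[[], object]], repeats: int = 3) -> Dict[str, float]:
    """Benchmark every named task and return a name -> median seconds mapping."""
    return {name: benchmark(task, repeats) for name, task in tasks.items()}


# Hypothetical stand-in workloads (assumption: a real comparison would call
# the same wrangling step implemented in each framework instead).
def list_sum() -> int:
    return sum([x * 2 for x in range(100_000)])


def generator_sum() -> int:
    return sum(x * 2 for x in range(100_000))


results = compare({"list": list_sum, "generator": generator_sum})

# Print the tasks from fastest to slowest.
for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.4f}s")
```

The median is used rather than the mean so that a single slow run (for example, a cold cache) does not dominate the result. Note that lazy frameworks such as Dask and Spark only do work when a result is requested, so each timed task would need to force computation for the numbers to be comparable.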