/big_data_benchmarks

A comparison of big data technologies for cleaning, manipulating, and generally wrangling data for analysis and machine learning.


Big data technology benchmarks

This project compares big data technologies on the tasks of cleaning, manipulating, and generally wrangling data for analysis and machine learning.

The analysis is run on a 100 GB taxi dataset.

Technologies

  • Pandas
  • Vaex
  • H2O
  • Turicreate
  • Dask
  • Spark
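
To make the idea of comparing these libraries on the same task concrete, here is a minimal timing sketch. It is not taken from this project's notebooks: the `benchmark` helper, the `tasks` registry, and the tiny in-memory `rows` sample are all hypothetical, and a plain-Python filter stands in for a real cleaning step so the example runs without any of the libraries listed above installed. In practice, each library would presumably get its own entry performing the equivalent operation on the full dataset.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(task: Callable[[], object], repeats: int = 3) -> float:
    """Return the median wall-clock time (seconds) of `task` over `repeats` runs."""
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Hypothetical stand-in data: a few taxi-like records, some with a missing fare.
rows = [{"fare": 10.0}, {"fare": None}, {"fare": 7.5}] * 1000


def clean_with_plain_python() -> list:
    """Example cleaning step: drop rows whose fare is missing."""
    return [row for row in rows if row["fare"] is not None]


# One entry per technology under test; only a plain-Python placeholder is shown.
tasks: Dict[str, Callable[[], object]] = {"plain_python": clean_with_plain_python}
results = {name: benchmark(task) for name, task in tasks.items()}

# Report the fastest task first.
for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.6f} s")
```

Taking the median over a few repeats is one simple way to dampen noise from caching and background load. For lazily evaluated libraries such as Dask or Spark, the timed callable would need to trigger the actual computation, otherwise only the construction of the query plan is measured.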