A comparison of different libraries for dataframe manipulation and big data analytics in Clojure
Library | Description |
---|---|
tech.ml.dataset | For data processing and machine learning |
Geni | Dataframe library that runs on Apache Spark |
Onyx | High-performance distributed computation system |
- Code walkthrough
- Simple examples (importing data from and exporting data to SQL)
- Rankings for each library in Python / Julia / Clojure
Data format framework
- Apache Arrow (in-memory columnar format)
- Apache Parquet (on-disk columnar storage format)
- Apache Arrow vs Apache Parquet
Streaming framework