Example of K-Means clustering using PySpark
The Open Food Facts dataset contains data about food products from all over the world. It is available at https://world.openfoodfacts.org/data
Link to the CSV file: https://static.openfoodfacts.org/data/en.openfoodfacts.org.products.csv.gz
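The export is a gzip-compressed file that, as far as we know, is tab-separated despite its .csv name. The sketch below uses only the standard library to stream the archive and keep the products whose chosen nutrient columns are all numeric, which is the kind of feature row K-Means needs. The function name `read_nutrients` and the example column names (`energy_100g`, `fat_100g`) are illustrative assumptions, not part of this project.

```python
import csv
import gzip
from typing import Iterator, List

# Some Open Food Facts text fields are very long; raise the csv module's field size limit.
csv.field_size_limit(2**31 - 1)


def read_nutrients(path: str, columns: List[str]) -> Iterator[List[float]]:
    """Yield one list of floats per product whose requested columns are all numeric.

    Hypothetical helper: assumes a gzip-compressed, tab-separated export with a header row.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as f:
        # QUOTE_NONE treats quote characters as literal text, which suits a plain TSV dump.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            values: List[float] = []
            for name in columns:
                raw = row.get(name) or ""  # short rows yield None; treat it as empty
                try:
                    values.append(float(raw))
                except ValueError:
                    break  # missing or non-numeric value: skip this product
            else:
                yield values
```

For example, `list(read_nutrients("en.openfoodfacts.org.products.csv.gz", ["energy_100g", "fat_100g"]))` returns one `[energy, fat]` pair per complete product. Inside a Spark job, `spark.read.csv(path, sep="\t", header=True)` reads the same file directly, since Spark decompresses `.gz` input automatically.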
The ClickHouse JDBC jar file is available at https://github.com/ClickHouse/clickhouse-java/releases/download/v0.4.6/clickhouse-jdbc-0.4.6-all.jar and should be placed in the jars directory.
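Assuming the PySpark job loads every jar from the jars directory through Spark's `spark.jars` setting (which takes a comma-separated list of paths), the value can be built with a small helper. `jars_config` is a hypothetical name; the commented SparkSession and JDBC lines show one plausible way to use it, with the host, database and table as placeholders.

```python
from pathlib import Path
from typing import List


def jars_config(jars_dir: str = "jars") -> str:
    """Return a comma-separated list of absolute .jar paths, suitable for spark.jars."""
    paths: List[str] = sorted(str(p.resolve()) for p in Path(jars_dir).glob("*.jar"))
    if not paths:
        raise FileNotFoundError(f"no .jar files found in {jars_dir!r}")
    return ",".join(paths)


# Hypothetical usage (requires pyspark; URL and table name are placeholders):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.config("spark.jars", jars_config("jars")).getOrCreate()
# df = (spark.read.format("jdbc")
#       .option("url", "jdbc:clickhouse://localhost:8123/default")
#       .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
#       .option("dbtable", "products")
#       .load())
```

Resolving to absolute paths avoids surprises when the Spark driver is started from a different working directory than the one containing jars.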
First, install the sbt build tool. Then build the project using the following command:
bash scripts/build_datamart.sh
The jar file will be placed in datamart/target/<scala_version>/datamart_<scala_version>-0.1.0-SNAPSHOT.jar
This file should also be placed in the jars directory.
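Copying the built jar can be scripted. The sketch below assumes it runs from the repository root and globs for the path pattern given above, so it works for whichever Scala version sbt used (for example a `scala-2.12` target directory). `copy_datamart_jar` is a hypothetical helper, not part of the build scripts.

```python
import shutil
from pathlib import Path


def copy_datamart_jar(project_root: str = ".", jars_dir: str = "jars") -> Path:
    """Copy the sbt-built datamart jar into jars_dir and return the copied file's path."""
    root = Path(project_root)
    # Matches datamart/target/<scala_version>/datamart_<scala_version>-0.1.0-SNAPSHOT.jar
    candidates = sorted(root.glob("datamart/target/*/datamart_*-0.1.0-SNAPSHOT.jar"))
    if len(candidates) != 1:
        raise FileNotFoundError(f"expected exactly one datamart jar, found {len(candidates)}")
    target_dir = root / jars_dir
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 keeps file metadata and returns the destination path.
    return Path(shutil.copy2(candidates[0], target_dir))
```

Failing when more than one jar matches is deliberate: stale builds for another Scala version would otherwise be copied silently.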
Finally, start the services:
docker-compose up