This project shows an example of training a K-means model with Spark. ClickHouse is used as the data source, and the predictions are stored there as well.
The OpenFoodFacts dataset consists of descriptions of various food products. More information can be found here
The data was preprocessed by removing unimportant features and filling null values in the remaining columns.
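As a rough illustration, the preprocessing and clustering steps could look like the PySpark sketch below. It is not the project's actual code: the helper names, the fill value of 0.0, and the defaults k=5 and seed=42 are assumptions made for the example, and the feature columns are assumed to be numeric.

```python
from typing import Iterable, List


def feature_columns(all_columns: Iterable[str], unimportant: Iterable[str]) -> List[str]:
    """Return the columns kept as features, preserving their original order."""
    dropped = set(unimportant)
    return [c for c in all_columns if c not in dropped]


def preprocess(df, unimportant: Iterable[str], fill_value: float = 0.0):
    """Drop unimportant columns and replace nulls in the numeric columns.

    `df` is expected to be a pyspark.sql.DataFrame. The fill value is an
    assumption; the project may use a different strategy.
    """
    kept = feature_columns(df.columns, unimportant)
    # na.fill with a number only touches numeric columns.
    return df.select(*kept).na.fill(fill_value)


def train_kmeans(df, features: List[str], k: int = 5, seed: int = 42):
    """Assemble a feature vector, fit K-means, and return (model, predictions)."""
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.ml.clustering import KMeans
    from pyspark.ml.feature import VectorAssembler

    assembler = VectorAssembler(inputCols=features, outputCol="features")
    assembled = assembler.transform(df)
    kmeans = KMeans(k=k, seed=seed, featuresCol="features", predictionCol="prediction")
    model = kmeans.fit(assembled)
    # The "prediction" column holds the cluster index assigned to each row.
    return model, model.transform(assembled)
```

For example, given a DataFrame df, preprocess(df, ["code", "url"]) followed by train_kmeans(clean_df, clean_df.columns) would cluster on every remaining column; the column names here are only placeholders.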
Put clickhouse-jdbc-0.4.6-all.jar in the jars folder; it is required for the ClickHouse connection.
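A minimal sketch of how the jar might be wired into a Spark session and used to read from and write to ClickHouse over JDBC is shown below. The host, port, database, table names, credentials, and application name are placeholders rather than values taken from this project, and the target table is assumed to exist already when writing.

```python
from typing import Dict


def clickhouse_jdbc_options(host: str, port: int, database: str, table: str,
                            user: str, password: str) -> Dict[str, str]:
    """Build the option map for Spark's generic JDBC data source."""
    return {
        "url": f"jdbc:clickhouse://{host}:{port}/{database}",
        # Driver class shipped in the clickhouse-jdbc 0.4.x jar.
        "driver": "com.clickhouse.jdbc.ClickHouseDriver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def build_session(jar_path: str = "jars/clickhouse-jdbc-0.4.6-all.jar"):
    """Create a Spark session with the ClickHouse JDBC jar on the classpath."""
    # Imported here so the option helper can be used without Spark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("kmeans-clickhouse")  # hypothetical application name
        .config("spark.jars", jar_path)
        .getOrCreate()
    )


def read_table(spark, options: Dict[str, str]):
    """Load a ClickHouse table as a Spark DataFrame."""
    return spark.read.format("jdbc").options(**options).load()


def write_table(df, options: Dict[str, str], mode: str = "append") -> None:
    """Write a DataFrame (for example, predictions) to an existing ClickHouse table."""
    df.write.format("jdbc").options(**options).mode(mode).save()
```

Port 8123 is ClickHouse's default HTTP port, so a local setup would typically call clickhouse_jdbc_options("localhost", 8123, "default", "some_table", "default", "").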