- All the code is formatted with Black: The Uncompromising Code Formatter
- Dependabot runs dependency checks on a weekly basis
- After each commit, GitHub workflows run the following checks:
Description: calculate pyspark aggregations from the given csv.
Tech:
- python
- spark
- csv
Description: calculate pyspark aggregations from the given parquet and csv.
Tech:
- python
- spark
- csv
Description: calculate pyspark aggregations from the given csv.
Tech:
- python
- spark
- csv
Description:
- calculate pyspark aggregations from the given parquet
- ingest the data to postgres
- read the data from postgres
- calculate pyspark aggregations and save as csv
Tech:
- python
- spark
- parquet
- postgres in docker with persistent storage
Description:
- calculate pyspark metrics and dimensions aggregations from the given json
- test the app
Tech:
- python
- spark
- pytest: 91% test coverage according to Coverage
- json/parquet
A very small PySpark task; no sense in splitting it into separate functions and testing them.
- remove non-ascii characters
- drop duplicates based on the `dt` column
The project itself lives in a separate GitHub repo; its purpose is to demonstrate Java, Kafka, Prometheus, and Grafana knowledge.