An example project where I test out different data engineering tools, like Apache Druid, PySpark, and Dagster.
With Scrapy I extract key information about various apartments, load it into S3 storage, and ingest it into Apache Druid.
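As a rough illustration of the scraping step, a spider might look like the sketch below; the start URL, CSS selectors, and field names are placeholders, not taken from the actual project.

```python
# Hypothetical spider sketch: URL, selectors, and field names are assumptions.
import scrapy


class ApartmentSpider(scrapy.Spider):
    name = "apartments"
    start_urls = ["https://example.com/listings"]  # placeholder URL

    def parse(self, response):
        # Yield one item per listing card on the page.
        for listing in response.css("div.listing"):
            yield {
                "title": listing.css("h2::text").get(),
                "price": listing.css("span.price::text").get(),
                "area": listing.css("span.area::text").get(),
            }
```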
- Scrapy
- Dagster
- Apache Druid
- Docker
- Apache Superset
- Jupyter Notebook
- MinIO (https://min.io/)
- PySpark
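To show how these pieces connect, here is a minimal sketch of a Dagster asset that pushes scraped output to MinIO through its S3-compatible API. The bucket name, file path, and local endpoint are assumptions for the sketch.

```python
# Sketch only: bucket name, file name, and endpoint are assumptions.
import os

import boto3
from dagster import asset


@asset
def raw_apartment_data() -> None:
    """Upload the scraped listings file to MinIO via its S3-compatible API."""
    s3 = boto3.client(
        "s3",
        endpoint_url="http://localhost:9000",  # MinIO's default port, assumed
        aws_access_key_id=os.environ["MINIO_USER"],
        aws_secret_access_key=os.environ["MINIO_PASSWORD"],
    )
    s3.upload_file("apartments.json", "raw-data", "apartments.json")
```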
To run this project, you will need to add the following environment variables to your .env file
MINIO_USER
MINIO_PASSWORD
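A minimal .env sketch (the values are placeholders):

```
MINIO_USER=your-minio-access-key
MINIO_PASSWORD=your-minio-secret-key
```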
Install the project's dependencies with pip or Poetry
pip install -r requirements.txt
Or
poetry install
Clone the project
git clone https://github.com/stejul/dataEngineeringExample
Install dependencies
poetry install
or pip install -r requirements.txt
Start the server
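The exact entry point isn't pinned down here; assuming the standard Dagster CLI, the webserver starts with:

```bash
dagster dev
```

(Older Dagster versions use `dagit` instead.)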
To run tests, run the following command (assuming pytest, the usual test runner for a Poetry project; check the repo for the actual setup)

poetry run pytest
To deploy this project locally, run (assuming the Docker setup listed above ships with a Compose file)

docker compose up -d
Work in progress.