propertyScrapingDataEngineering

An example project where I test out different data engineering tools, such as Apache Druid, PySpark and Dagster.

With Scrapy I'm extracting key information about various apartments, loading it into S3-compatible storage (MinIO) and ingesting it into Apache Druid.
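
As a rough sketch of that flow, the snippet below shows a minimal Scrapy item pipeline that collects the scraped apartment items and uploads them as one JSON file to a MinIO bucket. The endpoint localhost:9000, the bucket name apartments and the way the MINIO_USER/MINIO_PASSWORD environment variables are read are assumptions for illustration, not the project's actual configuration.

  import io
  import json
  import os
  from datetime import datetime

  from minio import Minio


  class MinioUploadPipeline:
      """Collect scraped apartment items and upload them to MinIO as one JSON file."""

      def open_spider(self, spider):
          # Assumption: MinIO runs locally and the credentials come from the environment.
          self.client = Minio(
              "localhost:9000",
              access_key=os.environ["MINIO_USER"],
              secret_key=os.environ["MINIO_PASSWORD"],
              secure=False,
          )
          if not self.client.bucket_exists("apartments"):
              self.client.make_bucket("apartments")
          self.items = []

      def process_item(self, item, spider):
          # Keep the item in memory; it is written out once the crawl finishes.
          self.items.append(dict(item))
          return item

      def close_spider(self, spider):
          # One JSON file per crawl; Druid can then ingest it from the S3-compatible bucket.
          payload = json.dumps(self.items).encode("utf-8")
          self.client.put_object(
              "apartments",
              f"raw/{datetime.now():%Y-%m-%d_%H%M%S}.json",
              io.BytesIO(payload),
              length=len(payload),
              content_type="application/json",
          )

Such a pipeline would then be enabled via the ITEM_PIPELINES setting in the Scrapy project's settings.py.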

Tech Stack

  • Scrapy
  • Dagster
  • Apache Druid
  • Docker
  • Apache Superset
  • Jupyter Notebook
  • MinIO (https://min.io/)
  • PySpark

Environment Variables

To run this project, you will need to add the following environment variables to your .env file:

MINIO_USER

MINIO_PASSWORD
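
For example, a minimal .env could look like this (placeholder values, not real credentials):

  MINIO_USER=minio-admin
  MINIO_PASSWORD=change-me

In the sketch above, these two variables are read as the MinIO access key and secret key.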

Installation

Install the project's dependencies with pip or Poetry

  pip install -r requirements.txt

Or

  poetry install

Run Locally

Clone the project

  git clone https://github.com/stejul/dataEngineeringExample

Install dependencies

  poetry install

Or

  pip install -r requirements.txt

Start the server

WIP SECTION

Running Tests

To run tests, run the following command (assuming the test suite uses pytest):

  poetry run pytest

Deployment

To deploy this project locally, start the services with Docker (assuming they are defined in a docker-compose.yml):

  docker compose up -d

Lessons Learned

wip

License

MIT

Authors