For the task description, click HERE.
I decided to use FastAPI for the API, with Celery as a standard approach to distributed tasks. Celery is configured to use RabbitMQ as the message broker and Redis as the result store.
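Roughly, the wiring looks like the sketch below. This is a minimal sketch, not the repository's exact code; the broker and backend hostnames (`rabbitmq`, `redis`) are assumptions based on typical docker-compose service names.

```python
# celery_app.py - minimal sketch; URLs assume docker-compose service names
# "rabbitmq" and "redis".
from celery import Celery

celery_app = Celery(
    "jobs",
    broker="amqp://guest:guest@rabbitmq:5672//",  # RabbitMQ as the message broker
    backend="redis://redis:6379/0",               # Redis as the result store
)

celery_app.conf.update(
    task_acks_late=True,   # re-deliver a task if a worker dies mid-task
    result_expires=3600,   # keep task results for an hour
)
```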
The job is split into multiple Celery tasks, which are queued and linked after the job record is created in the database. The Celery tasks are independent of the API server. I did implement callback_url. I did not implement restarting of tasks in case of network errors, but it should be easy to add using Celery's built-in retry mechanisms.
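An illustrative sketch of that flow is below; the model, endpoint, and task names are assumptions, not the exact ones in the repository.

```python
# api.py - illustrative sketch of the job flow; names are assumptions.
import uuid

from celery import chain
from fastapi import FastAPI
from pydantic import BaseModel

from celery_app import celery_app  # the Celery app from the sketch above

app = FastAPI()


class JobRequest(BaseModel):
    payload: dict
    callback_url: str | None = None


# Restarting on network errors (not implemented) could use Celery's built-in
# options, e.g. @celery_app.task(autoretry_for=(ConnectionError,),
# retry_backoff=True, max_retries=3).
@celery_app.task
def step_one(job_id: str) -> str:
    # ... first part of the work; persist intermediate state ...
    return job_id


@celery_app.task
def step_two(job_id: str) -> str:
    # ... second part of the work ...
    return job_id


@celery_app.task
def notify(job_id: str, callback_url: str | None) -> None:
    # POST the final job status to callback_url, if one was provided
    ...


@app.post("/jobs")
def create_job(req: JobRequest) -> dict:
    job_id = str(uuid.uuid4())
    # 1. create the job record in the database (Redis in this POC)
    # 2. queue and link the Celery tasks; each step passes the job_id along
    workflow = chain(step_one.s(job_id), step_two.s(), notify.s(req.callback_url))
    workflow.apply_async()
    return {"job_id": job_id, "status": "queued"}
```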
A few choices I made, mostly to save some time:
- If anything fails, the whole job fails, but the job status should still be updated and, if a webhook was provided, it should be called with the updated status (see the sketch after this list).
- Use of Redis as the main database: it is OK for this POC, but if I were designing a real application, I would probably use PostgreSQL.
- Celery: I'm not a big fan and would swap it for something compatible with asyncio, but it is the most common way to implement distributed tasks, so I decided to use it.
- I would love to use Airflow for the whole thing, but it wouldn't demonstrate what I want to show in this task.
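For the first point, a minimal sketch of what the failure path could look like; the helper and field names are illustrative, and the same effect could also be achieved with Celery's `link_error` errbacks.

```python
# failure_handling.py - hedged sketch, not the repository's actual code.
import requests


def fail_job(job_id: str, callback_url: str | None) -> None:
    """Mark the whole job as failed and notify the webhook, if one was given."""
    # update the job record (Redis in this POC), e.g. set status = "failed"
    ...
    if callback_url:
        # best-effort notification; a delivery failure should not mask the original error
        try:
            requests.post(
                callback_url,
                json={"job_id": job_id, "status": "failed"},
                timeout=5,
            )
        except requests.RequestException:
            pass


# inside each Celery task, something along these lines:
#
#     try:
#         ...  # the actual work
#     except Exception:
#         fail_job(job_id, callback_url)
#         raise  # keep the task (and therefore the whole job) marked as failed
```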
Redis also serves as the database for the API.
The app is set up and ready to use. All you have to do is:

docker compose build

and

docker compose up

(or docker-compose up, depending on your version of Docker), then go to:

http://localhost:8080/docs
I also added an echo server at http://echoserver, so it can be used to test callback_url.
You can also check the Celery tasks in the monitoring tool at http://localhost:5555/tasks.
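For example, a job submitted with callback_url pointing at the echo server will have its status update delivered there. The `/jobs` paths and the payload shape below are assumptions; check http://localhost:8080/docs for the actual schema.

```python
# submit_job_example.py - illustrative only; endpoint paths and fields
# may differ from the real API.
import requests

response = requests.post(
    "http://localhost:8080/jobs",
    json={
        "payload": {"example": "data"},
        "callback_url": "http://echoserver",  # the bundled echo server
    },
    timeout=5,
)
job = response.json()
print(job)  # e.g. {"job_id": "...", "status": "queued"}

# poll the job status until it is no longer queued/running
status = requests.get(f"http://localhost:8080/jobs/{job['job_id']}", timeout=5).json()
print(status)
```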
I added a VS Code configuration, so you can attach the debugger from VS Code at any time. I also added a configuration for development in remote containers. Configuration for the linters and related tooling (black, flake8, and pyright) is also there.
I did my best to keep my code fully typed; this is a really important part for me.
I created one integration test. To run it:
./CI.sh