A Django project, run in Docker, that processes large datasets using Celery.
Data ingestion is chunked and saved to the database in batches, with a wait time between chunks, as sketched below.
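A minimal sketch of what the chunked ingestion might look like. The `MalePlayer` model, the CSV path, the column names, and the chunk-size and wait values are all assumptions for illustration, not the project's actual code:

```python
# tasks.py -- illustrative sketch; model, path, and tuning values are assumptions
import time

import pandas as pd
from celery import shared_task

from players.models import MalePlayer  # hypothetical app/model

CHUNK_SIZE = 5000   # rows read per chunk (assumed value)
WAIT_SECONDS = 2    # pause between chunks (assumed value)


@shared_task
def process_fifa_players_dataset():
    # Read the CSV lazily in chunks so the whole file never sits in memory.
    for chunk in pd.read_csv("data/male_players.csv", chunksize=CHUNK_SIZE):
        players = [
            MalePlayer(name=row["short_name"], overall=row["overall"])
            for _, row in chunk.iterrows()
        ]
        # One INSERT per batch instead of one query per row.
        MalePlayer.objects.bulk_create(players)
        time.sleep(WAIT_SECONDS)  # throttle between chunks
```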
Trigger the ingestion task:

```bash
curl --location 'http://localhost:8000/run-task/' \
--header 'Content-Type: application/json' \
--data '{
    "task": "process_fifa_players_dataset"
}'
```
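The endpoint presumably looks up the named Celery task and enqueues it. A hedged sketch of such a dispatch view, with all names (the `players.tasks` module, the view itself) assumed for illustration:

```python
# views.py -- hypothetical sketch of the task-dispatch endpoint
import json

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST

from players import tasks  # hypothetical module holding the Celery tasks


@csrf_exempt
@require_POST
def run_task(request):
    task_name = json.loads(request.body).get("task", "")
    task = getattr(tasks, task_name, None)
    if task is None:
        return JsonResponse({"error": f"unknown task {task_name!r}"}, status=400)
    result = task.delay()  # enqueue on the Celery worker
    return JsonResponse({"task_id": result.id}, status=202)
```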
Fetch the ingested players:

```bash
curl --location 'http://localhost:8000/male_players/'
```
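A listing endpoint over a growing table is usually paginated; a minimal sketch of what the view could look like, again assuming the hypothetical `MalePlayer` model and field names:

```python
# views.py -- hypothetical sketch of the listing endpoint
from django.core.paginator import Paginator
from django.http import JsonResponse

from players.models import MalePlayer  # hypothetical app/model


def male_players(request):
    page_number = request.GET.get("page", 1)
    queryset = MalePlayer.objects.order_by("id").values("name", "overall")
    page = Paginator(queryset, per_page=100).get_page(page_number)
    return JsonResponse({"count": page.paginator.count, "players": list(page)})
```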
Querying the data before the ingestion has finished can lead to database locks, since both operations access the same table. This project is a proof of concept and is not production-ready in its current state.
TODO: add unit tests.
The male players dataset was sourced from Kaggle. The file has been split from the original so that it contains only 100 MB.
The project has been formatted with Black.