
Open-source, collaborative, and comprehensive dataset about football in Brazil, encompassing teams, players, matches, coaches, and more.

Primary language: Python. License: MIT.

Brazil Football Datalake

1. Objective

The objective of this project is to create an open-source, collaborative, and comprehensive dataset about football in Brazil, encompassing teams, players, matches, coaches, and more. Our mission is to collect and process data from all Brazilian states.

In the first version, we focus exclusively on gathering data about football clubs. We aim to answer questions such as:

  • What is the oldest club in each state? What is the oldest club in the country?
  • What is the youngest club in each state? What is the youngest club in the country?
  • What are the most common club names?
  • How many clubs are registered in each state?
  • Can we visualize all the clubs together on a map?
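Once club records are collected, most of the questions above reduce to simple grouping and sorting. The following is a minimal Python sketch, assuming each club is a record with a name, a two-letter state code, and a foundation date. The Club type, its field names, the helper functions, and the sample records are all hypothetical and do not reflect the project's actual schema or data. The map question is left out because it would also need coordinates for each club.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Club:
    """One club record; field names are hypothetical, not the project's schema."""
    name: str
    state: str      # two-letter state code, e.g. "SP"
    founded: date   # foundation date


def extreme_by_state(clubs: list[Club], youngest: bool = False) -> dict[str, Club]:
    """Map each state to its oldest club (or youngest, if youngest=True).

    On ties, the club seen first is kept.
    """
    result: dict[str, Club] = {}
    for club in clubs:
        current = result.get(club.state)
        if current is None:
            result[club.state] = club
        elif youngest and club.founded > current.founded:
            result[club.state] = club
        elif not youngest and club.founded < current.founded:
            result[club.state] = club
    return result


def oldest_in_country(clubs: list[Club]) -> Club:
    """Club with the earliest foundation date overall."""
    return min(clubs, key=lambda c: c.founded)


def youngest_in_country(clubs: list[Club]) -> Club:
    """Club with the latest foundation date overall."""
    return max(clubs, key=lambda c: c.founded)


def most_common_names(clubs: list[Club], n: int = 10) -> list[tuple[str, int]]:
    """The n most frequent club names with their counts."""
    return Counter(c.name for c in clubs).most_common(n)


def clubs_per_state(clubs: list[Club]) -> dict[str, int]:
    """Number of registered clubs in each state."""
    return dict(Counter(c.state for c in clubs))


# Illustrative, made-up records (not real data from the datalake).
clubs = [
    Club("Clube Alfa", "SP", date(1900, 5, 1)),
    Club("Clube Beta", "SP", date(1950, 3, 10)),
    Club("Clube Alfa", "RJ", date(1910, 7, 20)),
    Club("Clube Gama", "RJ", date(2001, 1, 15)),
]

print({state: c.name for state, c in extreme_by_state(clubs).items()})
print(oldest_in_country(clubs).name, most_common_names(clubs, 1))
print(clubs_per_state(clubs))
```

Handling "oldest" and "youngest" in one function keeps both per-state questions to a single pass over the data; in the actual pipeline the same grouping would more likely run as a query over the stored dataset.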

2. How to Run

The project is built with Python, Docker, Docker Compose, and Airflow. Before running it, make sure Docker and Python are installed on your system; the setup and run commands are defined in the Makefile.

To start the project with Airflow, execute the following commands:

make install_dependencies
make create_shared_network
make start

Once all services are running, you can access Airflow through your browser and execute the DAGs.

3. Docs

Read more about our catalog in the docs folder.

4. To Do List