Data Pipeline Mini Project - Event Ticket System Case Study

This project schedules a data workflow with Apache Airflow and packages its components in Docker containers.
```sh
$ git clone https://github.com/trdtnguyen/sb-miniproject4.git
$ docker-compose build && docker-compose up
```
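The actual `docker-compose.yml` lives in the repository; below is a minimal sketch of its two-service shape. The service names match this README, while the image tag, credentials, and network name are assumptions for illustration:

```yaml
# Hypothetical sketch -- service names come from the README; the image,
# credentials, and network name are assumptions, not the project's file.
version: "3"
services:
  mysql_db:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: example
      MYSQL_DATABASE: ticket_events
    networks:
      - pipeline_net
  airflow:
    build: .            # built from the project's Dockerfile
    depends_on:
      - mysql_db
    networks:
      - pipeline_net    # shared network so the services can talk internally
networks:
  pipeline_net:
```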
- The project includes two main services: the back-end `mysql_db` and `airflow`, which automates the query tasks. `airflow` should wait for `mysql_db` to be ready before accessing the database; we implemented that idea with the `netcat` package in the `Dockerfile` (see the sketch after this list). `airflow` and `mysql_db` must share the same `networks` entry in order to communicate internally.
- Creating the database and tables is done once when the container starts. We did not include database and table creation in the workflow for this project.
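The wait-for logic is typically a small entrypoint loop around `nc`. A minimal sketch, assuming MySQL's default port 3306 and the service name `mysql_db` (the actual script in the repository may differ):

```sh
#!/bin/sh
# Block until mysql_db accepts TCP connections, then hand off to the
# real command. Host, port, and the hand-off are illustrative assumptions.
while ! nc -z mysql_db 3306; do
  echo "Waiting for mysql_db..."
  sleep 1
done
exec "$@"
```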
- The DAG workflow consists of three tasks: `extract_data`, which loads data from the CSV file, and two simple query tasks, `query1` and `query2` (sketched below).
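The DAG definition itself is in the repository; below is a minimal sketch of the structure described above. The DAG id `ticket_event` and the three task ids come from this README, while the schedule, callables, and dependency order are assumptions:

```python
# dags/ticket_event.py -- hypothetical sketch, not the project's actual file.
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.10 import path


def extract_data(**kwargs):
    """Load rows from the ticket CSV into mysql_db (details omitted)."""


def query1(**kwargs):
    """First simple query against the loaded data (details omitted)."""


def query2(**kwargs):
    """Second simple query against the loaded data (details omitted)."""


dag = DAG(
    dag_id="ticket_event",
    start_date=datetime(2020, 1, 1),   # assumed start date
    schedule_interval="@daily",        # assumed schedule
)

t_extract = PythonOperator(task_id="extract_data", python_callable=extract_data, dag=dag)
t_query1 = PythonOperator(task_id="query1", python_callable=query1, dag=dag)
t_query2 = PythonOperator(task_id="query2", python_callable=query2, dag=dag)

# The queries presumably depend on the extracted data, so run them after it.
t_extract >> [t_query1, t_query2]
```

For a local (non-Docker) run, install Airflow 1.10.12 with its pinned constraints: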
```sh
sudo apt-get update
sudo apt-get install build-essential
pip install \
 apache-airflow==1.10.12 \
 --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-1.10.12/constraints-3.7.txt"
```
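Before the DAG appears in the CLI, Airflow's metadata database must be initialized (and, for actual scheduled runs, the scheduler started). Under a default local Airflow 1.10 setup with the DAG file in the dags folder, that is roughly:

```sh
$ airflow initdb        # create the metadata database
$ airflow scheduler     # needed for scheduled runs; not for list_tasks
```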
```sh
$ airflow list_tasks ticket_event --tree
```

Result:

```
<Task(PythonOperator): extract_data>
<Task(PythonOperator): query2>
<Task(PythonOperator): query1>
```
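To exercise a single task without the scheduler, or to kick off the whole DAG, the standard Airflow 1.10 commands apply; the execution date below is an arbitrary example:

```sh
$ airflow test ticket_event extract_data 2020-01-01   # run one task in isolation
$ airflow trigger_dag ticket_event                    # queue a full DAG run
```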