This is a pet project to gather airline prices from their APIs and store minimum prices per route in a time window.
The project has 2 main modules. The following diagram shows the workflow:
- flights-scraper: gathers the prices from different airlines and sends them to Kafka in a common format.
- flights-streams: reads the prices from Kafka and keeps the minimum price per route within a given time window.
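To make the "common format" idea concrete, here is a minimal sketch of what a normalized price record and its Kafka message encoding could look like. The `Price` class, its field names, and the JSON encoding are all assumptions for illustration; the actual format used by flights-scraper may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Price:
    # Hypothetical common format; the real field names may differ.
    airline: str
    origin: str
    destination: str
    departure: str    # ISO 8601 date of the flight
    price: float
    currency: str
    observed_at: int  # Unix timestamp (seconds) when the price was scraped

    def route(self) -> str:
        # A route is identified here by its origin and destination codes.
        return f"{self.origin}-{self.destination}"

    def to_message(self) -> bytes:
        # Kafka message values are bytes; JSON is one common encoding.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")


def from_message(raw: bytes) -> Price:
    # Inverse of Price.to_message, as a consumer would apply it.
    return Price(**json.loads(raw.decode("utf-8")))
```

With a shared record like this, every airline-specific scraper only has to map its own response into `Price`, and the consumer side can decode all messages the same way.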
Clone the repository:
git clone https://github.com/d1eg0/flights-price-evolution.git
cd flights-price-evolution
Install the flights-scraper Python package:
cd flights-scraper
python setup.py install
Build and start the services with Docker Compose:
docker-compose up --build
Clean services and remove MongoDB data:
docker-compose rm -svf
rm -rf .db
Then run docker-compose up --build again.
- Install Kafka and start the server as described in https://kafka.apache.org/quickstart.
- Install MongoDB.
Once the server is up, create the Kafka topic flights:
./kafka-topics.sh --create --bootstrap-server 0.0.0.0:9092 --replication-factor 1 --partitions 1 --topic flights
Create the prices collection in the flights database in MongoDB, along with the index that speeds up updates:
cd scripts
mongo < flights.js
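The contents of flights.js are not reproduced here, so the following is only a sketch of why an index helps this kind of workload. It assumes each document is keyed by origin, destination, and window start (hypothetical field names) and that the stored price is lowered with MongoDB's `$min` update operator. The helper builds plain dictionaries, so it runs without a database.

```python
# Hypothetical compound index keys; with PyMongo these could be passed to
# db.prices.create_index(INDEX_KEYS, unique=True).
INDEX_KEYS = [("origin", 1), ("destination", 1), ("window_start", 1)]


def min_price_upsert(origin, destination, window_start, price):
    """Build the (filter, update) pair for a minimum-price upsert.

    With PyMongo these could be passed to
    db.prices.update_one(filter_doc, update_doc, upsert=True).
    """
    # The filter matches exactly the indexed fields, so the lookup that
    # precedes each update can be served by the index.
    filter_doc = {
        "origin": origin,
        "destination": destination,
        "window_start": window_start,
    }
    # $min only overwrites the stored price when the new one is lower,
    # and sets it when the field does not exist yet.
    update_doc = {"$min": {"price": price}}
    return filter_doc, update_doc
```

Because every incoming price triggers a lookup by the same key fields, an index on those fields avoids a full collection scan per update.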
Run flights-streams to process incoming prices in 10-minute windows:
cd flights-streams
sbt run
To run with a different time window:
sbt 'run --window-duration "4 hours"'
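The effect of the window duration can be illustrated with a small, framework-free sketch. It is not how flights-streams is implemented; it assumes fixed, non-overlapping (tumbling) windows aligned to the Unix epoch, and both `parse_duration` and `min_price_per_window` are hypothetical helpers.

```python
from datetime import timedelta

# Hypothetical parser: only handles "<number> <unit>" with these units.
_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_duration(text: str) -> timedelta:
    amount, unit = text.split()
    # Accept both singular and plural unit names ("hour", "hours").
    return timedelta(seconds=int(amount) * _UNIT_SECONDS[unit.rstrip("s")])


def min_price_per_window(prices, window: timedelta):
    """Return {(route, window_start): minimum price}.

    prices is an iterable of (route, unix_timestamp, price) tuples.
    """
    size = int(window.total_seconds())
    result = {}
    for route, ts, price in prices:
        # Align the timestamp to the start of its tumbling window.
        start = ts - ts % size
        key = (route, start)
        if key not in result or price < result[key]:
            result[key] = price
    return result
```

A longer window such as "4 hours" groups more observations under each key, so fewer but coarser minimum prices are stored per route.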
Run flights-scraper to gather prices:
python -m scraper.run --interval 3600 --origins PMI --destinations BCN MAD VLC
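As a rough illustration of how these options might be interpreted, the sketch below parses the same flags with `argparse` and expands them into routes. It is not the actual scraper.run code: treating the interval as seconds and pairing every origin with every destination are assumptions, and `build_parser` and `routes` are hypothetical names.

```python
import argparse
from itertools import product


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="scraper.run")
    # Assumed to be the pause between scraping rounds, in seconds.
    parser.add_argument("--interval", type=int, required=True)
    # One or more airport codes for each side of the route.
    parser.add_argument("--origins", nargs="+", required=True)
    parser.add_argument("--destinations", nargs="+", required=True)
    return parser


def routes(origins, destinations):
    # Assumption: every origin is combined with every destination,
    # skipping pairs where both airports are the same.
    return [(o, d) for o, d in product(origins, destinations) if o != d]
```

Under these assumptions, the command above would cover three routes from PMI (to BCN, MAD, and VLC) once every hour.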
Run the flights-scraper tests:
cd flights-scraper
pytest
Run the flights-streams tests:
cd flights-streams
sbt test