This project builds a simple data pipeline using Prefect to collect data from sources such as REST APIs, FIX, WebSocket, GraphQL, and data crawlers. All collected data is stored in PostgreSQL for further analysis and consumed by a prediction engine that forecasts price trends for trading bots. The sources are mostly free public data: open-exchange-rates, Etherscan, crypto-exchange OHLC feeds, stock data, and so on.
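To give a feel for the pattern, the sketch below shows what a single fetch-and-store flow might look like. It assumes Prefect 2.x, `requests`, and `psycopg2`; the endpoint URL, the `exchange_rates` table, and its columns are hypothetical placeholders, not this project's actual schema:

```python
# Hypothetical sketch of the fetch-and-store pattern, assuming Prefect 2.x,
# requests, and psycopg2. The URL, table, and columns are placeholders.
import os

import psycopg2
import requests
from prefect import flow, task


@task(retries=3, retry_delay_seconds=30)
def fetch_rates() -> dict:
    # Placeholder endpoint; the real flows pull from sources like
    # open-exchange-rates, Etherscan, and exchange OHLC APIs.
    resp = requests.get("https://example.com/api/latest", timeout=30)
    resp.raise_for_status()
    return resp.json()


@task
def store_rates(payload: dict) -> None:
    # Credentials come from the .env file described in the setup steps below.
    conn = psycopg2.connect(
        host=os.getenv("POSTGRES_HOST"),
        port=os.getenv("POSTGRES_PORT"),
        user=os.getenv("POSTGRES_USER"),
        password=os.getenv("POSTGRES_PWD"),
        dbname=os.getenv("POSTGRES_DBNAME"),
    )
    with conn, conn.cursor() as cur:
        for symbol, rate in payload.get("rates", {}).items():
            cur.execute(
                "INSERT INTO exchange_rates (symbol, rate) VALUES (%s, %s)",
                (symbol, rate),
            )
    conn.close()


@flow
def collect_exchange_rates():
    store_rates(fetch_rates())


if __name__ == "__main__":
    collect_exchange_rates()
```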
- After cloning this repo, `cd` into the root directory and create a Python virtual environment:

```bash
python3 -m venv venv
```
- Once the virtual environment is set up, activate it:

```bash
source venv/bin/activate
```
- Next, install all the required packages:

```bash
pip install -r requirements.txt
```
- Next, create a `.env` file containing the credentials for your services (database, API keys, etc.). Here is a template for the `.env` file:

```
POSTGRES_USER='your_username'
POSTGRES_PWD='your_password'
POSTGRES_PORT='5432'
POSTGRES_DBNAME='postgres'
POSTGRES_HOST='localhost'
```
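For reference, application code can read these values with `python-dotenv` (an assumption; substitute whatever loader this repo actually uses):

```python
# Minimal sketch of reading the .env values in Python, assuming the
# python-dotenv package is installed.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

db_url = (
    f"postgresql://{os.getenv('POSTGRES_USER')}:{os.getenv('POSTGRES_PWD')}"
    f"@{os.getenv('POSTGRES_HOST')}:{os.getenv('POSTGRES_PORT')}"
    f"/{os.getenv('POSTGRES_DBNAME')}"
)
```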
- We will be using a Dockerized PostgreSQL database to store the data. The data is persisted to a data folder in the home directory, so create that folder first:

```bash
mkdir -p ~/data/postgres
```
- Once all packages are installed and the folder is created, start the Dockerized PostgreSQL database. Make sure the `.env` file exists, since Docker Compose reads those credentials:

```bash
docker compose up -d
```
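For reference, a minimal `docker-compose.yml` for this setup might look like the sketch below; the service name, image tag, and paths are assumptions, and note that `POSTGRES_PWD` from `.env` is mapped to `POSTGRES_PASSWORD`, the variable the official postgres image expects:

```yaml
# Hypothetical minimal compose file; the actual docker-compose.yml in this
# repo may differ.
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PWD}
      POSTGRES_DB: ${POSTGRES_DBNAME}
    ports:
      - "${POSTGRES_PORT}:5432"
    volumes:
      - ~/data/postgres:/var/lib/postgresql/data
```

Mounting `~/data/postgres` keeps the database contents on the host, so taking the container down will not wipe your data.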
- Now you can run the first flow, which creates all the required tables:

```bash
python run_flow.py create-tables
```
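The table-creation flow boils down to running DDL against the database. The sketch below is a hypothetical version using Prefect 2.x and `psycopg2`; the `exchange_rates` schema is a placeholder, not the project's real table layout:

```python
# Hypothetical sketch of a create-tables flow. The schema is a placeholder.
import os

import psycopg2
from prefect import flow, task

DDL = """
CREATE TABLE IF NOT EXISTS exchange_rates (
    id SERIAL PRIMARY KEY,
    symbol TEXT NOT NULL,
    rate NUMERIC NOT NULL,
    fetched_at TIMESTAMPTZ DEFAULT now()
)
"""


@task
def create_tables() -> None:
    conn = psycopg2.connect(
        host=os.getenv("POSTGRES_HOST"),
        port=os.getenv("POSTGRES_PORT"),
        user=os.getenv("POSTGRES_USER"),
        password=os.getenv("POSTGRES_PWD"),
        dbname=os.getenv("POSTGRES_DBNAME"),
    )
    with conn, conn.cursor() as cur:
        cur.execute(DDL)  # idempotent thanks to IF NOT EXISTS
    conn.close()


@flow
def create_tables_flow():
    create_tables()
```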
- Prefect comes with a local server where you can monitor your flow runs in a dashboard (by default at http://127.0.0.1:4200). You can start the server with:

```bash
prefect server start
```
All data provided here are downloaded from public APIs or community data sources. Therefore, I do not claim ownership of the data. I have included a few CSV data dumps downloaded from these sources to help initialize the database with proper data for further analysis and usage. Continuous data downloads from the APIs need to be run manually or scheduled using a scheduler. If you need additional data dumps, please visit the hosting sites; I have included the relevant links below.
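If you would rather let Prefect handle the scheduling than an external tool like cron, one option in Prefect 2.x is to serve a flow on a cron schedule; the deployment name and interval below are placeholders:

```python
# Hypothetical scheduling sketch, assuming Prefect 2.x; serve() keeps a
# lightweight process running that triggers the flow on the given cron.
from prefect import flow


@flow
def collect_exchange_rates():
    ...  # fetch-and-store logic as sketched earlier


if __name__ == "__main__":
    collect_exchange_rates.serve(
        name="hourly-exchange-rates",  # placeholder deployment name
        cron="0 * * * *",              # run at the top of every hour
    )
```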