This project builds a simple data pipeline using Prefect to collect data from sources such as REST APIs, FIX, WebSocket, GraphQL, and data crawlers. All collected data is stored in PostgreSQL for further analysis and consumed by a prediction engine that forecasts price trends for trading bots. The sources are mostly free public data: open-exchange-rates, Etherscan, crypto-exchange OHLC feeds, stock data, and so on.
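To give a feel for the pattern, the sketch below shows what a single fetch-and-store flow might look like. It assumes Prefect 2.x, `requests`, and `psycopg2`; the endpoint URL, the `exchange_rates` table, and its columns are hypothetical placeholders, not this project's actual schema:

```python
# Hypothetical sketch of the fetch-and-store pattern, assuming Prefect 2.x,
# requests, and psycopg2. The URL, table, and columns are placeholders.
import os

import psycopg2
import requests
from prefect import flow, task


@task(retries=3, retry_delay_seconds=30)
def fetch_rates() -> dict:
    # Placeholder endpoint; the real flows pull from sources like
    # open-exchange-rates, Etherscan, and exchange OHLC APIs.
    resp = requests.get("https://example.com/api/latest", timeout=30)
    resp.raise_for_status()
    return resp.json()


@task
def store_rates(payload: dict) -> None:
    # Credentials come from the .env file described in the setup steps below.
    conn = psycopg2.connect(
        host=os.getenv("POSTGRES_HOST"),
        port=os.getenv("POSTGRES_PORT"),
        user=os.getenv("POSTGRES_USER"),
        password=os.getenv("POSTGRES_PWD"),
        dbname=os.getenv("POSTGRES_DBNAME"),
    )
    with conn, conn.cursor() as cur:
        for symbol, rate in payload.get("rates", {}).items():
            cur.execute(
                "INSERT INTO exchange_rates (symbol, rate) VALUES (%s, %s)",
                (symbol, rate),
            )
    conn.close()


@flow
def collect_exchange_rates():
    store_rates(fetch_rates())


if __name__ == "__main__":
    collect_exchange_rates()
```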
- After cloning this repo, `cd` into the root directory and create a Python virtual environment:

```bash
python3 -m venv venv
```
- Once the virtual environment is set up, activate it:

```bash
source venv/bin/activate
```
- Next, install all the required packages:

```bash
pip install -r requirements.txt
```
- Next, create a `.env` file containing the credentials for your services (database, API keys, etc.). Here is a template for the `.env` file:

```
POSTGRES_USER='your_username'
POSTGRES_PWD='your_password'
POSTGRES_PORT='5432'
POSTGRES_DBNAME='postgres'
POSTGRES_HOST='localhost'
```
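For reference, application code can read these values with `python-dotenv` (an assumption; substitute whatever loader this repo actually uses):

```python
# Minimal sketch of reading the .env values in Python, assuming the
# python-dotenv package is installed.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

db_url = (
    f"postgresql://{os.getenv('POSTGRES_USER')}:{os.getenv('POSTGRES_PWD')}"
    f"@{os.getenv('POSTGRES_HOST')}:{os.getenv('POSTGRES_PORT')}"
    f"/{os.getenv('POSTGRES_DBNAME')}"
)
```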
- We will be using a Dockerized PostgreSQL database to store the data. The data is persisted to a data folder in the home directory, so create that folder first:

```bash
mkdir -p ~/data/postgres
```
- Once all packages are installed and the folder is created, start the Dockerized PostgreSQL database. Make sure the `.env` file exists, since Docker Compose reads those credentials:

```bash
docker compose up -d
```
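For reference, a minimal `docker-compose.yml` for this setup might look like the sketch below; the service name, image tag, and paths are assumptions, and note that `POSTGRES_PWD` from `.env` is mapped to `POSTGRES_PASSWORD`, the variable the official postgres image expects:

```yaml
# Hypothetical minimal compose file; the actual docker-compose.yml in this
# repo may differ.
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PWD}
      POSTGRES_DB: ${POSTGRES_DBNAME}
    ports:
      - "${POSTGRES_PORT}:5432"
    volumes:
      - ~/data/postgres:/var/lib/postgresql/data
```

Mounting `~/data/postgres` keeps the database contents on the host, so taking the container down will not wipe your data.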
- Now you can run the first flow, which creates all the required tables:

```bash
python run_flow.py create-tables
```
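The table-creation flow boils down to running DDL against the database. The sketch below is a hypothetical version using Prefect 2.x and `psycopg2`; the `exchange_rates` schema is a placeholder, not the project's real table layout:

```python
# Hypothetical sketch of a create-tables flow. The schema is a placeholder.
import os

import psycopg2
from prefect import flow, task

DDL = """
CREATE TABLE IF NOT EXISTS exchange_rates (
    id SERIAL PRIMARY KEY,
    symbol TEXT NOT NULL,
    rate NUMERIC NOT NULL,
    fetched_at TIMESTAMPTZ DEFAULT now()
)
"""


@task
def create_tables() -> None:
    conn = psycopg2.connect(
        host=os.getenv("POSTGRES_HOST"),
        port=os.getenv("POSTGRES_PORT"),
        user=os.getenv("POSTGRES_USER"),
        password=os.getenv("POSTGRES_PWD"),
        dbname=os.getenv("POSTGRES_DBNAME"),
    )
    with conn, conn.cursor() as cur:
        cur.execute(DDL)  # idempotent thanks to IF NOT EXISTS
    conn.close()


@flow
def create_tables_flow():
    create_tables()
```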
- Prefect comes with a local server where you can monitor your flow runs in a dashboard (by default at http://127.0.0.1:4200). You can start the server with:

```bash
prefect server start
```
All data provided here are downloaded from public APIs or community data sources. Therefore, I do not claim ownership of the data. I have included a few CSV data dumps downloaded from these sources to help initialize the database with proper data for further analysis and usage. Continuous data downloads from the APIs need to be run manually or scheduled using a scheduler. If you need additional data dumps, please visit the hosting sites; I have included the relevant links below.
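If you would rather let Prefect handle the scheduling than an external tool like cron, one option in Prefect 2.x is to serve a flow on a cron schedule; the deployment name and interval below are placeholders:

```python
# Hypothetical scheduling sketch, assuming Prefect 2.x; serve() keeps a
# lightweight process running that triggers the flow on the given cron.
from prefect import flow


@flow
def collect_exchange_rates():
    ...  # fetch-and-store logic as sketched earlier


if __name__ == "__main__":
    collect_exchange_rates.serve(
        name="hourly-exchange-rates",  # placeholder deployment name
        cron="0 * * * *",              # run at the top of every hour
    )
```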