
A scalable Airflow-powered ETL pipeline designed for efficient extraction, transformation, and loading of data from Ethereum, Optimism, Arbitrum, Fantom, and Polygon blockchains.

Primary language: Python

Spock Airflow ETL Pipeline

Welcome to the Spock Airflow project! This repository contains a complete ETL (Extract, Transform, Load) pipeline designed for data extraction and processing from five major blockchain networks: Ethereum, Optimism, Arbitrum, Fantom, and Polygon.

The pipeline uses Apache Airflow to automate the full process: extracting raw blockchain data, transforming it into a structured format, and loading it into a data warehouse or another storage solution. The same workflow runs across all five blockchains, which makes it easier to analyze decentralized data and integrate it into your applications.

Whether you're looking to analyze transaction data, monitor smart contracts, or build DeFi dashboards, the Spock Airflow project provides a robust and scalable solution to meet your needs.


Components

[Image: components diagram]

Flowchart

[Image: flowchart]

Configurator DAG

[Image: Configurator DAG diagram]

Builder DAG

[Image: Builder DAG diagram]

Operator DAG

[Image: Operator DAG diagram]

Protocol DAG

[Image: Protocol DAG diagram]

Folder Structure

include
└── dbt
    └── models
        └── protocol_positions
            └── [PROTOCOL_NAME]
                ├── parse      // for ABIs
                ├── transform  // for transformations
                ├── check      // for data quality checks
                └── sql        // for custom UDFs
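
The layout above can also be written out in code. This is a minimal sketch, assuming the four subfolders sit directly under each protocol directory as the tree shows. The helper name `protocol_dirs` and the protocol name `example_protocol` are hypothetical and not taken from the repository.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Subfolders listed in the folder structure, with their stated purpose.
SUBFOLDERS = {
    "parse": "ABIs",
    "transform": "transformations",
    "check": "data quality checks",
    "sql": "custom UDFs",
}

# Root of the per-protocol dbt models, as shown in the tree.
MODELS_ROOT = PurePosixPath("include/dbt/models/protocol_positions")


def protocol_dirs(protocol_name: str) -> dict[str, PurePosixPath]:
    """Return the expected subfolder paths for one protocol (hypothetical helper)."""
    base = MODELS_ROOT / protocol_name
    return {name: base / name for name in SUBFOLDERS}


if __name__ == "__main__":
    # "example_protocol" is a placeholder, not a protocol shipped with the repo.
    for name, path in protocol_dirs("example_protocol").items():
        print(f"{path}  # {SUBFOLDERS[name]}")
```

A helper like this could be used to check that a new protocol directory contains all four subfolders before its models are run.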

Transformation Stages

  - Extraction: extracts and filters protocol logs from public datasets.
  - Integration: consolidates and merges new data with previously transformed records.
  - Synthesis: synthesizes the integrated data and generates wallet positions.
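
The three stages can be illustrated with a small, self-contained Python sketch. It is not the project's implementation, which, judging by the folder structure, lives in dbt models. The record fields (`address`, `tx_hash`, `log_index`, `block_number`, `wallet`, `amount`), the deduplication key, and the idea that a position is a sum of signed amounts are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import defaultdict


def extract(public_logs: list[dict], protocol_addresses: set[str]) -> list[dict]:
    """Extraction: keep only logs emitted by the protocol's contracts."""
    return [log for log in public_logs if log["address"] in protocol_addresses]


def integrate(previous: list[dict], new_logs: list[dict]) -> list[dict]:
    """Integration: merge new logs into earlier records.

    Records are keyed by (tx_hash, log_index), an assumed unique key,
    so re-processing the same log does not create duplicates.
    """
    merged = {(r["tx_hash"], r["log_index"]): r for r in previous}
    for log in new_logs:
        merged[(log["tx_hash"], log["log_index"])] = log
    return sorted(merged.values(), key=lambda r: (r["block_number"], r["log_index"]))


def synthesize(records: list[dict]) -> dict[str, int]:
    """Synthesis: sum signed amounts per wallet to obtain positions."""
    positions: dict[str, int] = defaultdict(int)
    for record in records:
        positions[record["wallet"]] += record["amount"]
    return dict(positions)


if __name__ == "__main__":
    # All addresses, hashes, and amounts below are made-up sample values.
    public_logs = [
        {"address": "0xpool", "tx_hash": "0x01", "log_index": 0,
         "block_number": 100, "wallet": "0xalice", "amount": 50},
        {"address": "0xother", "tx_hash": "0x02", "log_index": 0,
         "block_number": 101, "wallet": "0xbob", "amount": 10},
        {"address": "0xpool", "tx_hash": "0x03", "log_index": 1,
         "block_number": 102, "wallet": "0xalice", "amount": -20},
    ]
    previous = [
        {"address": "0xpool", "tx_hash": "0x00", "log_index": 0,
         "block_number": 99, "wallet": "0xbob", "amount": 5},
    ]
    new_logs = extract(public_logs, {"0xpool"})
    print(synthesize(integrate(previous, new_logs)))
```

Keeping each stage as a pure function over plain records mirrors the stage separation described above: each step depends only on the output of the one before it.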