TapEth

TapEth (Technologies for Advanced Programming - ETHereum) is a university project. Its main goal is to statistically analyze pending transactions on an Ethereum network (e.g. the main network) and estimate how long each transaction will wait before being mined into a block.

N.B. Some large files, such as the Kafka and Spark packages, are hosted on GitHub using git-lfs.

Technologies used

The data pipeline is composed of the following steps:

| Step               | Technology used                     |
|--------------------|-------------------------------------|
| Data ingestion     | Apache Kafka Connect                |
| Data streaming     | Apache Kafka / Apache Spark Streaming |
| Data processing    | Apache Spark / Apache Spark MLlib   |
| Data indexing      | Elasticsearch                       |
| Data visualization | Kibana                              |


How TapEth works

The estimation is based on the gas price of the pending transaction. The gas price represents how much the user is willing to pay per unit of gas for the transaction, so transactions with a higher gas price are more attractive to miners, who tend to mine them sooner.
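To make the intuition concrete, here is a minimal sketch of a deliberately naive estimator. It is not TapEth's model (which relies on Spark MLlib); it only assumes that miners fill each block greedily in descending gas price order and that every transaction fits in a block on its own. The class `GasPriceQueue`, the type `PendingTx`, and the method `blocksUntilMined` are hypothetical names introduced for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class GasPriceQueue {

    /** Hypothetical minimal view of a pending transaction (not TapEth's data model). */
    static final class PendingTx {
        final String hash;
        final long gasPriceWei; // price offered per unit of gas, in wei
        final long gas;         // gas the transaction is assumed to consume

        PendingTx(String hash, long gasPriceWei, long gas) {
            this.hash = hash;
            this.gasPriceWei = gasPriceWei;
            this.gas = gas;
        }
    }

    /**
     * Returns the 1-based index of the block in which the target transaction
     * would be included, or -1 if it is not in the pool.
     * Assumption: blocks are filled greedily by descending gas price.
     */
    static int blocksUntilMined(List<PendingTx> pool, String targetHash, long blockGasLimit) {
        List<PendingTx> sorted = new ArrayList<>(pool);
        sorted.sort(Comparator.comparingLong((PendingTx t) -> t.gasPriceWei).reversed());

        int block = 1;
        long used = 0;
        for (PendingTx tx : sorted) {
            if (used + tx.gas > blockGasLimit) { // current block is full, open the next one
                block++;
                used = 0;
            }
            used += tx.gas;
            if (tx.hash.equals(targetHash)) {
                return block;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<PendingTx> pool = Arrays.asList(
                new PendingTx("0xc", 10_000_000_000L, 21_000L),
                new PendingTx("0xa", 50_000_000_000L, 21_000L),
                new PendingTx("0xb", 20_000_000_000L, 21_000L));
        // With room for two plain transfers per block, the cheapest one waits a block longer.
        for (PendingTx tx : pool) {
            System.out.println(tx.hash + " -> block " + blocksUntilMined(pool, tx.hash, 42_000L));
        }
    }
}
```

In practice the pending pool changes continuously and miners apply their own policies, which is why a statistical or learned model is a better fit than this fixed ordering.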

TapEth uses Infura, a service that provides access to a geth node exposing the Ethereum JSON-RPC API. Through the pub/sub pattern, using the JSON-RPC API over WebSockets, it is possible to subscribe to events such as newly mined blocks or incoming pending transactions; for more information, visit the pub/sub documentation of geth.
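As an illustration of the subscription described above, the following sketch sends a geth `eth_subscribe` request for `newPendingTransactions` over a WebSocket using the JDK's `java.net.http` client (Java 11+). It is not taken from TapEth's connector: the class name `PendingTxSubscriber` and the helper `subscribeRequest` are hypothetical, and the endpoint URL must be supplied as an argument (for Infura it would be your own project's WebSocket URL).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.CountDownLatch;

class PendingTxSubscriber {

    /** Builds the JSON-RPC payload for an eth_subscribe call (hypothetical helper). */
    static String subscribeRequest(int id, String subscription) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
                + ",\"method\":\"eth_subscribe\",\"params\":[\"" + subscription + "\"]}";
    }

    public static void main(String[] args) throws Exception {
        String request = subscribeRequest(1, "newPendingTransactions");
        if (args.length == 0) {
            // No endpoint given: just show the request that would be sent.
            System.out.println(request);
            return;
        }

        // args[0] is assumed to be a WebSocket endpoint of a geth-compatible node.
        URI endpoint = URI.create(args[0]);

        WebSocket.Listener listener = new WebSocket.Listener() {
            @Override
            public CompletionStage<?> onText(WebSocket ws, CharSequence data, boolean last) {
                // The first message carries the subscription id; later ones are
                // eth_subscription notifications containing transaction hashes.
                System.out.print(data);
                if (last) {
                    System.out.println();
                }
                ws.request(1); // ask for the next message
                return null;   // message fully processed
            }
        };

        WebSocket ws = HttpClient.newHttpClient()
                .newWebSocketBuilder()
                .buildAsync(endpoint, listener)
                .join();
        ws.sendText(request, true).join();

        new CountDownLatch(1).await(); // keep the process alive while messages arrive
    }
}
```

Replacing `newPendingTransactions` with `newHeads` subscribes to newly mined block headers instead.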

In brief:

  1. Pending transactions are acquired by the Kafka connector and written to a Kafka topic. For more information, visit the Kafka section of this project.
  2. Spark Streaming reads from the Kafka topic, Spark MLlib processes the incoming data with a machine learning model, and the results are sent to Elasticsearch. For more information, visit the Spark section of this project.
  3. Finally, Kibana reads the indices from Elasticsearch and presents them in a graphical interface.

How to start TapEth

Start TapEth with docker-compose by running docker-compose up from the root of the project folder.