Mempool Dumpster 🗑️♻️
Dump mempool transactions from EL nodes, and archive them in Parquet and CSV format.
The data is freely available at https://mempool-dumpster.flashbots.net
Output files:
- Raw transactions CSV (`timestamp_ms`, `tx_hash`, `rlp_hex`; about 800MB/day zipped)
- Sourcelog CSV - a list of which transactions were received by which source (`timestamp_ms`, `hash`, `source`; about 100MB/day zipped)
- Transaction metadata in CSV and Parquet format (~100MB/day zipped)
- Summary file with information about transaction sources and latency (example)
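For downstream tooling, a sourcelog row can be parsed into a small struct. Here is a minimal Go sketch based on the column list above; the `SourcelogRow` type and `parseSourcelogRow` helper are illustrative, not part of this repo:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// SourcelogRow mirrors the sourcelog CSV columns: timestamp_ms, hash, source.
type SourcelogRow struct {
	TimestampMs int64
	Hash        string
	Source      string
}

// parseSourcelogRow parses one CSV line into a SourcelogRow.
func parseSourcelogRow(line string) (SourcelogRow, error) {
	parts := strings.Split(line, ",")
	if len(parts) != 3 {
		return SourcelogRow{}, fmt.Errorf("expected 3 fields, got %d", len(parts))
	}
	ts, err := strconv.ParseInt(parts[0], 10, 64)
	if err != nil {
		return SourcelogRow{}, err
	}
	return SourcelogRow{TimestampMs: ts, Hash: parts[1], Source: parts[2]}, nil
}

func main() {
	row, err := parseSourcelogRow("1691402000123,0xabc123,local")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", row)
}
```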
Available mempool sources:
- Generic EL nodes (`newPendingTransactions`) (e.g. go-ethereum, Infura, etc.)
- Alchemy (`alchemy_pendingTransactions`)
- bloXroute (at least "Professional" plan)
- Chainbound Fiber
Notes:
- This project is under active development, although relatively stable and ready to use in production
- Observing about 1M - 1.5M unique transactions per day
FAQ
- What is a-pool? ... A-Pool is a regular geth node with some optimized peering settings, which the collector subscribes to over the network.
- What are exclusive transactions? ... Transactions that were received from only a single source and seen by no other.
System architecture
- Collector: Connects to EL nodes and writes new mempool transactions to CSV files. Multiple collector instances can run without colliding.
- Merger: Takes collector CSV files as input, de-duplicates, sorts by timestamp and writes CSV + Parquet output files.
- Analyzer: Analyzes sourcelog CSV files and produces summary report.
- Website: runs the website in dev-mode, and builds + uploads it.
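The analyzer's per-source statistics boil down to counting which sources saw which transactions. A hypothetical sketch of counting "exclusive" transactions (seen by exactly one source); `countExclusive` and its input shape are illustrative, not the repo's actual code:

```go
package main

import "fmt"

// countExclusive returns, per source, the number of transactions that were
// reported by that source and no other ("exclusive" transactions).
// The input maps tx hash -> list of distinct sources that reported it.
func countExclusive(seen map[string][]string) map[string]int {
	out := make(map[string]int)
	for _, sources := range seen {
		if len(sources) == 1 {
			out[sources[0]]++
		}
	}
	return out
}

func main() {
	seen := map[string][]string{
		"0xaaa": {"local"},
		"0xbbb": {"local", "bloxroute"},
		"0xccc": {"bloxroute"},
	}
	fmt.Println(countExclusive(seen)) // map[bloxroute:1 local:1]
}
```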
Getting started
Mempool Collector
- Subscribes to new pending transactions at various data sources
- Writes `timestamp_ms` + `hash` + `raw_tx` to a CSV file (one file per hour by default)
- Note: the collector can store transactions repeatedly; only the merger will properly deduplicate them later
Default filenames:
Transactions
- Schema:
<out_dir>/<date>/transactions/txs_<date>_<uid>.csv
- Example:
out/2023-08-07/transactions/txs_2023-08-07-10-00_collector1.csv
Sourcelog
- Schema:
<out_dir>/<date>/sourcelog/src_<date>_<uid>.csv
- Example:
out/2023-08-07/sourcelog/src_2023-08-07-10-00_collector1.csv
Running the mempool collector:
# print help
go run cmd/collector/main.go -help
# Connect to ws://localhost:8546 and write CSVs into ./out
go run cmd/collector/main.go -out ./out
# Connect to multiple nodes
go run cmd/collector/main.go -out ./out -nodes ws://server1.com:8546,ws://server2.com:8546
Merger
- Iterates over collector output directory / CSV files
- Deduplicates transactions, sorts them by timestamp
go run cmd/merge/main.go -h
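The core merge step (deduplicate, then sort by timestamp) can be sketched as below. Keeping the earliest sighting per hash is an assumption for illustration, and `mergeRecords` is not the repo's actual code:

```go
package main

import (
	"fmt"
	"sort"
)

// TxRecord is one collected transaction row (timestamp_ms, hash).
type TxRecord struct {
	TimestampMs int64
	Hash        string
}

// mergeRecords deduplicates records by hash, keeping the earliest
// sighting, and returns them sorted by timestamp.
func mergeRecords(records []TxRecord) []TxRecord {
	earliest := make(map[string]TxRecord)
	for _, r := range records {
		if prev, ok := earliest[r.Hash]; !ok || r.TimestampMs < prev.TimestampMs {
			earliest[r.Hash] = r
		}
	}
	out := make([]TxRecord, 0, len(earliest))
	for _, r := range earliest {
		out = append(out, r)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].TimestampMs < out[j].TimestampMs })
	return out
}

func main() {
	in := []TxRecord{{300, "0xb"}, {100, "0xa"}, {200, "0xa"}}
	fmt.Println(mergeRecords(in)) // [{100 0xa} {300 0xb}]
}
```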
Architecture
General design goals
- Keep it simple and stupid
- Vendor-agnostic (main flow should work on any server, independent of a cloud provider)
- Downtime-resilience to minimize any gaps in the archive
- Multiple collector instances can run concurrently, without getting in each other's way
- Merger produces the final archive (based on the input of multiple collector outputs)
- The final archive:
- Includes (1) a Parquet file with transaction metadata, and (2) a compressed file of the raw transaction CSVs
- Compatible with Clickhouse and S3 Select (Parquet using gzip compression)
- Easily distributable as torrent
Collector
NodeConnection
- One for each EL connection
- New pending transactions are sent to `TxProcessor` via a channel
TxProcessor
- Checks whether it has already processed the tx
- Stores it in the output directory
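A minimal sketch of this fan-in pattern, with multiple node connections feeding one channel and a seen-set filter in the processor (names and structure are illustrative, not the repo's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// txProcessor receives tx hashes from many node connections over a single
// channel and forwards only first sightings (a simple seen-set filter).
func txProcessor(in <-chan string, out chan<- string) {
	seen := make(map[string]bool)
	for hash := range in {
		if seen[hash] {
			continue // already processed this tx
		}
		seen[hash] = true
		out <- hash
	}
	close(out)
}

func main() {
	in := make(chan string)
	out := make(chan string)
	go txProcessor(in, out)

	// two "node connections" feeding the same channel
	var wg sync.WaitGroup
	for _, txs := range [][]string{{"0xa", "0xb"}, {"0xb", "0xc"}} {
		wg.Add(1)
		go func(txs []string) {
			defer wg.Done()
			for _, h := range txs {
				in <- h
			}
		}(txs)
	}
	go func() { wg.Wait(); close(in) }()

	for h := range out {
		fmt.Println(h) // each unique hash exactly once
	}
}
```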
Merger
- Uses https://github.com/xitongsys/parquet-go to write Parquet format
Transaction RLP format
- Transactions are encoded as typed EIP-2718 envelopes: `TransactionType || TransactionPayload`
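In an EIP-2718 envelope the type byte is in the range 0x00-0x7f, while a legacy transaction is a plain RLP list whose first byte is >= 0xc0. A small sketch distinguishing the two from raw bytes (the helper name is hypothetical):

```go
package main

import "fmt"

// isTypedTx reports whether raw tx bytes are an EIP-2718 typed envelope
// (first byte in 0x00..0x7f) rather than a legacy RLP list (>= 0xc0).
func isTypedTx(raw []byte) bool {
	return len(raw) > 0 && raw[0] <= 0x7f
}

func main() {
	fmt.Println(isTypedTx([]byte{0x02, 0xf8})) // EIP-1559 tx (type 0x02): true
	fmt.Println(isTypedTx([]byte{0xf8, 0x6b})) // legacy RLP list: false
}
```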
Contributing
Install dependencies
go install mvdan.cc/gofumpt@latest
go install honnef.co/go/tools/cmd/staticcheck@latest
go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
go install github.com/daixiang0/gci@latest
Lint, test, format
make lint
make test
make fmt
Further notes
- See also: discussion about compression and storage
License
MIT