A toolkit for blockchain data collection

BlockchainSpider

Blockchain spiders collect data from public chains, including:

  • Transaction subgraph: the subgraph centered on a specific address
  • Label data: the labels of addresses or transactions
  • Block data: the blocks on a chain
  • ...

For more details, see our documentation.

🚀Getting Started

🔧Install

Start by cloning the repository:

git clone https://github.com/wuzhy1ng/BlockchainSpider.git

Then install the dependencies:

pip install -r requirements.txt

🔍Crawl a transaction subgraph

We will demonstrate how to crawl the transaction subgraph of the KuCoin hacker on Ethereum and trace the hacker's illicit funds!

Run the following command:

scrapy crawl txs.eth.ttr -a source=0xeb31973e0febf3e3d7058234a5ebbae1ab4b8c23
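To trace several source addresses in one run, the command above can be driven from Python. The following is a minimal sketch, assuming Scrapy is installed and the script is started from the repository root. The helper names `build_crawl_command` and `crawl_sources` are hypothetical and not part of BlockchainSpider; the sketch uses only the spider name and the `-a source=...` argument shown above.

```python
import subprocess
import sys


def build_crawl_command(spider, **spider_args):
    """Build the argv list for `scrapy crawl <spider> -a key=value ...`."""
    command = ["scrapy", "crawl", spider]
    for key, value in spider_args.items():
        command += ["-a", f"{key}={value}"]
    return command


def crawl_sources(sources, spider="txs.eth.ttr"):
    """Run one crawl per source address, one after another (hypothetical helper)."""
    for source in sources:
        # check=True raises CalledProcessError if scrapy exits with an error.
        subprocess.run(build_crawl_command(spider, source=source), check=True)


if __name__ == "__main__":
    # Pass one or more source addresses on the command line.
    crawl_sources(sys.argv[1:])
```

Running the crawls sequentially keeps the output of each source separate and avoids competing for the same API rate limits.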

When the crawl finishes, you can find the transaction data in ./data/0xeb3...c23.csv.

Try importing the transaction data, together with the importance of the addresses in the subgraph (./data/importance/0xeb3...c23.csv), into Gephi.

The hacker is linked to Tornado Cash, a mixing service, which indicates that the hacker took part in money laundering!

💡Collect label data

In this section, we will demonstrate how to collect labeled addresses from the OFAC sanctions list!

Run the following command:

scrapy crawl labels.ofac

You can find the label data in ./data/labels.ofac. Each row of this file is a JSON object like this:

{
    "net":"ETH",
    "label":"Entity",
    "info":{
        "uid":"30518",
        "address":"0x72a5843cc08275C8171E582972Aa4fDa8C397B2A",
        "first_name":null,
        "last_name":"SECONDEYE SOLUTION",
        "identities":[
            {
                "id_type":"Email Address",
                "id_number":"support@secondeyesolution.com"
            },
            {
                "id_type":"Email Address",
                "id_number":"info@forwarderz.com"
            }
        ]
    }
}

Note: Please cite the source when using the crawled labels.
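Because each row is a standalone JSON object, the file can be read line by line. The following is a minimal sketch that groups the labeled addresses by network. It relies only on the `net` and `info.address` fields shown in the sample above; the function names are hypothetical, and rows without an address are assumed to be skippable.

```python
import json
from collections import defaultdict


def parse_label_lines(lines):
    """Yield one dict per non-empty line; each line holds one JSON object."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def addresses_by_net(records):
    """Group `info.address` values by the `net` field."""
    grouped = defaultdict(set)
    for record in records:
        address = (record.get("info") or {}).get("address")
        if address:
            # Addresses are kept verbatim; some networks are case-sensitive.
            grouped[record.get("net")].add(address)
    return dict(grouped)


def load_labels(path):
    """Read a label file such as ./data/labels.ofac and group its addresses."""
    with open(path, encoding="utf-8") as f:
        return addresses_by_net(parse_label_lines(f))
```

For example, `load_labels("./data/labels.ofac")` returns a dict that maps each network name to the set of addresses labeled on it.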

🧱Collect block data

In this section, we will demonstrate how to collect block data on Ethereum!

Run the following command:

scrapy crawl trans.blocks.web3

You can find the block data in ./data, where:

  • BlockItem.csv saves the metadata of blocks, such as the miner, the timestamp, and so on.
  • TransactionItem.csv saves the external transactions of blocks.
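A quick way to check what the crawl produced is to read the header and count the rows of each file. The following is a minimal sketch using only the standard library; it assumes the files have a header row and makes no assumption about the column names. `summarize_csv` is a hypothetical helper, not part of BlockchainSpider.

```python
import csv


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file has no rows
        row_count = sum(1 for _ in reader)
    return header, row_count
```

For example, `summarize_csv("./data/BlockItem.csv")` returns the column names together with the number of blocks collected so far.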

❗Important tips

To get the best performance from BlockchainSpider, please read the settings for APIKeys and Cache.

🔬About TRacer

Please cite our paper (and the respective papers of the methods used) if you use this code in your own work:

@misc{wu2022tracer,
      title={TRacer: Scalable Graph-based Transaction Tracing for Account-based Blockchain Trading Systems}, 
      author={Zhiying Wu and Jieli Liu and Jiajing Wu and Zibin Zheng},
      year={2022},
      eprint={2201.05757},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
}

Please execute the code in ./test to reproduce the experimental results in the paper.

  • parameters.py: Parameter sensitivity experiment.
  • compare.py: Comparative experiment.
  • metrics.py: Export evaluation metrics.

For more information, please refer to ./test/README.md