Sponsored by the Canadian NSERC Discovery Grant RGPIN-2020-05665: Data Science on Blockchain and the National Science Foundation of USA under award number ECCS 2039701 Blockchain Graphs as Testbeds of Power Grid Resilience and Functionality Metrics.

Primary language: Python. License: MIT.


Chartalist

Please visit https://www.chartalist.org for more information.

Overview

Chartalist is the first platform to provide machine-learning-ready datasets from both unspent transaction output (UTXO) and account-based blockchains.

The Chartalist package contains:

  1. Dataloaders that automate dataset downloads through a single package import and a simple two-argument function call.
  2. The option to receive the downloaded dataset as a Pandas DataFrame directly from the same function call.
  3. Graph makers for convenient generation of a NetworkX digraph from the network datasets.

Installation

  1. Download this repository and extract the contents to a desired location.
  2. Use the extracted chartalist_loader-main folder as the working directory.

Requirements

Chartalist depends on the following:

  • networkx>=2.8.3
  • numpy>=1.22.3
  • outdated>=0.2.1
  • pandas>=1.4.2
  • patool>=1.12
  • requests>=2.27.1
  • setuptools>=60.2.0
  • torch>=1.11.0
  • torch_scatter>=2.0.9
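Before running the loaders, it can help to confirm which of these packages are already installed. The following is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical and not part of Chartalist, and it only reports installed versions rather than comparing them against the minimums listed above.

```python
from importlib import metadata

# Distribution names copied from the requirements list above.
REQUIRED = [
    "networkx", "numpy", "outdated", "pandas", "patool",
    "requests", "setuptools", "torch", "torch_scatter",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


# Print one line per requirement so missing packages are easy to spot.
for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'not installed'}")
```

Any package reported as not installed can then be installed with pip before importing chartalist.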

Datasets

The following is a summary of the available datasets and their related tasks. Use the corresponding version argument when using Chartalist to retrieve the correct dataset of interest. Click on the dataset for more information.

Bitcoin ML-Ready Datasets

| Dataset | Features | Version Constant |
| --- | --- | --- |
| Ransomware Family: Bitcoinheist | address, year, day, length, weight, count, looped, neighbors, income, label | TYPE_PREDICTION |
| Bitcoin Transaction Network Input | trans | TRANSACTION_NETWORK_INPUT_SAMPLE |
| Bitcoin Transaction Network Output | trans | TRANSACTION_NETWORK_OUTPUT_SAMPLE |
| Bitcoin Block Times | unix_time | BLOCK_TIME |
| Bitcoin Price Data | date, price, year, day, totaltx | PRICE_PREDICTION |

Ethereum ML-Ready Datasets

| Dataset | Features | Version Constant |
| --- | --- | --- |
| Ethereum Token Networks | token_address, from_address, to_address, value, transaction_hash, log_index, block_number | TYPE_PREDICTION_TRANSACTIONS |
| Ethereum Token Network Labels | type, address, name | TYPE_PREDICTION_LABELS |
| EtherDelta Ether-to-Token Transactions | transaction_hash, block_number, timestamp, tokenGet, amountGet, tokenGive, amountGive, get, give | ANOMALY_DETECTION_ETHER_DELTA_TRADES |
| IDEX Ether-to-Token Transactions | transaction_hash, status, block_number, gas, gas_price, timestamp, amountBuy, amountSell, expires, nonce, amount, tradeNonce, feeMake, feeTake, tokenBuy, tokenSell, maker, taker | ANOMALY_DETECTION_IDEX |
| Ether-to-Token Ether-Dollar Price | Date(UTC), UnixTimeStamp, Value | ANOMALY_DETECTION_ETHER_DOLLAR_PRICE |
| Bytom Network | fromAddress, toAddress, time, amount | MULTILAYER_BYTOM |
| Cybermiles Network | fromAddress, toAddress, time, amount | MULTILAYER_CYBERMILES |
| Decentraland Network | fromAddress, toAddress, time, amount | MULTILAYER_DECENTRALAND |
| Tierion Network | fromAddress, toAddress, time, amount | MULTILAYER_TIERION |
| Vechain Network | fromAddress, toAddress, time, amount | MULTILAYER_VECHAIN |
| ZRX Network | fromAddress, toAddress, time, amount | MULTILAYER_ZRX |
| Ethereum VeChain Token Transactions | fromAddress, toAddress, time, amount | PRICE_PREDICTION_VECHAIN |
| Ethereum ZRX Token Transactions | fromAddress, toAddress, time, amount | PRICE_PREDICTION_ZRX |
| Stablecoin ERC20 Transactions | fromAddress, toAddress, time, amount | STABLECOIN_ERC20 |

Dashcoin ML-Ready Datasets

| Dataset | Features | Version Constant |
| --- | --- | --- |
| Dashcoin Transaction Network Input | trans | TRANSACTION_NETWORK_INPUT_SAMPLE |
| Dashcoin Transaction Network Output | trans | TRANSACTION_NETWORK_OUTPUT_SAMPLE |

Using Chartalist

  1. Navigate to the chartalist_loader-main folder and create a new .py script, or add an existing one, to serve as the working environment.
  2. Add import chartalist at the top of the script.
import chartalist
  3. All datasets in Chartalist can be downloaded and referenced as a Pandas DataFrame in a single function call.

For example:

data = chartalist.get_dataset(dataset='dashcoin', version='chartalist.DashcoinLoader.TRANSACTION_NETWORK_OUTPUT_SAMPLE', download=True, data_frame=True)

There are currently three options for the dataset argument:

  • ethereum
  • bitcoin
  • dashcoin

Depending on the choice of the dataset argument, the version argument will take the following format:

For ethereum:

version=chartalist.EthereumLoader.

For bitcoin:

version=chartalist.BitcoinLoaders.

For dashcoin:

version=chartalist.DashcoinLoader.

Refer to the Datasets section for the appropriate constant and append it to the end of the version prefix shown above. The function call is then ready to use.
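The naming pattern described above can be sketched as a small helper that joins a dataset name and a version constant into the full dotted name. The helper version_path and the LOADER_NAMES mapping are hypothetical and not part of Chartalist; the loader class names are copied from this README as written.

```python
# Loader class names exactly as they appear in this README.
LOADER_NAMES = {
    "ethereum": "EthereumLoader",
    "bitcoin": "BitcoinLoaders",
    "dashcoin": "DashcoinLoader",
}


def version_path(dataset, constant):
    """Return the dotted version name, e.g. chartalist.DashcoinLoader.<constant>."""
    try:
        loader = LOADER_NAMES[dataset]
    except KeyError:
        # Reject anything other than the three documented dataset options.
        raise ValueError(
            f"unknown dataset {dataset!r}; expected one of {sorted(LOADER_NAMES)}"
        ) from None
    return f"chartalist.{loader}.{constant}"


print(version_path("dashcoin", "TRANSACTION_NETWORK_OUTPUT_SAMPLE"))
# prints: chartalist.DashcoinLoader.TRANSACTION_NETWORK_OUTPUT_SAMPLE
```

The printed name has the same form as the version value in the example call above.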

  4. When the script is executed, the corresponding dataset is downloaded to the data folder in the working directory if it is not already present, and the returned Pandas DataFrame can be used directly for processing.

NOTE: Due to the large size of certain datasets, the dataloader downloads only sample data for them. If the complete dataset is required, click on the link corresponding to the dataset of interest and manually download the data from our website. Replace the contents of the sample dataset under the data folder with the contents of the complete dataset and proceed as normal.

Generating Networks

The Bitcoin and Dashcoin Transaction Network Input and Output datasets require the use of a Chartalist graph maker to be converted into a usable NetworkX digraph. See bitcoin_network_example.py or dashcoin_network_example.py for instructions.

For other network datasets that have fromAddress, toAddress, and value columns, such as the Ethereum Token Network dataset, a NetworkX digraph can be generated directly. See ethereum_network_example.py for instructions.
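As an illustration of the direct approach, the sketch below builds a NetworkX digraph from an edge-list DataFrame using nx.from_pandas_edgelist. It is not the code from ethereum_network_example.py. The helper build_digraph is hypothetical, the rows are made-up toy values, and the column names are parameters because they differ between datasets (for example, from_address and to_address in the Ethereum Token Networks table).

```python
import networkx as nx
import pandas as pd


def build_digraph(df, source="fromAddress", target="toAddress", weight="value"):
    """Build a directed graph with one edge per ordered address pair."""
    # Sum repeated transfers between the same pair so each edge carries a total.
    edges = df.groupby([source, target], as_index=False)[weight].sum()
    return nx.from_pandas_edgelist(
        edges,
        source=source,
        target=target,
        edge_attr=weight,
        create_using=nx.DiGraph,
    )


# Toy rows for illustration only; real data would come from the dataloader.
toy = pd.DataFrame({
    "fromAddress": ["0xA", "0xA", "0xB"],
    "toAddress": ["0xB", "0xB", "0xC"],
    "value": [1.0, 2.5, 4.0],
})
graph = build_digraph(toy)
print(graph.number_of_nodes(), graph.number_of_edges())
```

Aggregating before building the graph is a design choice: a plain DiGraph holds a single edge per ordered pair, so without the groupby step later rows would overwrite earlier ones.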

Parsing Datasets

Basic statistical information can be extracted from any dataset using the Pandas DataFrame returned by the dataloader. See stablecoin_erc20_example.py for reference.
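A minimal sketch of this kind of parsing is shown below. It is not the code from stablecoin_erc20_example.py. It assumes a DataFrame with the fromAddress, toAddress, time, and amount columns listed for the Stablecoin ERC20 dataset; the helper summarize_transfers is hypothetical and the rows are made-up toy values.

```python
import pandas as pd


def summarize_transfers(df):
    """Return a few basic statistics for a transfer table."""
    # Count every address that appears as either sender or receiver.
    addresses = pd.concat([df["fromAddress"], df["toAddress"]])
    return {
        "transactions": len(df),
        "unique_addresses": int(addresses.nunique()),
        "total_amount": float(df["amount"].sum()),
        "mean_amount": float(df["amount"].mean()),
    }


# Toy rows for illustration only; real data would come from the dataloader.
toy = pd.DataFrame({
    "fromAddress": ["0xA", "0xB", "0xA"],
    "toAddress": ["0xB", "0xC", "0xC"],
    "time": [1546300800, 1546387200, 1546473600],
    "amount": [10.0, 5.0, 15.0],
})
summary = summarize_transfers(toy)
print(summary)
```

For a quick overview of all numeric columns, calling describe() on the DataFrame is another option.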

Address Exclusion

Please use our online tool to request the removal of an address from our datasets for security or privacy reasons.