/ethdecoder

This CLI decodes Ethereum transactions and events, stores results in CouchDB, and then exports customized views to CSV files for data visualization and analysis.

Primary LanguageJavaScript

Ethereum Data Decoder

This Node.js application retrieves Ethereum data from BigQuery public dataset, decodes the transactions and events, and then writes the results in a CouchDB database.

Prerequisite

Install Node.js

On my Mac, I used nvm to install Node.js as follows:

touch ~/.zshrc
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash
nvm install node

Create BigQuery Project

Sign in to Google Platform and create a BigQuery project via link.

Create a service account and download the account key as described here. Store the downloaded file in the config folder. Edit the config.json to set bqCredential to this key file.

Create Etherscan API Key

Sign in to Etherscan, and create an API Key via link. Edit the config.json to set etherscanApiKey to this key. We call Etherscan API to fetch contract ABIs.

Install CouchDB

Follow the instruction to install Apache CouchDB. View the databases by using the Fauxton UI at http://127.0.0.1:5984/_utils/.

Create a user and a database used to store decoded Ethereum transactions and events by using the following script, e.g.,

curl -X PUT -u admin:password http://127.0.0.1:5984/ethdb
curl -X PUT -u admin:password http://localhost:5984/_users/org.couchdb.user:ethadmin -H "Accept: application/json" -H "Content-Type: application/json" -d '{"name": "ethadmin", "password": "ethadmin", "roles": ["ethdb_admin"], "type": "user"}'
curl -X PUT -u admin:password http://localhost:5984/_users/org.couchdb.user:ethuser -H "Accept: application/json" -H "Content-Type: application/json" -d '{"name": "ethuser", "password": "ethuser", "roles": ["ethdb_member"], "type": "user"}'
curl -X PUT -u admin:password http://localhost:5984/ethdb/_security -H "Content-Type: application/json" -d '{"admins": { "names": [], "roles": ["_admin", "ethdb_admin"] }, "members": { "names": [], "roles": ["_admin", "ethdb_admin", "ethdb_member"] } }'

This example creates a database ethdb and a user ethuser for storing the application output. The database connection info must be configured in config.json.

Configuration

Besides system connection data as described in the previous section, the config.json can provide the following attributes to customize the behavior of the decoder:

  • standardAbis is a list of standard token ABI files, e.g., erc20.json or erc721.json, etc, which are used to decode transaction inputs and event data if the required contract ABI cannot be fetched from Etherscan.
  • tokenInfo is a file that lists metadata of token contracts that are not available in on-chain blocks. The metadata specifies the symbol, name, and decimals of token contracts.
  • contractAbis is a folder that contains ABI files for contracts that does not provide verified ABI on the Etherscan. The ABI files in this folder must be named after the corresponding contract address, e.g., 0x6b175474e89094c44da98b954eedeac495271d0f.json is the ABI file for the DAI token.

Before start collecting Ethereum data, you must create at least the contract and transaction views in the CouchDB as described in views/README.md.

Run

The command line interface supports the following functions:

  • decode command fetches Ethereum transactions and events from BigQuery public data store, decodes them and then writes the result to a CouchDB database. It also builds a reference cache of contract ABIs by using Etherscan API calls.
  • update command updates the contract cache in batches by fetching metadata and related token info from BigQuery public data store.
  • export command queries a CouchDB view, and format the result as a CSV file, so it can be imported to a data analytics tool.

Decode

Run the following command to decode Ethereum transactions and events from a specified contract during a specified date range:

node index.js decode contract-address [ start-date [ end-date ]]

The contract-address may be a comma-delimited list of contract addresses.

For example, the following command would decode the data for the DAI token contract in the date range from 2021-10-01 and 2021-10-05 (inclusive):

node index.js decode '0x6b175474e89094c44da98b954eedeac495271d0f' '2021-10-01' '2021-10-03'

If the start-date and/or end-date is not specified, the default date would be the date before the system date, i.e., yesterday.

Update

Run the following command to update cached contracts created during the decode process:

node index.js update

Export

The following command fetches data from a CouchDB view by specified aggregation level:

node index.js export ddoc view params [output [options]]

where ddoc and view specifies the name of the CouchDB design document and the name of the view defined in the ddoc; params specifies the query parameters as defined by the CouchDB API; output is the name of the output report file. The result will be printed on stdout if the file name is not specified.

By default, the reported token counts will be in fractional units defined by the decimals of corresponding tokens, e.g., USDC is in unit of 10-6, while DAI is in unit of 10-18. If you want to report tokens with normalized counts, you can use the parameter options to specify the token key column corresponding to the reported value of a token count. For example, options = {value: 0} would mean that the first key column is the token address that will be used to normalize the value column that contains the corresponding token count.

If you have collected some Uniswap v2 data, and defined the view in uniswap-v2.json, you can execute a query on the view by using the following command:

node index.js export 'uniswap-v2' 'swap-token-out' '{"group_level": 5, "limit": 20}' './report.csv'

It will generate a comma-delimited file report.csv containing the first 20 rows of the aggregated view. This view reports transactions that swaps pairs of tokens, including swapped amount of tokens as amountIn and amountOut. To report only tokens with known token decimals, you can specify the options as:

node index.js export 'uniswap-v2' 'swap-token-out' '{"group_level": 5, "limit": 200}' './report.csv' '{"amountIn": 0, "amountOut": 1}'

In this report, the input token count amountIn will be normalized by using the token address in the first key column, and the output token count amountOut will be normalized by using the token address in the second key column.

The options argument may also specify one or more value filters to reduce the size of the resulting output file. The filters specify functions that evaluate attributes of the view's value object and return true for exported rows. For example, following option will export only values with count > 2:

{ "$filter": "x => x.count > 2" }

See example.sh for a sample command that uses options for both amount normalization and value filters.

You can also specify key-filters to filter exported rows by keys of a view, e.g.,

{ "$keyFilter": "x => x[0] === '0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48'" }

The above key filter would export only rows whows first key element is the specified contract address. If you want to specify multiple filters, you can specify $filter and/or $keyFilter as an array of strings, and each string defines a filter function for the view.

Note that you may preview the result of a query by using the CouchDB Fauxton UI before you execute the export command.

Data in CouchDB

The decoded data are stored in the CouchDB as JSON documents. Each document is tagged with a docType of contract, transaction, or event, depending on the corresponding content type.

By creating CouchDB views, the collected data of transactions and events can be indexed and aggregated by map and reduce on CouchDB server. Some sample views are described in views.