Various code for TON blockchain analysis.
Initial blockchain data extraction is performed with ton-indexer. PostgreSQL alone cannot handle the heavy analytical workload, so the data is loaded into an S3-based datalake. Airflow drives the incremental ETL process:
- E: extract data from PostgreSQL, sliced by date intervals
- T: convert the results to the Parquet file format
- L: load them into S3 object storage
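The extract step above slices the source tables by date intervals. A minimal sketch of such a slicer (monthly granularity matches the `date=YYYYMM` partitioning of the datalake tables; the function name and the query in the comment are illustrative, not taken from the actual DAG):

```python
from datetime import date

def month_intervals(start: date, end: date):
    """Yield (first_day, first_day_of_next_month) pairs covering [start, end)."""
    y, m = start.year, start.month
    while date(y, m, 1) < end:
        ny, nm = (y + 1, 1) if m == 12 else (y, m + 1)
        yield date(y, m, 1), date(ny, nm, 1)
        y, m = ny, nm

# Each interval becomes one extract query, e.g. (hypothetical SQL):
# SELECT * FROM transactions WHERE utime >= %s AND utime < %s
for lo, hi in month_intervals(date(2022, 11, 1), date(2023, 2, 1)):
    print(lo.strftime("%Y%m"), "->", lo, hi)
```

Each yielded interval maps to one monthly partition in the datalake.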
Datalake tables:
| Table        | Prefix                                |
|--------------|---------------------------------------|
| accounts     | dwh/staging/accounts/date=YYYYMM/     |
| transactions | dwh/staging/transactions/date=YYYYMM/ |
| messages     | dwh/staging/messages/date=YYYYMM/     |
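A helper that builds the partition prefix from the table above could look like this (a sketch only; the actual loader's code is not shown here):

```python
from datetime import date

STAGING_ROOT = "dwh/staging"  # root prefix from the table above

def partition_prefix(table: str, month: date) -> str:
    """S3 key prefix for one monthly partition of a staging table."""
    return f"{STAGING_ROOT}/{table}/date={month:%Y%m}/"

print(partition_prefix("accounts", date(2023, 1, 1)))
# dwh/staging/accounts/date=202301/
```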
After each incremental upload, a log entry is created in the PostgreSQL table `increment_state`.
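The bookkeeping can be sketched as follows (sqlite3 stands in for PostgreSQL here, and the `increment_state` columns are assumptions for illustration, not the real schema):

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL connection
conn.execute("""
    CREATE TABLE increment_state (
        table_name TEXT,   -- assumed column names, not the actual schema
        partition  TEXT,
        loaded_at  TEXT
    )
""")

def log_increment(conn, table_name: str, partition: str):
    """Record that one monthly partition has been uploaded to S3."""
    conn.execute(
        "INSERT INTO increment_state VALUES (?, ?, ?)",
        (table_name, partition, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

log_increment(conn, "transactions", "dwh/staging/transactions/date=202301/")
```

On the next Airflow run, the latest entry per table tells the extract task where to resume.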
A smart-contract data/message parser for extracting meaningful data, for example NFT owners. Example:
```sh
POSTGRES_URI=postgres://postgres:pass@localhost:5432/ton_index \
python3 contracts_parser.py -c jetton -d cache/ -e http://localhost:9090/execute \
    jetton_contract_data.csv jetton_parsed.csv
```
Supported contract types:
- jetton (as in the example above)
- contracts_executor - a wrapper around ton-contract-executor for executing arbitrary get methods on contracts, given their code and data cells. Used by the parser to extract information.
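The parser reaches this executor over HTTP (the `-e http://localhost:9090/execute` endpoint in the example above). A hedged sketch of building such a call; the JSON field names (`code`, `data`, `method`) are assumptions about the executor's request format, not its documented API:

```python
import json
import urllib.request

def build_execute_request(endpoint: str, code_boc: str, data_boc: str, method: str):
    """Build an HTTP request to run one get method on a contract's code/data cells.

    Field names in the payload are illustrative assumptions, not the real API.
    """
    payload = json.dumps({"code": code_boc, "data": data_boc, "method": method})
    return urllib.request.Request(
        endpoint,
        data=payload.encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_execute_request(
    "http://localhost:9090/execute",
    "<code BOC base64>", "<data BOC base64>", "get_wallet_data",
)
# urllib.request.urlopen(req) would perform the call once the executor is running.
```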
Some analysis scripts:
- TON_events_plot.ipynb - plots a Gantt chart visualising a TON scam (for this research)
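The Gantt-style event timeline in that notebook can be approximated with matplotlib's `broken_barh` (a toy sketch with made-up wallet labels and intervals; the notebook's real data comes from the datalake):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

# Toy timeline: per-address event spans as (start_day, duration) pairs.
# These values are fabricated for illustration only.
events = {
    "wallet A": [(0, 3), (7, 2)],
    "wallet B": [(2, 5)],
}

fig, ax = plt.subplots()
for row, (label, spans) in enumerate(events.items()):
    ax.broken_barh(spans, (row - 0.4, 0.8))  # one horizontal bar group per address
ax.set_yticks(range(len(events)))
ax.set_yticklabels(list(events))
ax.set_xlabel("days since first event")
fig.savefig("gantt.png")
```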