The raw data for the blockchain is generated using two tools:
- bitcoind, which downloads the blockchain and keeps it synced
- blockparser, which dumps the blockchain data to CSV format
As of approximately 2014-10-12 I have:

    $ wc -l /data1/blockchain/*
    317530 /data1/blockchain/blocks.csv
    108235459 /data1/blockchain/inputs.csv
    121222652 /data1/blockchain/outputs.csv
    45348658 /data1/blockchain/transactions.csv
    275,124,299 total

Sample data from the output of blockparser (the first few thousand blocks and transactions) is included in the data directory. The following command:

    $ head -n2 data/*.csv
    ==> data/blocks.csv <==
    ID,Hash,Version,Timestamp,Nonce,Difficulty,Merkle,NumTransactions,OutputValue,FeesValue,Size
    0,"000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f",1,"2009-01-03T18:15:05Z",2083236893,1.000000,"4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",1,5000000000,0,285

    ==> data/inputs.csv <==
    TransactionId,Index,Script,OutputTxHash,OutputTxIndex
    171,0,"01091d8d76a82122082246acbb6cc51c839d9012ddaca46048de07ca8eec221518200241cdb85fab4815c6c624d6e932774f3fdf5fa2a1d3a1614951afb83269e1454e2002443047","0437cd7f8525ceed2324359c2d0ba26006d92d856a9c20fa0241106ee5a597c9",0

    ==> data/outputs.csv <==
    TransactionId,Index,Value,Script,ReceivingAddress,InputTxHash,InputTxIndex
    0,0,5000000000,"ac5f1df16b2b704c8a578d0bbaf74d385cde12c11ee50455f3c438ef4c3fbcf649b6de611feae06279a60939e028a8d65c10b73071a6f16719274855feb0fd8a670441","1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa",,

    ==> data/transactions.csv <==
    ID,Hash,Version,BlockId,NumInputs,NumOutputs,OutputValue,FeesValue,LockTime,Size
    0,"4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",1,0,0,1,5000000000,0,3652501241,204

reveals the data model used by the blockchain.
Here is a simple diagram of these relationships:

                                    /------- spent -------------\
                                    |                           |
                                    v                           |
    _______                   _____________                 _________
    |BLOCK| <--- included --- |TRANSACTION| --- output ---> |ADDRESS|
    -------                   -------------                 ---------
        | ^                        | ^
        | |                        | |
    /--/   \-------\           /--/   \-------\
    |              |           |              |
    \--- parent ---/           \--- input ----/

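As a rough illustration of the CSV layout, the sketch below parses the two sample lines of `data/blocks.csv` with Python's standard `csv` module. The helper name `read_blocks`, the snake_case field names, and the assumption that `OutputValue` is expressed in satoshis are mine, not something blockparser defines.

```python
import csv
import io

# First two lines of data/blocks.csv, copied from the sample output.
BLOCKS_CSV = (
    "ID,Hash,Version,Timestamp,Nonce,Difficulty,Merkle,"
    "NumTransactions,OutputValue,FeesValue,Size\n"
    '0,"000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f",1,'
    '"2009-01-03T18:15:05Z",2083236893,1.000000,'
    '"4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",'
    "1,5000000000,0,285\n"
)

# Assumption: monetary values are stored in satoshis.
SATOSHI_PER_BTC = 100_000_000


def read_blocks(text):
    """Parse blocks.csv text into a list of dicts with typed numeric fields."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "id": int(raw["ID"]),
            "hash": raw["Hash"],
            "timestamp": raw["Timestamp"],
            "merkle": raw["Merkle"],
            "num_transactions": int(raw["NumTransactions"]),
            "output_value": int(raw["OutputValue"]),
            "fees_value": int(raw["FeesValue"]),
            "size": int(raw["Size"]),
        })
    return rows


blocks = read_blocks(BLOCKS_CSV)
genesis = blocks[0]
print(genesis["id"], genesis["hash"][:16], genesis["output_value"] / SATOSHI_PER_BTC)
```

The other three files can be read the same way, swapping in their own column names.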
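The data should be loaded in the following order, so that every reference (BlockId, TransactionId, OutputTxHash) points at a record that is already loaded: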
- blocks
- transactions
- outputs
- inputs
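The ordering matters because each file refers back to records from the files before it. The sketch below is a toy in-memory loader, not the actual TitanDB loader: the CSV snippets are made-up placeholders that reuse a subset of the real column names, and `require` and `load` are hypothetical helpers. The direction chosen for the `input` edge (funding transaction to spending transaction) is also an assumption, and `parent` edges are left out because the sample `blocks.csv` header shows no parent column.

```python
import csv
import io

# Tiny made-up CSV snippets that reuse a subset of the real column names.
# The hashes and addresses are placeholders, not real blockchain data.
FILES = {
    "blocks": "ID,Hash\n0,b0\n1,b1\n",
    "transactions": "ID,Hash,BlockId\n0,t0,0\n1,t1,1\n",
    "outputs": "TransactionId,Index,Value,ReceivingAddress\n0,0,50,addrA\n1,0,50,addrB\n",
    "inputs": "TransactionId,Index,OutputTxHash,OutputTxIndex\n1,0,t0,0\n",
}
LOAD_ORDER = ["blocks", "transactions", "outputs", "inputs"]


def require(mapping, key, what):
    """Look up key, failing loudly if the referenced record is not loaded yet."""
    if key not in mapping:
        raise LookupError(f"{what} {key!r} has not been loaded yet")
    return mapping[key]


def load(files, order):
    """Load the files in the given order and return the edges that were created."""
    blocks, tx_ids, tx_by_hash, out_addr = {}, {}, {}, {}
    edges = []  # (label, source, target)
    for name in order:
        for row in csv.DictReader(io.StringIO(files[name])):
            if name == "blocks":
                blocks[int(row["ID"])] = row["Hash"]
            elif name == "transactions":
                tx_id, block_id = int(row["ID"]), int(row["BlockId"])
                require(blocks, block_id, "block")
                tx_ids[tx_id] = row["Hash"]
                tx_by_hash[row["Hash"]] = tx_id
                edges.append(("included", tx_id, block_id))
            elif name == "outputs":
                tx_id = int(row["TransactionId"])
                require(tx_ids, tx_id, "transaction")
                out_addr[(tx_id, int(row["Index"]))] = row["ReceivingAddress"]
                edges.append(("output", tx_id, row["ReceivingAddress"]))
            elif name == "inputs":
                spender = int(row["TransactionId"])
                require(tx_ids, spender, "transaction")
                funder = require(tx_by_hash, row["OutputTxHash"], "transaction hash")
                address = require(out_addr, (funder, int(row["OutputTxIndex"])), "output")
                edges.append(("input", funder, spender))  # assumed direction
                edges.append(("spent", address, spender))
    return edges


edges = load(FILES, LOAD_ORDER)
for edge in edges:
    print(edge)
```

Calling `load` with the inputs first raises `LookupError`, because the transactions and outputs they reference do not exist yet.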
See the TitanDB schema docs for details:
Vertex Label |
---|
block |
transaction |
address |
Edge Label | Source Vertex Label | Target Vertex Label | Multiplicity | Reasoning |
---|---|---|---|---|
parent | block | block | MANY2ONE | Each block only has one parent but a given block can have multiple blocks claiming to descend from it (a blockchain fork) |
included | transaction | block | MANY2ONE | Every transaction is included in one block (at least once it’s confirmed?)… |
input | transaction | transaction | MULTI | Each output of a transaction can be spent only once, but a transaction with several outputs can be an input to several transactions, or to the same transaction more than once |
output | transaction | address | MULTI | Each transaction can have multiple output addresses and multiple transactions can have the same address as output |
spent | address | transaction | MULTI | Each output received by an address is spent in a single transaction, but an address that receives several outputs can be spent in several transactions, or in the same transaction more than once |
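
To make the multiplicity column concrete, here is a small sketch that checks a list of edges against it. It relies on my understanding of Titan's definitions: MANY2ONE allows at most one outgoing edge of a given label per vertex, while MULTI places no constraint. The `violations` helper and the vertex names are hypothetical.

```python
from collections import Counter

# Multiplicity per edge label, taken from the table above.
MULTIPLICITY = {
    "parent": "MANY2ONE",
    "included": "MANY2ONE",
    "input": "MULTI",
    "output": "MULTI",
    "spent": "MULTI",
}


def violations(edges):
    """Return the (label, source) pairs that have more than one outgoing
    edge of a MANY2ONE label. MULTI labels are never reported."""
    out_degree = Counter(
        (label, source)
        for label, source, _target in edges
        if MULTIPLICITY[label] == "MANY2ONE"
    )
    return sorted(key for key, count in out_degree.items() if count > 1)


# A fork: two blocks claim the same parent. Allowed under MANY2ONE.
fork = [("parent", "b1a", "b0"), ("parent", "b1b", "b0")]

# The same transaction included in two blocks. Not allowed under MANY2ONE.
double_included = [("included", "t0", "b0"), ("included", "t0", "b1")]

# The same address receiving two outputs of one transaction. Allowed under MULTI.
repeated_output = [("output", "t0", "addrA"), ("output", "t0", "addrA")]

print(violations(fork))
print(violations(double_included))
print(violations(repeated_output))
```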
Property Key | Property Key Data Type | Property Key Cardinality | Vertex Labels | Edge Labels | Flags | Key |
---|---|---|---|---|---|---|
bkid | long | SINGLE | block | | unique | composite |
txid | long | SINGLE | transaction | | unique | composite |
hash | string | SINGLE | address | | unique | composite |
bkHash | string | SINGLE | block | | unique | composite |
txHash | string | SINGLE | transaction | | unique | composite |
version | string | SINGLE | block, transaction | | | composite |
outputValue | long | SINGLE | block, transaction | | | mixed |
feesValue | long | SINGLE | block, transaction | | | mixed |
size | long | SINGLE | block, transaction | | | mixed |
timestamp | long | SINGLE | block | | | mixed |
nonce | string | SINGLE | block | | | composite |
difficulty | float | SINGLE | block | | | mixed |
merkle | string | SINGLE | block | | | composite |
numTx | integer | SINGLE | block | | precalc | mixed |
numInputs | integer | SINGLE | transaction | | precalc | mixed |
numOutputs | integer | SINGLE | transaction | | precalc | mixed |
lockTime | long | SINGLE | transaction | | | mixed |
balance | long | SINGLE | address | | precalc | mixed |
index | integer | SINGLE | | input, output | | vertex |
script | string | SINGLE | | input, output | | composite, vertex |
outputTxIndex | integer | SINGLE | | input, spent | | vertex |
value | long | SINGLE | | output | | mixed, vertex |
rcvrAddressHash | string | SINGLE | | output | | composite, vertex |
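
The CSV columns do not line up one-to-one with the property types above: for example, `timestamp` is a `long` in the schema, while `blocks.csv` carries an ISO 8601 string. The sketch below shows one possible conversion for block rows. The column-to-key mapping is inferred from the names, storing the timestamp as epoch seconds is an assumption (milliseconds would be just as plausible), and `to_epoch_seconds` and `block_properties` are hypothetical helpers.

```python
from datetime import datetime, timezone


def to_epoch_seconds(iso_timestamp):
    """Convert a UTC timestamp such as '2009-01-03T18:15:05Z' to epoch seconds."""
    parsed = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


# CSV column -> (property key, converter). The mapping is inferred from the
# names in the schema table; it is an assumption, not the documented loader.
BLOCK_CONVERTERS = {
    "ID": ("bkid", int),
    "Hash": ("bkHash", str),
    "Version": ("version", str),
    "Timestamp": ("timestamp", to_epoch_seconds),
    "Nonce": ("nonce", str),
    "Difficulty": ("difficulty", float),
    "Merkle": ("merkle", str),
    "NumTransactions": ("numTx", int),
    "OutputValue": ("outputValue", int),
    "FeesValue": ("feesValue", int),
    "Size": ("size", int),
}


def block_properties(row):
    """Turn one blocks.csv row (a dict of strings) into typed vertex properties."""
    return {
        key: convert(row[column])
        for column, (key, convert) in BLOCK_CONVERTERS.items()
    }


# The genesis block row from the sample data, as csv.DictReader would yield it.
GENESIS_ROW = {
    "ID": "0",
    "Hash": "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f",
    "Version": "1",
    "Timestamp": "2009-01-03T18:15:05Z",
    "Nonce": "2083236893",
    "Difficulty": "1.000000",
    "Merkle": "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    "NumTransactions": "1",
    "OutputValue": "5000000000",
    "FeesValue": "0",
    "Size": "285",
}

properties = block_properties(GENESIS_ROW)
print(properties["bkid"], properties["timestamp"], properties["difficulty"])
```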