
cardano-slurp

Connects to one or more cardano-node instances over the Cardano node-to-node mini-protocols, streams every available block and transaction, and saves them to disk (or to S3) in raw CBOR format.

Usage

cardano-slurp aims to have sensible defaults: running it without arguments connects to an IOHK relay and saves blocks to the db directory.

cardano-slurp

You can override the defaults via command-line arguments or environment variables:

$ cardano-slurp --help

Connect to cardano nodes and download all blocks and transactions without processing them

Usage: cardano-slurp [OPTIONS]

Options:
  -r, --relay <RELAY>
          The cardano relay node to connect to [default: relays-new.cardano-mainnet.iohk.io:3001]
  -t, --topology-file <TOPOLOGY_FILE>
          A topology file to read for relays to connect to
  -f, --fallback-point <FALLBACK_POINT>
          
  -d, --directory <DIRECTORY>
          The directory to save blocks into [default: db]
      --testnet-magic <TESTNET_MAGIC>
          The network magic to use when communicating with nodes
  -h, --help
          Print help
  -V, --version
          Print version

cardano-slurp --relay relays.cardano-mainnet.iohk.io:3001 --directory db --fallback-point 78416/f85c52e97c6ec4e171d92789e32331e624ee7a0c7ba18b578062727edb7d61f7

RELAY=relays-new.cardano-mainnet.iohk.io:3001 cardano-slurp

Rather than listing relays individually, you can specify a topology.json file in the same format that cardano-node reads:

cardano-slurp --topology-file topology.json
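
For illustration, a minimal topology.json in the classic (non-P2P) cardano-node format might look like the following; the relay shown is the same default used above, and this is only an example, not a file shipped with the project:

{
  "Producers": [
    {
      "addr": "relays-new.cardano-mainnet.iohk.io",
      "port": 3001,
      "valency": 1
    }
  ]
}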

Format

The file structure after running (assuming default parameters) should look like this:

 - db                    | Contains all persisted data
   - headers             | All downloaded headers
     - {large-bucket}    | See note on bucketing below
       - {small-bucket}  |
         - {slot}-{hash} | The header we observed at {slot} with the given {hash}; there may be multiples in the case of rollbacks or different blocks received from different relays
   - bodies              | All downloaded block bodies 
     - {large-bucket}    | See note on bucketing below
       - {small-bucket}  |
         - {slot}-{hash} | The block body we observed at {slot} with the given {hash}; there may be multiples in the case of rollbacks or different blocks received from different relays
   - cursors             | Cursors, tracking how far we've synced with any given relay
     - {relay}           | The cursor file, serialized as CBOR

NOTE: Common wisdom seems to indicate that you should keep directories to around 10k entries so as not to destroy the performance of directory scan operations. Thus, we introduce two layers of nesting, called buckets, to occasionally roll over to an empty directory and keep the sizes small. Each bucket represents the starting slot of a range which contains all the blocks in that subdirectory. The large bucket rolls over every 20 million slots, and the small bucket rolls over every 200 thousand slots. This ensures that each large-bucket directory has no more than 1000 entries, and each small-bucket directory has no more than 10,000 entries. One large bucket represents roughly 230 days of blocks in the Shelley era.
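
To make the bucketing concrete, the sketch below shows the path arithmetic described above in Rust. The function and constant names are illustrative only and are not taken from the crate's actual code.

// Illustrative sketch of the bucketing scheme described above; names are
// hypothetical and not the crate's actual API.
fn bucketed_path(root: &str, kind: &str, slot: u64, hash: &str) -> String {
    const LARGE_BUCKET_SLOTS: u64 = 20_000_000; // large bucket rolls over every 20M slots
    const SMALL_BUCKET_SLOTS: u64 = 200_000; // small bucket rolls over every 200k slots

    // Each bucket directory is named after the first slot of the range it covers.
    let large = (slot / LARGE_BUCKET_SLOTS) * LARGE_BUCKET_SLOTS;
    let small = (slot / SMALL_BUCKET_SLOTS) * SMALL_BUCKET_SLOTS;

    format!("{root}/{kind}/{large}/{small}/{slot}-{hash}")
}

fn main() {
    // The header from the --fallback-point example above would land at
    // db/headers/0/0/78416-f85c...
    println!(
        "{}",
        bucketed_path(
            "db",
            "headers",
            78416,
            "f85c52e97c6ec4e171d92789e32331e624ee7a0c7ba18b578062727edb7d61f7",
        )
    );
}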