gnolang/bounties

Bounty 7 - Proposal

moul opened this issue · 8 comments

moul commented

Plan

I've started working on a tool that will:

  • connect to a running Gaia RPC server
  • iterate over a specified range of blocks
  • apply filters/rules that compute events (see the sketch after this list)
  • ship a binary that calls the library with some filters directly configurable via flags (more a demo than really useful)
  • use filters to compute a score and other metadata per account
  • be bundled with common filters/rules as templates/helpers
  • display/export the data in a usable format (CSV, genesis, ...)
  • be tested (not sure yet how it can be done in a nice way)
  • be well structured as an easily readable/extendable/reusable library
  • include comments, code examples, CI/CD
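For illustration, here is a rough, untested sketch of what a filter/rule could look like; the names (Event, Rule, DelegationRule) are hypothetical, not the actual cosmos-snapshot API:

package rules

// Event is a simplified view of one chain event (type + attributes).
type Event struct {
    Height int64
    Type   string
    Attrs  map[string]string
}

// Rule inspects one event and updates per-account scores in place.
type Rule func(e Event, scores map[string]int64)

// DelegationRule is an example rule: +1 point per delegate event.
func DelegationRule(e Event, scores map[string]int64) {
    if e.Type == "delegate" {
        scores[e.Attrs["delegator"]]++
    }
}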

Current Status

  • Exploration mode (all-in-one file)
  • What it does "well" -> processing blocks, txs, events with Go
  • What is not yet implemented -> score computing; codebase structure enhancements

Repo: https://github.com/moul/cosmos-snapshot

$> go run -v . -h
Usage of cosmos-snapshot:
  -debug
        verbose output
  -max-height int
        last block to process (default 5797010)
  -min-height int
        first block to process (default 5200791)
  -rpc-addr string
        Cosmos RPC Address (default "http://localhost:26657")

go run -v . --min-height=5200800 --max-height=5200900 --debug

[Screenshot: verbose per-event output with --debug]


go run -v . --min-height=5200800 --max-height=5200900 (without --debug, you just get a progress bar)

[Screenshot: progress bar output without --debug]

Questions

  • @jaekwon Do you think I'm on the right track and can continue, or am I completely off track?
  • some filters in the brief are not clear; I'll need some help to complete my list of filters when the project is mostly finished

connects to a running Gaia's RPC server

Thinking about the overhead of RPC here, I wonder if this would work fast enough on localhost via the loopback or a unix socket. Worst case we can figure out a way to avoid it, so this seems like a fine way to start, especially if websockets can be used to make and accept a stream of responses; the overhead of plain HTTP might be significant.

It's fine to start, but it needs to be fast, so we should explore websocket RPC calls, or use a local RPC client if possible. I'll take a look today at the options.
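For context, a minimal, untested sketch of talking to the node with Tendermint's Go RPC client (github.com/tendermint/tendermint/rpc/client/http). Note that per-height queries like Block still go over HTTP; the /websocket endpoint is used for event subscriptions after client.Start():

package main

import (
    "context"
    "fmt"
    "log"

    rpchttp "github.com/tendermint/tendermint/rpc/client/http"
)

func main() {
    // New takes the remote address and the websocket endpoint path.
    client, err := rpchttp.New("tcp://localhost:26657", "/websocket")
    if err != nil {
        log.Fatal(err)
    }
    ctx := context.Background()
    height := int64(5200800)
    block, err := client.Block(ctx, &height)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("height:", block.Block.Height, "txs:", len(block.Block.Data.Txs))
}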

iterates over the range of a specified block

Some transactions will pass the ante-handler (pay enough fees), but still fail with a non-zero error code. The error codes for transactions are in the next block and can also be retrieved via RPC (if necessary, or via Go access). If the error code is non-zero (error), it usually means the transaction fee was paid, but the transaction had no effect.
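Continuing from the client sketch above (client, ctx, height), a hedged sketch of how failed-but-fee-paying txs could be detected via BlockResults:

// TxsResults holds one DeliverTx result per tx in the block at `height`.
res, err := client.BlockResults(ctx, &height)
if err != nil {
    log.Fatal(err)
}
for i, txRes := range res.TxsResults {
    if txRes.Code != 0 {
        // Non-zero code: the fee was paid but the tx had no effect,
        // so its events should not be scored.
        fmt.Printf("tx %d failed: code=%d log=%s\n", i, txRes.Code, txRes.Log)
    }
}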

apply filters/rules that compute a score and other metadata per account

Thinking through what I want, and the reality of what it takes to implement it... the accounting per transaction message is already implemented in Go as run by the Cosmos Hub, and implementing a second mechanism of accounting with modifications is going to be as difficult as implementing the first accounting system (cosmoshub-4 as is). This would be doubly difficult over RPC, as opposed to the accounting logic already written without RPC (i.e. sdk/x/bank, sdk/x/auth, sdk/x/staking, etc.).


TL;DR, I think we want to start with a pure current snapshot of the account state, and apply any changes post-facto.

  • part 1: given a block height (the latest block height that you know), export the current account state. I believe there is already some feature written to export state (side note: it probably doesn't use RPC), used to generate the genesis.json from cosmoshub-3. Test it out at a recent block height. Check it against production data; for example, does it show the amount of tokens held in IBC channels to Osmosis? Upload the snapshot to S3 or some other file storage provider.

  • part 2: given an account A1 at a given block height in the past T1, and current block time T2, create a list of where all the account's tokens are now, as a list of {Account;Coins} tuples (see the sketch after this list). So if there were no transactions signed by A1 between T1 and T2 (and no unbondings before T1), the result would simply be [{A1;C1}], where C1 are the coins held by A1 at time T1. Implementation of this feature would start with SendTx, and then become staking-aware. I don't know how best to do that off the top of my head. This also probably shouldn't use RPC, but instead use Go functions to iterate over blocks to avoid RPC overhead. That said, I might be wrong... if the RPC can handle, say, a month's worth of cosmoshub-4 blocks through localhost RPC in an hour, then it's fine. This might be feasible with unix pipes, or websockets.

  • part 3: given a proposal, find all accounts that voted for/against/abstain, while also accounting for overrides by time (changing votes) and by delegation (a delegator overriding their validator's vote).
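A hypothetical sketch of the part-2 shape described above; Tx here is a stand-in for a parsed transfer, not a real SDK type:

// AccountCoins is one {Account;Coins} tuple of the part-2 result.
type AccountCoins struct {
    Account string // bech32 address
    Coins   string // e.g. "42uatom"
}

// Tx is a stand-in for a parsed transfer signed by A1 between T1 and T2.
type Tx struct {
    From, To, Amount string
}

// TraceTokens returns where a1's coins (c1, as of T1) sit at T2.
// Base case: no txs signed by a1 between T1 and T2 (and no unbondings
// before T1), so the coins are still with a1.
func TraceTokens(a1, c1 string, txsBetween []Tx) []AccountCoins {
    if len(txsBetween) == 0 {
        return []AccountCoins{{Account: a1, Coins: c1}}
    }
    // TODO: follow SendTx outputs first, then become staking-aware.
    return nil
}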


The new goal of #7 is to exercise and write code for the GNO snapshot. Bounty #8 is on hold until we finish #7 and launch gno.land. Thank you for your initial submissions -- we will include that work in the payout. See the three parts to this bounty above.

Also, I'm going to move this bounty system to gno.land at some point, so please register a name (must be 7 letters or more) to gno.land/r/users. Once we have this live chain, we will be earning GNOTs directly from the chain.

I'm thinking that those of us who want to work continuously on gno.land should be paid continuously in gno. /r/boards and /r/users etc will become a group collaboration software.

part 1: given a block height (the latest block height that you know), export the current account state. I believe there is already some feature written to export state (side note: it probably doesn't use RPC), used to generate the genesis.json from cosmoshub-3. Test it out at a recent block height. Check it against production data; for example, does it show the amount of tokens held in IBC channels to Osmosis? Upload the snapshot to S3 or some other file storage provider.

Yes, there's a command for that, but it takes very long to run, and it does show all the IBC tokens.
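For reference, I believe the export command in question is the standard Cosmos SDK one, which can target a height, e.g.:

$> gaiad export --height 6620000 > snapshot-6620000.json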

if the RPC can handle, say, a month's worth of cosmoshub-4 blocks through localhost RPC in an hour, then it's fine.

With async requests, we can do that, I think. Running queries asynchronously should be many times the speed of querying sequentially.

Some transactions will pass the ante-handler (pay enough fees), but still fail with a non-zero error code. The error codes for transactions are in the next block and can also be retrieved via RPC (if necessary, or via Go access). If the error code is non-zero (error), it usually means the transaction fee was paid, but the transaction had no effect.

Ohh, I hadn't thought about this. Though, I read through @moul's code; he uses the txs' events, which effectively solves this problem.

Also, for part 2, we can use a tx query instead of looping over blocks.
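A rough, untested sketch of that tx-query approach with Tendermint's tx_search RPC (same client/ctx as in the sketches above; the address is a placeholder, and paging through all result pages is omitted):

page, perPage := 1, 100
query := "message.sender='cosmos1...'" // placeholder address
res, err := client.TxSearch(ctx, query, false, &page, &perPage, "asc")
if err != nil {
    log.Fatal(err)
}
for _, tx := range res.Txs {
    fmt.Println("height:", tx.Height, "code:", tx.TxResult.Code)
}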

moul commented

Thinking about the overhead of RPC here, I wonder if this would work fast enough on localhost via the loopback or a unix socket. Worst case we can figure out a way to avoid it, so this seems like a fine way to start, especially if websockets can be used to make and accept a stream of responses; the overhead of plain HTTP might be significant.
It's fine to start, but it needs to be fast, so we should explore websocket RPC calls, or use a local RPC client if possible. I'll take a look today at the options.

I just merged my refactoring PR, see https://github.com/moul/cosmos-snapshot

Now we have a dedicated package to walk the chain; it currently uses RPC (w/ WebSocket). It implements an interface that we can reimplement using a file-system approach, or a hybrid one.
This package is agnostic: it focuses on walking the chain and calling a callback with everything filled in.
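Roughly, the interface looks like this (illustrative names, not the exact package API):

package walker

import "context"

// BlockData bundles everything the callback receives for one height.
type BlockData struct {
    Height      int64
    Txs         [][]byte            // raw txs
    BeginEvents []map[string]string // begin-block events (simplified)
    EndEvents   []map[string]string // end-block events (simplified)
}

// Walker is source-agnostic: the current implementation uses RPC
// (w/ WebSocket); a file-system or hybrid one can satisfy it too.
type Walker interface {
    Walk(ctx context.Context, minHeight, maxHeight int64, cb func(BlockData) error) error
}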


Thinking through what I want, and the reality of what it takes to implement it... the accounting per transaction message is already implemented in Go as run by the Cosmos Hub, and implementing a second mechanism of accounting with modifications is going to be as difficult as implementing the first accounting system (cosmoshub-4 as is). This would be doubly difficult over RPC, as opposed to the accounting logic already written without RPC (i.e. sdk/x/bank, sdk/x/auth, sdk/x/staking, etc.).

I moved the "brain", which I temporarily named Accountant, into a dedicated rules.go file that is responsible for keeping a data struct and computing received events before displaying the results.
I consider this file the best place to write our "configuration", and I suggest not trying to find a configuration file format, but instead using pure Go to be more permissive in terms of rules.
I think we can create various helpers to keep this part of the code super clear.

This leaves main dedicated to gluing things together.


Some transactions will pass the ante-handler (pay enough fees), but still fail with a non-zero error code. The error codes for transactions are in the next block and can also be retrieved via RPC (if necessary, or via Go access). If the error code is non-zero (error), it usually means the transaction fee was paid, but the transaction had no effect.

I fetch the block txs, but also the BlockResults' BeginBlockEvents and EndBlockEvents. I think I have everything needed in terms of data sources to manage cross-tx events. Check out the switch event.Type lines in rules.go, which contain all the event types I found (https://github.com/moul/cosmos-snapshot/blob/main/cmd/snapshot-example/rules.go#L81-L161=)
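Condensed sketch of that switch (the linked rules.go has the full list; the event types below are taken from the run stats later in this thread):

switch event.Type {
case "transfer":
    // credit/debit sender and recipient balances
case "delegate", "redelegate", "unbond":
    // update staking positions
case "proposal_vote":
    // record governance participation
case "withdraw_rewards", "withdraw_commission":
    // track reward flows
default:
    // event types we don't score are ignored
}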


Also, I'm going to move this bounty system to gno.land at some point, so please register a name (must be 7 letters or more) to gno.land/r/users. Once we have this live chain, we will be earning GNOTs directly from the chain.

What do you think if I open a PR with my existing codebase against the main repo in a subdirectory like x/genesis?

I also prefer to centralize the Go code in the main repo, but I have a concern if we want to allow more external users (like @catShaark) to collaborate: we should then either merge the unfinished (but compiling) code so we can iterate, OR maybe use GitHub permissions to allow more people to work on the same PR; both solutions are probably problematic, let us know :)

(✅ I've registered my nickname -> manfred)


I'm thinking that those of us who want to work continuously on gno.land should be paid continuously in gno. /r/boards and /r/users etc will become a group collaboration software.

I can't give promises in terms of regularity, but I'd love to work continuously on gno.land!

This looks well structured. I will double-check, but I don't think it's actually using websockets; rather it's using HTTP requests. For each transaction it appears to be making an HTTP request to get the events as stored on disk (I think). I suspect running this sort of loop for all transactions will take a while. Can you estimate? It will probably be best to use a unix socket file if possible.

moul commented

Current performance figures on my dev machine, which is also running other workloads:

Run 1 (just txs, no BeginBlock nor EndBlock)

10k blocks, 35k txs, 2m42s

$> go run -v ./cmd/snapshot-example/ --debug --min-height=6620000 --max-height=6630000
[…]
Stats:
{
  "StartedAt": "2022-04-12T20:55:11.724527158Z",
  "Duration": "2m42.598269213s",
  "TotalCalls": 45252,
  "TotalByKind": {
    "block": 10001,
    "tx": 35251
  },
  "TotalByEventKind": {
    "tx:acknowledge_packet": 6,
    "tx:channel_open_ack": 1,
    "tx:channel_open_init": 1,                                                                                   
    "tx:connection_open_ack": 1,                                                                                 
    "tx:connection_open_init": 2,                                                                                
    "tx:create_client": 3,                                                                                       
    "tx:delegate": 1272,                                                                                         
    "tx:denomination_trace": 4,                                                                                  
    "tx:fungible_token_packet": 17,                                                                              
    "tx:ibc_transfer": 37,
    "tx:message": 21766,
    "tx:proposal_vote": 67,
    "tx:recv_packet": 5,
    "tx:redelegate": 19,
    "tx:send_packet": 37,
    "tx:timeout": 1,                                    
    "tx:timeout_packet": 1,
    "tx:transfer": 10186,
    "tx:unbond": 169,
    "tx:update_client": 21,
    "tx:withdraw_commission": 17,
    "tx:withdraw_rewards": 1613,
    "tx:write_acknowledgement": 5
  }
}

Run 2

10k heights, 35k txs, 2.5M begin block events, 237 end block events, 10 minutes

$> go run -v ./cmd/snapshot-example/ --debug --min-height=6620000 --max-height=6630000 --with-block-results
[…]
Stats:
{
  "StartedAt": "2022-04-12T20:57:55.569189605Z",
  "Duration": "10m0.606747628s",
  "TotalCalls": 2631303,
  "TotalByKind": {
    "bbegin": 2585814,
    "bend": 237,
    "block": 10001,
    "tx": 35251
  },
  "TotalByEventKind": {
    "bbegin:commission": 1260126,
    "bbegin:liveness": 5556,
    "bbegin:message": 20002,
    "bbegin:mint": 10001,
    "bbegin:proposer_reward": 10001,
    "bbegin:rewards": 1260126,
    "bbegin:transfer": 20002,
    "bend:complete_redelegation": 28,
    "bend:complete_unbonding": 207,
    "bend:message": 1,
    "bend:transfer": 1,
    "tx:acknowledge_packet": 6,
    "tx:channel_open_ack": 1,
    "tx:channel_open_init": 1,
    "tx:connection_open_ack": 1,
    "tx:connection_open_init": 2,
    "tx:create_client": 3,
    "tx:delegate": 1272,
    "tx:denomination_trace": 4,
    "tx:fungible_token_packet": 17,
    "tx:ibc_transfer": 37,
    "tx:message": 21766,
    "tx:proposal_vote": 67,
    "tx:recv_packet": 5,
    "tx:redelegate": 19,
    "tx:send_packet": 37,
    "tx:timeout": 1,
    "tx:timeout_packet": 1,
    "tx:transfer": 10186,
    "tx:unbond": 169,
    "tx:update_client": 21,
    "tx:withdraw_commission": 17,
    "tx:withdraw_rewards": 1613,
    "tx:write_acknowledgement": 5
  }
}

---

So yes, it’s pretty slow by default.

We can try various things like:

  • using unix sockets instead of TCP
  • using queues/chans to always call the next block without waiting for the local processing
  • using an async worker pool (see the sketch after this list)
  • using multiple gaiad nodes instead of just one
  • switching to a file parser
  • bigger hardware
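For the worker-pool idea, a minimal, untested sketch (client, ctx, minHeight, and maxHeight are assumed from context; process is a hypothetical per-block handler and must be safe for concurrent use):

heights := make(chan int64)
var wg sync.WaitGroup
for w := 0; w < 8; w++ { // 8 concurrent fetchers
    wg.Add(1)
    go func() {
        defer wg.Done()
        for h := range heights {
            h := h // copy before taking the address
            block, err := client.Block(ctx, &h)
            if err != nil {
                log.Println("height", h, "error:", err)
                continue
            }
            process(block) // local processing off the main fetch loop
        }
    }()
}
for h := minHeight; h <= maxHeight; h++ {
    heights <- h
}
close(heights)
wg.Wait()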