axonweb3/axon-devops

Unable to run Axon in a local Kubernetes cluster

Closed this issue · 12 comments

I'm trying to run k8s-deploy/k8s/axon/deploy.sh against a local docker-desktop Kubernetes cluster.

I made the following changes:

  • created PersistentVolumes to satisfy the PersistentVolumeClaims, backed by /Users/serejke/.axon/node-1, /Users/serejke/.axon/node-2, etc.
  • disabled Ingress, since it is not needed locally
  • removed nodeSelector: disktype: node4 so the pods can be scheduled on my local Kubernetes node
  • set replicas to 1 to simplify debugging
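For reference, a hostPath PersistentVolume for the first node might look roughly like the sketch below. The PV name, capacity, access mode and storageClassName are assumptions (hypothetical values). They have to match what the PersistentVolumeClaims created by deploy.sh actually request. Otherwise the claims stay Pending, or they get bound to a dynamically provisioned volume from the cluster's default StorageClass instead.

```yaml
# Hypothetical hostPath PersistentVolume for Axon node 1 (a sketch, not taken from axon-devops).
# name, capacity, accessModes and storageClassName are assumptions: adjust them
# to match the PersistentVolumeClaim that deploy.sh creates for this node.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: axon-node-1-pv              # hypothetical name
spec:
  capacity:
    storage: 10Gi                   # assumed size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""              # assumed: must equal the claim's storageClassName
  hostPath:
    path: "/Users/serejke/.axon/node-1"
    type: DirectoryOrCreate         # create the directory if it does not exist yet
```

One such PV is needed per node (node-2, node-3, ...). With hostPath the directory contents survive pod restarts and redeploys. Old RocksDB data therefore stays in place until it is deleted by hand.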

All 4 Axon nodes run for a couple of seconds and then fail with the following log:

[2023-04-03T18:27:56.212491963+00:00 INFO core_executor::system_contract] execute addr 0xb00d…c15a
[2023-04-03T18:27:56.223528129+00:00 INFO core_executor::system_contract] execute addr 0x4af5…2352
[2023-04-03T18:27:56.229972796+00:00 INFO core_run] Execute the genesis distribute success, genesis state root 0xd01bf2694feaaea8a7d6ee62f4c27c143b41042bdca1eca18e84cb7d2e55f10c, response ExecResp { state_root: 0xd01bf2694feaaea8a7d6ee62f4c27c143b41042bdca1eca18e84cb7d2e55f10c, receipt_root: 0x0378c03f0ac30062de319246880360934dbb384835cf0a6726b9af00dee2b92a, gas_used: 9509750, tx_resp: [TxResp { exit_reason: Succeed(Returned), ret: [], gas_used: 2081462, remain_gas: 27918538, fee_cost: 0, logs: [], code_address: Some(0xc2fd48d60ae16b3fe6e333a9a13763691970d9373d4fab7cc323d7ba06fa9986), removed: false }, TxResp { exit_reason: Succeed(Returned), ret: [], gas_used: 2422845, remain_gas: 215571183603, fee_cost: 0, logs: [Log { address: 0x4af5ec5e3d29d9ddd7f4bf91a022131c41b72352, topics: [0x8be0079c531659141344cd1fd0a4f28419497f9722a3daafe3b4186f6b6457e0, 0x0000000000000000000000000000000000000000000000000000000000000000, 0x0000000000000000000000008ab0cf264df99d83525e9e11c7e4db01558ae1b1], data: [] }, Log { address: 0x4af5ec5e3d29d9ddd7f4bf91a022131c41b72352, topics: [0x2f8788117e7eff1d82e926ec794901d17c78024a50270940304540a733656f0d, 0x0000000000000000000000000000000000000000000000000000000000000000, 0x0000000000000000000000008ab0cf264df99d83525e9e11c7e4db01558ae1b1, 0x0000000000000000000000008ab0cf264df99d83525e9e11c7e4db01558ae1b1], data: [] }], code_address: Some(0x336c11f92895e657a26642914af5ec5e3d29d9ddd7f4bf91a022131c41b72352), removed: false }, TxResp { exit_reason: Succeed(Returned), ret: [], gas_used: 269886, remain_gas: 215573336562, fee_cost: 0, logs: [Log { address: 0xb00d616b820c39619ee29e5144d0226cf8b5c15a, topics: [0xbc7cd75a20ee27fd9adebab32041f755214dbc6bffa90cc0225b39da2e5c2d3b, 0x000000000000000000000000a13763691970d9373d4fab7cc323d7ba06fa9986], data: [] }], code_address: Some(0xb233fb175c5be87ff90fc88eb00d616b820c39619ee29e5144d0226cf8b5c15a), removed: false }, TxResp { exit_reason: Succeed(Returned), ret: [], gas_used: 3386908, remain_gas: 26613092, 
fee_cost: 0, logs: [], code_address: Some(0x2c3a9349df5b162519b17621f67bc4e50d1df92b0e4c61794a4517af6a995cb2), removed: false }, TxResp { exit_reason: Succeed(Returned), ret: [], gas_used: 799309, remain_gas: 29195891, fee_cost: 0, logs: [], code_address: None, removed: false }, TxResp { exit_reason: Succeed(Returned), ret: [], gas_used: 497334, remain_gas: 29502666, fee_cost: 0, logs: [Log { address: 0xb484fd480e598621638f380f404697cd9f58b0f8, topics: [0xbc7cd75a20ee27fd9adebab32041f755214dbc6bffa90cc0225b39da2e5c2d3b, 0x000000000000000000000000f67bc4e50d1df92b0e4c61794a4517af6a995cb2], data: [] }], code_address: Some(0xda6db70ce66da4c6433bb447b484fd480e598621638f380f404697cd9f58b0f8), removed: false }, TxResp { exit_reason: Succeed(Stopped), ret: [], gas_used: 52006, remain_gas: 29947994, fee_cost: 0, logs: [Log { address: 0x4af5ec5e3d29d9ddd7f4bf91a022131c41b72352, topics: [0x2f8788117e7eff1d82e926ec794901d17c78024a50270940304540a733656f0d, 0x241ecf16d79d0f8dbfb92cbc07fe17840425976cf0667f022fe9877caa831b08, 0x000000000000000000000000b484fd480e598621638f380f404697cd9f58b0f8, 0x0000000000000000000000008ab0cf264df99d83525e9e11c7e4db01558ae1b1], data: [] }], code_address: None, removed: false }] }
[2023-04-03T18:27:56.246767713+00:00 INFO core_run] The genesis block is created Block { header: Header { prev_hash: 0x0000000000000000000000000000000000000000000000000000000000000000, proposer: 0x0000000000000000000000000000000000000000, state_root: 0xd01bf2694feaaea8a7d6ee62f4c27c143b41042bdca1eca18e84cb7d2e55f10c, transactions_root: 0x0000000000000000000000000000000000000000000000000000000000000000, signed_txs_hash: 0x0000000000000000000000000000000000000000000000000000000000000000, receipts_root: 0x0000000000000000000000000000000000000000000000000000000000000000, log_bloom: 0xdifficulty: 0, timestamp: 1639459018, number: 0, gas_used: 0, gas_limit: 0, extra_data: b"", mixed_hash: None, nonce: 0x0000000000000000, base_fee_per_gas: 1337, proof: Proof { number: 0, round: 0, block_hash: 0x0000000000000000000000000000000000000000000000000000000000000000, signature: b"", bitmap: b"" }, last_checkpoint_block_hash: 0x0000000000000000000000000000000000000000000000000000000000000000, call_system_script_count: 0, chain_id: 10012 }, tx_hashes: [0x3bbe1ebf56b864d91ff5d7505be6df8a13a232a3c5969b30ad5fd254226c6e6b, 0x01240fb109c0c9ca0c095542d04140cc00d13bb66dd262ec088ba1b27424c8ac, 0xdf58cdda98ae3139026750bda1e3100442b59f91e26b6adac5749e3b026219ef, 0xfcbd67037cb8789fcb215cabed5e60a66afec59698a36d2afaa0cda626d66f07, 0x41bdc59db755cd3da3f41c2fdaf936e16d16130b202a8fd3c608c06b14d243ce, 0xca338c9e4eb563817bc363a243602b30c5a9608d94e3a04f948f60d40fc127f8, 0x378803d6f9517956f38e67c773956cf646625775207080b237e82334cbebcdb2] }
[2023-04-03T18:27:56.690430088+00:00 INFO core_run] prometheus start
[2023-04-03T18:27:56.690726463+00:00 INFO core_run] node starts
[2023-04-03T18:27:56.690817963+00:00 INFO core_run] Data path for block: "./devtools/chain/data1/rocksdb/block_data"
[2023-04-03T18:27:58.731982005+00:00 INFO core_run] Recover 0 tx of number 1 from wal
[2023-04-03T18:28:02.492455799+00:00 INFO core_run] The Genesis block has been initialized.
[2023-04-03T18:28:02.869677841+00:00 INFO core_run] prometheus start
[2023-04-03T18:28:02.869816716+00:00 INFO core_run] node starts
[2023-04-03T18:28:02.869895674+00:00 INFO core_run] Data path for block: "./devtools/chain/data1/rocksdb/block_data"
[2023-04-03T18:28:10.945044678+00:00 INFO core_run] Recover 0 tx of number 1 from wal
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ProtocolError { kind: Executor, error: FutureEpoch }', core/cli/src/lib.rs:56:42
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

The ProtocolError { kind: Executor, error: FutureEpoch } appears to be the root cause.

This may be a misconfiguration on my side, although I have verified that the PersistentVolumes are bound to their claims and that the Axon nodes create RocksDB files in them.

I would greatly appreciate your help debugging this. We need to run the Axon nodes locally to have a full development environment that mirrors the production (AWS) cluster.

Working on it.

@serejke Would you like to store the data/logs on your local machine?

Hey @liya2017, yes, I can allocate a temporary local directory to store all Axon nodes' data. For now it is /Users/serejke/.axon/, but it could be made configurable.

Hi @serejke, please change the files as in this commit and try again: 08f2676

Just a friendly reminder, in case you haven't noticed:

Change the path to your own path:

path: "/home/ec2-user/axon/node-1"

Change the value to your node's label.

@liya2017 So your change was to use a local PV instead of hostPath? I did it like this:

I replaced:

    hostPath:
        path: "/Users/serejke/.axon/node-1"

with

    local:
        path: "/Users/serejke/.axon/node-1"
    nodeAffinity:
        required:
            nodeSelectorTerms:
                - matchExpressions:
                      - key: kubernetes.io/hostname
                        operator: In
                        values:
                            - docker-desktop
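Kubernetes local PersistentVolumes are normally paired with a StorageClass that has no dynamic provisioner and delays binding until a pod is scheduled. This lets the scheduler take the PV's nodeAffinity into account. A minimal sketch follows. The class name local-storage is a hypothetical placeholder; for it to take effect, the PV above and the Axon PersistentVolumeClaims would both need storageClassName: local-storage.

```yaml
# Sketch of a StorageClass for statically created local PVs.
# "local-storage" is a hypothetical name: reference it from both the PVs
# (spec.storageClassName) and the PersistentVolumeClaims.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner   # local volumes are not dynamically provisioned
volumeBindingMode: WaitForFirstConsumer     # bind a claim only once its pod is scheduled
```

On a single-node cluster such as docker-desktop, switching from hostPath to local mostly changes scheduling and binding behaviour. It does not change where or how the node data is stored.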

I use Docker Desktop for Mac.

I still get the same error.

@liya2017 I built an image for release v0.1.0-alpha.5, published on February 7 (commit hash 2684f2d3), and it works with the same configuration.

I will try the latest release, v0.1.0-alpha.8, and post the results here.

Something has changed since February 7. It may just be a configuration issue: axon-devops might be slightly out of date.

Quoting @serejke: "and it just works with the same configuration"

Great. Yes, that was my mistake, sorry.

BTW, why don't you use docker-deploy if you're running in Docker? We have a docker-deploy directory in the repo. I thought you were using k8s, so I updated the k8s setup 😀

@liya2017 I use Docker Desktop (not docker-deploy), which is Docker for macOS, and it creates a docker-desktop Kubernetes cluster. So I tried to deploy to k8s ;-) Sorry for the confusion.

Ah, sorry for the misunderstanding, and thanks for sharing. I'll read up on it and try it on my side.
Can we close this issue?

@liya2017 I just built an M1 image (and created a task) for the most recent release, v0.1.0-alpha.8, and it also works as expected.

The initial problem was with an image built from Axon's main branch, that is, the most recent code.

Is there any significant difference between Axon's main branch and the tagged releases that might cause the above problem? Otherwise it might be a bug introduced in the past two weeks, and we'd need to escalate this issue to the Axon developers. Unfortunately, I'm not yet experienced with monitoring and debugging Axon nodes, so I can't provide more details for the investigation. Thanks

@serejke The main branch includes PR axonweb3/axon#1115. If you build the image from the main branch, you need to update config.toml and genesis.json. You also need to clean the existing data, because genesis.json has changed.

@liya2017 I see! Thanks so much. This issue can be closed then.