monogon-dev/monogon

//metropolis/node/core/consensus flake

Closed this issue · 1 comments

lorenz commented
==================== Test output for //metropolis/node/core/consensus:consensus_test:
TT|          test launch ! Test: metrics-reachable: starting...
TT|          test launch ! Test: metrics-reachable: okay after 1.0 seconds
2023/07/31 14:53:53 supervisor.TestHarness: Waiting for supervisor runnables to die...
2023/07/31 14:53:54 supervisor.TestHarness: All done.
--- FAIL: TestEtcdMetrics (2.01s)
    asm_amd64.s:1571: supervisor                       I0731 14:53:52.381318 supervisor_processor.go:59] supervisor processor started
    asm_amd64.s:1571: root                             I0731 14:53:52.381353 supervisor_testhelpers.go:64] Starting test TestEtcdMetrics...
    asm_amd64.s:1571: root                             I0731 14:53:52.381443 consensus.go:237] PKI data absent, bootstrapping.
    asm_amd64.s:1571: root                             I0731 14:53:52.381457 consensus.go:389] Bootstrapping PKI: starting etcd...
    asm_amd64.s:1571: root.etcd                        I0731 14:53:52.381000 etcd.go:131] configuring peer listeners, listen-peer-urls: ["http://127.0.0.1:2345/"]
    asm_amd64.s:1571: root.etcd                        I0731 14:53:52.381000 etcd.go:139] configuring client listeners, listen-client-urls: ["unix:///tmp/metropolis-consensus-test2392152209//ephemeral/consensus/client.sock:0"]
    asm_amd64.s:1571: root.etcd                        I0731 14:53:52.382000 etcd.go:308] starting an etcd server, advertise-client-urls: ["http://localhost:2379/"], auto-compaction-interval: "0s", auto-compaction-mode: "", auto-compaction-retention: "0s", corrupt-check-time-interval: "0s", cors: ["*"], data-dir: "/tmp/metropolis-consensus-test2392152209//data/etcd/data", discovery-proxy: "", discovery-url: "", downgrade-check-interval: "5s", election-timeout: "1s", etcd-version: "3.5.4", force-new-cluster: false, git-sha: "Not provided (use ./build instead of go build)", go-arch: "amd64", go-os: "linux", go-version: "go1.18.10", heartbeat-interval: "100ms", host-whitelist: ["*"], initial-advertise-peer-urls: ["http://127.0.0.1:2345/"], initial-cluster: "metropolis-fca6ddcfdda7fbc264a3afe821ca4243=http://127.0.0.1:2345/", initial-cluster-state: "new", initial-cluster-token: "METROPOLIS", initial-corrupt-check: false, initial-election-tick-advance: true, listen-client-urls: ["unix:///tmp/metropolis-consensus-test2392152209//ephemeral/consensus/client.sock:0"], listen-metrics-urls: ["http://127.0.0.1:4100/"], listen-peer-urls: ["http://127.0.0.1:2345/"], max-cpu-available: 64, max-cpu-set: 64, member-dir: "/tmp/metropolis-consensus-test2392152209/data/etcd/data/member", member-initialized: false, name: "metropolis-fca6ddcfdda7fbc264a3afe821ca4243", pre-vote: true, quota-size-bytes: 2147483648, snapshot-catchup-entries: 5000, snapshot-count: 100000, wal-dir: "", wal-dir-dedicated: ""
    asm_amd64.s:1571: root.etcd                        I0731 14:53:52.388000 backend.go:81] opened backend db, path: "/tmp/metropolis-consensus-test2392152209/data/etcd/data/member/snap/db", took: "6.433012ms"
    asm_amd64.s:1571: root.etcd                        I0731 14:53:52.391000 raft.go:448] starting local member, cluster-id: "fcb2b82bb878a292", local-member-id: "9651b3d87a9444db"
    asm_amd64.s:1571: root.etcd                        I0731 14:53:52.391000 zap_raft.go:77] 9651b3d87a9444db switched to configuration voters=(), logger: "raft"
    asm_amd64.s:1571: root.etcd                        I0731 14:53:52.391000 zap_raft.go:77] 9651b3d87a9444db became follower at term 0, logger: "raft"
    asm_amd64.s:1571: root.etcd                        I0731 14:53:52.391000 zap_raft.go:77] newRaft 9651b3d87a9444db [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0], logger: "raft"
    asm_amd64.s:1571: root.etcd                        I0731 14:53:52.391000 zap_raft.go:77] 9651b3d87a9444db became follower at term 1, logger: "raft"
    asm_amd64.s:1571: root.etcd                        I0731 14:53:52.391000 zap_raft.go:77] 9651b3d87a9444db switched to configuration voters=(10831636321153598683), logger: "raft"
    asm_amd64.s:1571: root.etcd                        W0731 14:53:52.392000 store.go:1220] simple token is not cryptographically signed
    asm_amd64.s:1571: root.etcd                        I0731 14:53:52.392000 kvstore.go:415] kvstore restored, current-rev: 1
    asm_amd64.s:1571: root.etcd                        I0731 14:53:52.392000 quota.go:94] enabled backend quota with default value, quota-name: "v3-applier", quota-size: "2.1 GB", quota-size-bytes: 2147483648
    asm_amd64.s:1571: root.etcd                        I0731 14:53:52.393000 server.go:851] starting etcd server, cluster-version: "to_be_decided", local-member-id: "9651b3d87a9444db", local-server-version: "3.5.4"
    asm_amd64.s:1571: root.etcd                        I0731 14:53:52.393000 server.go:736] started as single-node; fast-forwarding election ticks, election-ticks: 10, election-timeout: "1s", forward-duration: "900ms", forward-ticks: 9, local-member-id: "9651b3d87a9444db"
    asm_amd64.s:1571: root.etcd                        I0731 14:53:52.393000 zap_raft.go:77] 9651b3d87a9444db switched to configuration voters=(10831636321153598683), logger: "raft"
    asm_amd64.s:1571: root.etcd                        I0731 14:53:52.393000 cluster.go:421] added member, added-peer-id: "9651b3d87a9444db", added-peer-peer-urls: ["http://127.0.0.1:2345/"], cluster-id: "fcb2b82bb878a292", local-member-id: "9651b3d87a9444db"
    asm_amd64.s:1571: root.etcd                        I0731 14:53:52.396000 etcd.go:582] serving peer traffic, address: "127.0.0.1:2345"
    asm_amd64.s:1571: root.etcd                        I0731 14:53:52.396000 etcd.go:554] cmux::serve, address: "127.0.0.1:2345"
    asm_amd64.s:1571: root.etcd                        I0731 14:53:52.396000 etcd.go:764] serving metrics, address: "http://127.0.0.1:4100/"
    asm_amd64.s:1571: root.etcd                        I0731 14:53:52.396000 etcd.go:277] now serving peer/client/metrics, advertise-client-urls: ["http://localhost:2379/"], initial-advertise-peer-urls: ["http://127.0.0.1:2345/"], listen-client-urls: ["unix:///tmp/metropolis-consensus-test2392152209//ephemeral/consensus/client.sock:0"], listen-metrics-urls: ["http://127.0.0.1:4100/"], listen-peer-urls: ["http://127.0.0.1:2345/"], local-member-id: "9651b3d87a9444db"
    asm_amd64.s:1571: supervisor                       I0731 14:53:53.392005 supervisor_processor.go:83] supervisor processor exiting: context canceled
    asm_amd64.s:1571: supervisor                       I0731 14:53:53.392031 supervisor_processor.go:85] supervisor exited, starting liquidator to clean up remaining runnables...
    asm_amd64.s:1571: supervisor                       I0731 14:53:53.392093 supervisor_processor.go:141] liquidator: root.etcd exited
    asm_amd64.s:1571: supervisor                       I0731 14:53:53.392119 supervisor_processor.go:141] liquidator: root.autopromoter exited
    asm_amd64.s:1571: supervisor                       I0731 14:53:53.392128 supervisor_processor.go:141] liquidator: root.selfupdater exited
    asm_amd64.s:1571: supervisor                       I0731 14:53:53.396822 supervisor_processor.go:141] liquidator: root.etcd.piper exited
    supervisor_testhelpers.go:66: Supervised runnable in harness returned error: bootstrap failed: when waiting for bootstrap etcd: %!w(<nil>)
    asm_amd64.s:1571: supervisor                       I0731 14:53:53.397427 supervisor_processor.go:141] liquidator: root exited
    asm_amd64.s:1571: supervisor                       I0731 14:53:53.397443 supervisor_processor.go:150] liquidator: complete, all runnables dead or done
2023/07/31 14:53:56 supervisor.TestHarness: Waiting for supervisor runnables to die...
2023/07/31 14:53:57 supervisor.TestHarness: All done.
{"level":"warn","ts":"2023-07-31T14:53:59.451Z","logger":"etcd-client","caller":"io_etcd_go_etcd_client_v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0005456c0//tmp/metropolis-consensus-test4121578531//ephemeral/consensus/client.sock:0","attempt":0,"error":"rpc error: code = Canceled desc = context canceled"}
{"level":"warn","ts":"2023-07-31T14:54:01.164Z","logger":"etcd-client","caller":"io_etcd_go_etcd_client_v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000544700//tmp/metropolis-consensus-test4121578531//ephemeral/consensus/client.sock:0","attempt":0,"error":"rpc error: code = Canceled desc = context canceled"}
2023/07/31 14:54:01 supervisor.TestHarness: Waiting for supervisor runnables to die...
2023/07/31 14:54:02 supervisor.TestHarness: All done.
2023/07/31 14:54:02 supervisor.TestHarness: Waiting for supervisor runnables to die...
2023/07/31 14:54:02 supervisor.TestHarness: All done.
{"level":"warn","ts":"2023-07-31T14:54:04.087Z","logger":"etcd-client","caller":"io_etcd_go_etcd_client_v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc001067340//tmp/metropolis-consensus-test2526546965//ephemeral/consensus/client.sock:0","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: rpc not supported for learner"}
{"level":"warn","ts":"2023-07-31T14:54:04.113Z","logger":"etcd-client","caller":"io_etcd_go_etcd_client_v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc001067340//tmp/metropolis-consensus-test2526546965//ephemeral/consensus/client.sock:0","attempt":1,"error":"rpc error: code = Unavailable desc = etcdserver: rpc not supported for learner"}
............
2023/07/31 14:54:08 supervisor.TestHarness: Waiting for supervisor runnables to die...
2023/07/31 14:54:10 supervisor.TestHarness: All done.
2023/07/31 14:54:10 supervisor.TestHarness: Waiting for supervisor runnables to die...
2023/07/31 14:54:15 supervisor.TestHarness: Still live:
2023/07/31 14:54:15 supervisor.TestHarness: - root
2023/07/31 14:54:16 supervisor.TestHarness: All done.
FAIL
================================================================================
lorenz commented

Probably IO starvation, that etcd should move to tmpfs