Set CI DB credentials (environment variables):
CI_DB_HOST
CI_DB_USER
CI_DB_PWD
Search for "Redpanda Test Results DB" on wiki
pip3 install sh psycopg2
- install gh
Use gh to set its credentials
Correlate issues with failures. The first run takes ~30 mins because it downloads Redpanda Test Results DB and gh issues locally from scratch. The follow up will fetch only delta.
NO_COLOR=1 python3 index.py
# list failures which doesn't have corresponding issues
python3 view_new.py
Output
f-id build freq total first occ. title
491 #31233 628 1 2023-06-14 <NodeCrash (docker-rp-21,docker-rp-8) docker-rp-21: Redpanda process u
490 #31229 2223 24 2023-06-14 AttributeError("'NoneType' object has no attribute 'account'")
489 #31229 321 1 2023-06-14 TimeoutError()
487 #31184 184 1 2023-06-13 <NodeCrash docker-rp-16: ERROR 2023-06-13 15:22:08,202 [shard 1] asser
484 #31103 185 1 2023-06-12 <NodeCrash docker-rp-13: ERROR 2023-06-12 17:28:55,404 [shard 0] asser
...
202 #23852 326 3 2023-02-23 HTTPError('504 Server Error: Gateway Timeout for url: http://docker-rp
9 #21947 1652 9 2023-01-27 TimeoutError('Redpanda service docker-rp-10 failed to start within 60
40 #20893 183 2 2023-01-10 TimeoutError("Consumer failed to consume up to offsets {TopicPartition
Legend
f-id - id of a failure, check `failures/491.json`
build - buildkite's build, check https://buildkite.com/redpanda/redpanda/builds/31233
freq - frequency of the fails (number of failures per 1MM)
total - total number of fails
first occ. - first occurrence of the failure
title - failure's title (to ignore: python3 view_new.py --notitle)
test - failing test (to include: python3 view_new.py --test)
Sometime a rogue ci build causes a lot of errors which may be ignored (e.g. a responsible PR is identified and rolled back). In this case edit .ciignore.json
to exclude that build, reset last-processed-id
to -1
in data/builds/manifest.json
and restart index.py
.
# list failures which keep happening after an issue is closed
python3 view_issues.py --type reopen
Output
f-id issue freq total last occ. title
311 #10087 1027 10 2023-05-02 CI Failure (connection refused to admin API) in `EndToEndShadowIndexin
240 #7418 2938 32 2023-05-31 CI Failure (partitions_rebalanced times out) in `ScalingUpTest`.`test_
238 #9459 370 2 2023-06-14 CI Failure target_offset <= _insync_offset in `PartitionBalancerTest.t
...
57 #8919 1146 21 2023-06-14 CI Failure (Assertion `_state && !_state->available()' failed) in `Ran
33 #8220 1101 6 2023-05-04 Disk usage ratio check failing in `PartitionBalancerTest`.`test_full_n
1 #8421 1285 7 2023-06-02 CI Failure (heap-use-after-free) in `AvailabilityTests.test_availabili
python3 view_issues.py --type top
Output
f-id issue freq total last occ. title
464 #11306 191588 41 2023-06-14 CI Failure (Consumed from an unexpected) in `SimpleEndToEndTest.test_c
429 #11044 56807 121 2023-06-15 CI Failure (Timeout - Failed to start) in `MultiTopicAutomaticLeadersh
362 #10849 47368 45 2023-05-18 CI Failure (timeout waiting for `rpk cluster` to list 1 topic) in `Sha
448 #11151 22666 17 2023-06-02 CI Failure (TimeoutError) in `CloudStorageChunkReadTest.test_read_when
224 #10873 16984 49 2023-05-21 CI Failure (Reported cloud storage usage did not match the manifest in
...
329 #10368 51 1 2023-04-22 CI Failure (topic does not exist while deleting topic) in `ShadowIndex
412 #10935 45 1 2023-05-19 CI Failure (Internal Server Error) in `PartitionBalancerTest.test_deco
428 #11410 42 1 2023-05-25 CI Failure (consumers haven't finished) in `CompactionE2EIdempotencyTe
482 #11371 9 1 2023-06-12 CI Failure (Redpanda failed to stop in 30 seconds) in `PartitionMoveIn
The command is usefull to chase a PR causing the problem.
python3 view_issues.py --type first
f-id issue freq total first occ. title
486 #11151 1362 1 2023-06-13 CI Failure (TimeoutError) in `CloudStorageChunkReadTest.test_read_when
485 #11365 666 5 2023-06-12 CI Failure (Timeout waiting for partitions to move) in `NodesDecommiss
482 #11371 9 1 2023-06-12 CI Failure (Redpanda failed to stop in 30 seconds) in `PartitionMoveIn
...
60 #10218 739 4 2022-12-21 CI Failure (ignored exceptional future) in `PartitionBalancerTest.test
80 #10024 2205 24 2022-12-14 CI Failure (TimeoutError in wait_for_partitions_rebalanced) in `Scalin
78 #11062 551 6 2022-12-13 CI Failure (startup failure) in `ScalingUpTest.test_adding_multiple_no
python3 view_issues.py --type recent
f-id issue freq total last occ. title
485 #11365 925 7 2023-06-15 CI Failure (Timeout waiting for partitions to move) in `NodesDecommiss
429 #11044 61167 132 2023-06-15 CI Failure (Timeout - Failed to start) in `MultiTopicAutomaticLeadersh
490 #11456 2489 27 2023-06-15 CI Failure (`AttributeError: 'NoneType' object has no attribute 'accou
...
305 #10363 628 3 2023-05-04 CI Failure (assertion error: groups not reported after migration) in `
60 #10218 735 4 2023-05-04 CI Failure (ignored exceptional future) in `PartitionBalancerTest.test
329 #10368 51 1 2023-04-22 CI Failure (topic does not exist while deleting topic) in `ShadowIndex
A failure associated with the issues hasn't failed within two months
python3 view_issues.py --type stale
f-id issue freq total last occ. title
15 #8496 933 4 2023-04-12 CI Failure (_topic_remote_deleted timeout) in `TopicDeleteCloudStorage
223 #9751 204 4 2023-04-05 CI Failure (timeout + UNKNOWN_TOPIC_OR_PARTITION) in `EndToEndTopicRec
4 #8457 206 5 2023-03-07 CI Failure (Timeout on manifest_has_one_segment) in `AdjacentSegmentMe