thelastpickle/cassandra-reaper

Failed creating a merkle tree for [repair #c0c157f0-e14a-11ee-b320-bdc4e5fd08de on reaper_current/running_repairs, [(-8896895687978311387,-8888847627918162438], (-881535601694419919,-867088063190097011], (1356177826732174702,1357186491253880239], (6263469266031338450,6279809997892748801]]], /<IP>:7000

[ERROR] [ValidationExecutor:4] 2024-03-13 10:02:47,558 ValidationManager.java:173 - Validation failed.
java.lang.RuntimeException: Parent repair session with id = c0bb8b90-e14a-11ee-b320-bdc4e5fd08de has failed.
at org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:690)
at org.apache.cassandra.db.repair.CassandraValidationIterator.getSSTablesToValidate(CassandraValidationIterator.java:116)
at org.apache.cassandra.db.repair.CassandraValidationIterator.<init>(CassandraValidationIterator.java:203)
at org.apache.cassandra.db.repair.CassandraTableRepairManager.getValidationIterator(CassandraTableRepairManager.java:51)
at org.apache.cassandra.repair.ValidationManager.getValidationIterator(ValidationManager.java:89)
at org.apache.cassandra.repair.ValidationManager.doValidation(ValidationManager.java:112)
at org.apache.cassandra.repair.ValidationManager.access$000(ValidationManager.java:41)
at org.apache.cassandra.repair.ValidationManager$1.call(ValidationManager.java:162)
at java.util.concurrent.FutureTask.run(FutureTask.java:277)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:826)
[ERROR] [ValidationExecutor:4] 2024-03-13 10:02:47,558 CassandraDaemon.java:581 - Exception in thread Thread[ValidationExecutor:4,1,main]
java.lang.RuntimeException: Parent repair session with id = c0bb8b90-e14a-11ee-b320-bdc4e5fd08de has failed.
at org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:690)
at org.apache.cassandra.db.repair.CassandraValidationIterator.getSSTablesToValidate(CassandraValidationIterator.java:116)
at org.apache.cassandra.db.repair.CassandraValidationIterator.<init>(CassandraValidationIterator.java:203)
at org.apache.cassandra.db.repair.CassandraTableRepairManager.getValidationIterator(CassandraTableRepairManager.java:51)
at org.apache.cassandra.repair.ValidationManager.getValidationIterator(ValidationManager.java:89)
at org.apache.cassandra.repair.ValidationManager.doValidation(ValidationManager.java:112)
at org.apache.cassandra.repair.ValidationManager.access$000(ValidationManager.java:41)
at org.apache.cassandra.repair.ValidationManager$1.call(ValidationManager.java:162)
at java.util.concurrent.FutureTask.run(FutureTask.java:277)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:826)

Please note that we ran the ./nodetool scrub command to see whether it would resolve the issue, but we still get the same errors on all 6 Cassandra nodes. The issue occurs for every keyspace and table on each Cassandra node.

Cassandra version :- 3.11.6
Reaper version :- 1.1.0

@adejanovski

Please let me know if any other details are required for this issue.

@kapilgit123, I sure hope you're not using Reaper 1.1.0 😅

These stack traces don't show why validation failed.
The segment may have hit its timeout, so check the Reaper logs for how long this segment had been running.
If that's the case, the adaptive nature of the repairs should extend the timeout over the next attempts (assuming you're running a recent version of Reaper).
Otherwise, you can change the segment timeout for this repair explicitly (or change the default timeout globally).

That's just an assumption and should be verified by checking the logs more thoroughly in both Reaper and Cassandra.
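
As a starting point for that log check, here is a minimal Python sketch that pulls the timestamp, thread, and parent repair session id out of Cassandra log lines shaped like the ones quoted above. The function name `failed_parent_sessions` and both regular expressions are illustrative assumptions based only on the excerpt in this issue; your logback pattern may differ, so adjust the header expression accordingly.

```python
import re
from datetime import datetime

# Log header as it appears in the excerpt above (brackets around the level
# are treated as optional, since logback patterns vary between installs):
# [ERROR] [ValidationExecutor:4] 2024-03-13 10:02:47,558 ValidationManager.java:173 - Validation failed.
HEADER = re.compile(
    r"^\[?(?P<level>[A-Z]+)\]?\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<source>\S+)\s+-\s+(?P<message>.*)$"
)

# Exception message as it appears in the excerpt above.
PARENT_FAILED = re.compile(
    r"Parent repair session with id = (?P<id>[0-9a-f-]{36}) has failed"
)


def failed_parent_sessions(lines):
    """Yield (timestamp, thread, source, session_id) for each reported failure.

    Each failure message is paired with the most recent log header seen
    before it, because the exception text is on the line after the header.
    """
    last_header = None
    for raw in lines:
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            last_header = header
            continue
        failure = PARENT_FAILED.search(line)
        if failure and last_header:
            ts = datetime.strptime(last_header["ts"], "%Y-%m-%d %H:%M:%S,%f")
            yield ts, last_header["thread"], last_header["source"], failure["id"]


if __name__ == "__main__":
    import sys

    # Usage: python find_failed_sessions.py /path/to/system.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for ts, thread, source, session_id in failed_parent_sessions(log):
            print(ts.isoformat(), thread, source, session_id)
```

The earliest timestamp reported for a given session id gives a point in time to compare against the Reaper log entries for the same segment, which helps confirm or rule out the timeout hypothesis.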

@adejanovski

I just confirmed that the Cassandra and Reaper versions are as follows:
Cassandra 4.0.10 and Reaper 3.3.1