tjake/Solandra

db corruption: AssertionError DecoratedKey != DecoratedKey

codingismy11to7 opened this issue · 4 comments

We're regularly hitting a database corruption issue, though we can't reproduce it on demand, and we haven't found a way to recover from it. Everything I can find online points to a Cassandra bug that was fixed in 0.6.1 and only affected indexes over 2GB (we're nowhere close to that), so I'm guessing it's caused by Solandra somehow.

Once we get into this state, queries against the selected core never return, and Solandra spits out the same stack trace over and over every few seconds:

java.lang.AssertionError: DecoratedKey(90002160063266891977802944676337065984, 63757272656e74) != DecoratedKey(90002160063266891977802944676337065984, 3930303032313630303633323636383931393737383032393434363736333337303635393834efbfbf736861726473) in /path/to/solandra/data/data/L/SI-g-65-Data.db
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:59)
        at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:66)
        at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1407)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1304)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1261)
        at org.apache.cassandra.db.Table.getRow(Table.java:385)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:61)
        at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:668)
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1133)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

Hopefully this tells you something? We could probably send along affected db files.
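As an aside, the two keys in the assertion are printed as hex, so it can help to decode them before digging further. The snippet below is only a minimal sketch: it assumes the keys are UTF-8 text, and the names `decode_key`, `KEY_A_HEX`, `KEY_B_HEX` and `TOKEN` are made up for illustration. The hex strings and the token are copied verbatim from the stack trace.

```python
# Hex-encoded row keys and token copied from the AssertionError in the trace.
KEY_A_HEX = "63757272656e74"
KEY_B_HEX = (
    "3930303032313630303633323636383931393737383032393434363736333337303635393834"
    "efbfbf736861726473"
)
TOKEN = "90002160063266891977802944676337065984"


def decode_key(hex_key: str) -> str:
    """Decode a hex-encoded key to text. UTF-8 is an assumption, not a given."""
    return bytes.fromhex(hex_key).decode("utf-8")


key_a = decode_key(KEY_A_HEX)
key_b = decode_key(KEY_B_HEX)

# repr() makes the non-printable separator character visible.
print(repr(key_a))
print(repr(key_b))
print("second key starts with the token:", key_b.startswith(TOKEN))
```

Under that assumption, the first key decodes to `current`, and the second decodes to the token digits followed by U+FFFF and `shards`. So the file appears to hold two different row keys under the same token. That alone doesn't identify the cause, but it may help narrow down which writes are involved.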

What revision of Solandra is this? What is the last commit you have?

I thought we were up to date, but it looks like we last pulled on Sept 29th, so there have been some new commits since then. Could this already be fixed? Is there a way to upgrade Solandra without losing the data?

Don't update to the latest since it includes a breaking change.

Did this start happening once you updated?

My guess is that it's related to this change: 386746a

Perhaps you can revert that commit and see if it helps.

The actual fix for this is 7b18f06, but it requires re-indexing. If you just want to fix the problem without updating, I think reverting 386746a will do that.