tjake/Solandra

deleting elements breaks range queries

kRyszard opened this issue · 4 comments

Hi,

we have solandra cfbce36 (tjake's, 23-01-2012)
we fill an index with some data, the schema has a long field
we delete 1000 (out of 10000) elements by id.
we perform some range queries that no longer work (they worked before deleting), e.g.
q=post_creation_date:[1331436339000 TO 1331868339000] returns 3191 objects
q=post_creation_date:[1331004339000 TO 1331868339000] returns 0 objects
q=post_creation_date:[* TO 1331868339000] returns 0 objects

btw: I am aware that there is another bug in Solandra related to range queries: when Solandra returns data ordered by some long field, the longs are treated as numbers, but in range queries they're compared as strings, e.g. [10 TO 30] may return 10, 30, 200 (in that order).
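To illustrate that second bug, here is a minimal Python sketch of why comparing longs as strings lets 200 slip into [10 TO 30]. It is not Solandra's code; the function names are made up for the illustration, and it only assumes that the range check compares the decimal string forms lexicographically.

```python
# Hypothetical illustration, not taken from Solandra.

def in_range_as_strings(value, low, high):
    # Lexicographic comparison of the decimal representations.
    return str(low) <= str(value) <= str(high)

def in_range_as_numbers(value, low, high):
    # Plain numeric comparison, which is what a long field should get.
    return low <= value <= high

values = [10, 30, 200]

# "200" sorts between "10" and "30" because '2' < '3' character-wise.
string_hits = [v for v in values if in_range_as_strings(v, 10, 30)]
numeric_hits = [v for v in values if in_range_as_numbers(v, 10, 30)]

print(string_hits)   # [10, 30, 200]
print(numeric_hits)  # [10, 30]
```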

I've tested the process (add docs, perform range queries, delete some data, perform range queries again) against a few revisions of Solandra and found that some of them work. The last working revision was 513eda7 from 9-09-2011 and the first non-working one was a32ec23 from 27-09-2011. The difference between them is just one line. I've installed Cassandra 1.0.8 + tjake's cfbce36 and fixed the line. I've tested deletes + range queries and now it works OK, but I don't understand why, or whether it's going to break something else :(
Does anyone know what this line means?

That one liner was to avoid pulling too much data at once. Seems like if you delete then perhaps the logic pulls only tombstoned columns and gives up.
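A rough Python sketch of the failure mode guessed at above: a paged scan that stops as soon as one page holds only tombstones, next to one that keeps paging until the columns run out. This is an assumption-laden illustration, not Solandra's actual logic; `TOMBSTONE`, both function names, and the page size of 3 are all hypothetical.

```python
# Hypothetical model of a paged column scan; not Solandra's implementation.
TOMBSTONE = None  # stand-in for a deleted (tombstoned) column

def scan_giving_up(columns, page_size):
    """Stop as soon as a page contains no live columns."""
    live = []
    start = 0
    while start < len(columns):
        page = columns[start:start + page_size]
        page_live = [c for c in page if c is not TOMBSTONE]
        if not page_live:
            break  # the page was all tombstones, so the scan gives up
        live.extend(page_live)
        start += page_size
    return live

def scan_until_exhausted(columns, page_size):
    """Keep paging until a short page shows there is nothing left."""
    live = []
    start = 0
    while True:
        page = columns[start:start + page_size]
        live.extend(c for c in page if c is not TOMBSTONE)
        if len(page) < page_size:
            break  # fewer columns than requested: end of the row
        start += page_size
    return live

# The first three columns were deleted; live data follows them.
columns = [TOMBSTONE, TOMBSTONE, TOMBSTONE, 5, 6]

print(scan_giving_up(columns, 3))        # []
print(scan_until_exhausted(columns, 3))  # [5, 6]
```

If the real code behaves like the first function, a range whose start falls on deleted entries would come back empty even though live entries follow, while a range starting past them would still return results.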

  1. Does that mean that if I have broken range queries (on a cluster without this fix), I can perform a cleanup to remove all tombstones and make range queries work again?
  2. Are the old values (4/64) "safe"? I mean, is this a sufficient amount of data to pull to make all queries work?

btw: wow, tjake, you're alive ;P Since there's an opportunity to talk to you, can you please give us a quick comment on how you see the future of Solandra? I mean, are you still working on it, planning a release or something?

I think 2/3 will work.

I've been M.I.A. due to my time being spent on DataStax Enterprise Search which provides native Solr access to Cassandra column families. Also Cassandra 1.0 broke Solandra's partitioner. 1.1 will fix it so I will upgrade it then.