Stratio/cassandra-lucene-index

Very slow process of compaction after index setup

karpa13a opened this issue · 5 comments

Good day.
C* is 3.11, with the matching plugin version. Ubuntu 16.04, latest Java 1.8.
One DC, 3 nodes, keyspace with RF=3,
on EC2 with 2 CPUs and 4 GB of memory each.

The cluster works well: data is inserted in batches every 15 minutes, with no compaction or performance problems. The data size is around 15M rows.
But I'm facing strange behavior after creating a Lucene index.
I created the index:

CREATE CUSTOM INDEX gsm_index ON gsm ()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
   'refresh_seconds': '1',
   'schema': '{
      fields: {
         sid: {type: "string"},
         timestamp: {type: "date", pattern: "yyyy/MM/dd"},
         place: {type: "geo_point", latitude: "latitude", longitude: "longitude"}
      }
   }',
   'indexing_threads': '4'
};

The index was created and works well.
The next day I saw a load average above 3 on each node, with a queue of 8 compactions.
I dropped the index, and all compactions finished within 15 minutes.
I recreated the index and got the same result the next day.
The table is simple:

CREATE TABLE gsm (
   sid text,
   timestamp timestamp,
   latitude double,
   longitude double,
   /* other column definitions */
   PRIMARY KEY (sid, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);

Do I need to upgrade the EC2 instances to something more powerful, or did I hit a bug?

What type of disks are you using? I alleviated similar compaction problems by switching to solid state drives.

@FourSeventy unfortunately, it's not an IO bottleneck.
The tasks are CPU-bound :(

Unfortunately, upgrading the nodes from t2.medium (2 CPUs) to t2.xlarge (4 CPUs) didn't help.
It just eats 350% CPU.

This makes Lucene indexes totally unusable :(

Maybe I can do some kind of debugging?

By the way, is it OK that MemtableFlushWriter writes to the log about every 2 minutes, even when there are no reads or updates?

INFO  [MemtableFlushWriter:372] 2018-05-18 07:24:56,673 Index.scala:127 - Flushing Lucene index  /gsm_index/
INFO  [MemtableFlushWriter:373] 2018-05-18 07:26:00,154 Index.scala:127 - Flushing Lucene index /gsm_index/
INFO  [MemtableFlushWriter:374] 2018-05-18 07:27:57,105 Index.scala:127 - Flushing Lucene index /gsm_index/
INFO  [MemtableFlushWriter:375] 2018-05-18 07:29:52,975 Index.scala:127 - Flushing Lucene index /gsm_index/

Okay.
I created the index without the "place: {type: "geo_point", latitude: "latitude", longitude: "longitude"}" part,
and now compactions no longer get stuck.
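
For reference, the index statement without the geo_point mapper would look roughly like this. It is a sketch that assumes every other option ('refresh_seconds', 'indexing_threads', and the remaining fields) was kept exactly as in the original statement; the thread does not show the statement that was actually run.

```sql
-- Sketch only: the original index definition with the "place" geo_point field removed.
-- Assumption: all other options are unchanged from the first CREATE CUSTOM INDEX above.
CREATE CUSTOM INDEX gsm_index ON gsm ()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
   'refresh_seconds': '1',
   'schema': '{
      fields: {
         sid: {type: "string"},
         timestamp: {type: "date", pattern: "yyyy/MM/dd"}
      }
   }',
   'indexing_threads': '4'
};
```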

What was wrong with geo_point?
Currently the index is flushed once every 3 hours:
INFO [MemtableFlushWriter:508] 2018-05-20 12:00:02,154 Index.scala:127 - Flushing Lucene index ...
INFO [MemtableFlushWriter:515] 2018-05-20 15:00:02,968 Index.scala:127 - Flushing Lucene index ...

So which Cassandra version and which plugin version should we use to avoid compatibility issues? Any suggestions?