tjake/Solandra

Dropped reads on bigger data

wwwill opened this issue · 2 comments

With Solandra 7b80ce2, certain queries cause lots of dropped reads on a large batch of data (25GB per node - see dropped read messages below). The queries are fairly simple. For example, show all docs between 2 dates, 2 days apart (range query on field type Date). The queries take 5+ minutes to "forever". The same queries work fine in a different cluster on a tiny set of data. Some info:

  • cluster size = 2
  • using all Solandra defaults except:
    • read/write level=ONE
    • RF=2
  • solandra.shards.at.once is at the default 4
  • memory: node1 max=4GB & node2 max=4GB

Any ideas on what would cause this? I read in "dropped messages" that load shedding is normal, but this seems abnormal, so does it mean that the 2 nodes are overloaded? If so, how to not overload them? Add more nodes?

From node2:
INFO 15:42:08,119 1155 READ messages dropped in last 5000ms
INFO 15:42:08,170 Pool Name Active Pending Blocked
INFO 15:42:08,178 ReadStage 32 5556 0
INFO 15:42:08,178 RequestResponseStage 0 0 0
INFO 15:42:08,179 ReadRepairStage 0 0 0
INFO 15:42:08,179 MutationStage 0 0 0
INFO 15:42:08,179 ReplicateOnWriteStage 0 0 0
INFO 15:42:08,180 GossipStage 0 0 0
INFO 15:42:08,184 AntiEntropyStage 0 0 0
INFO 15:42:08,185 MigrationStage 0 0 0
INFO 15:42:08,185 StreamStage 0 0 0
INFO 15:42:08,185 MemtablePostFlusher 0 0 0
INFO 15:42:08,190 FlushWriter 0 0 0
INFO 15:42:08,191 MiscStage 0 0 0
INFO 15:42:08,191 InternalResponseStage 0 0 0
INFO 15:42:08,192 HintedHandoff 0 0 0
INFO 15:42:08,453 CompactionManager 0 0
INFO 15:42:08,454 MessagingService n/a 0,0
INFO 15:42:08,454 Cache Type Size Capacity KeysToSave Provider
INFO 15:42:08,455 KeyCache 22216 43690 all
INFO 15:42:08,455 RowCache 0 0 all org.apache.cassandra.cache.SerializingCacheProvider
INFO 15:42:08,455 ColumnFamily Memtable ops,data
INFO 15:42:08,456 system.NodeIdInfo 0,0
INFO 15:42:08,456 system.IndexInfo 0,0
INFO 15:42:08,456 system.LocationInfo 0,0
INFO 15:42:08,456 system.Versions 3,111
INFO 15:42:08,456 system.schema_keyspaces 0,0
INFO 15:42:08,457 system.Migrations 0,0
INFO 15:42:08,457 system.schema_columnfamilies 0,0
INFO 15:42:08,457 system.schema_columns 0,0
INFO 15:42:08,457 system.HintsColumnFamily 0,0
INFO 15:42:08,458 system.Schema 0,0
INFO 15:42:08,458 L.FC 0,0
INFO 15:42:08,460 L.TI 0,0
INFO 15:42:08,461 L.TL 0,0
INFO 15:42:08,461 L.SI 0,0
INFO 15:42:08,628 L.Docs 0,0
INFO 15:42:13,646 1779 READ messages dropped in last 5000ms
INFO 15:42:13,647 Pool Name Active Pending Blocked
INFO 15:42:13,705 ReadStage 32 5690 0
INFO 15:42:13,707 RequestResponseStage 0 0 0
INFO 15:42:13,708 ReadRepairStage 0 0 0
INFO 15:42:13,708 MutationStage 0 0 0
INFO 15:42:13,709 ReplicateOnWriteStage 0 0 0
INFO 15:42:13,723 GossipStage 0 0 0
INFO 15:42:13,724 AntiEntropyStage 0 0 0
INFO 15:42:13,725 MigrationStage 0 0 0
INFO 15:42:13,725 StreamStage 0 0 0
INFO 15:42:13,725 MemtablePostFlusher 0 0 0
INFO 15:42:13,726 FlushWriter 0 0 0
INFO 15:42:13,726 MiscStage 0 0 0
INFO 15:42:13,726 InternalResponseStage 0 0 0
INFO 15:42:13,727 HintedHandoff 0 0 0
INFO 15:42:13,727 CompactionManager 0 0
INFO 15:42:13,727 MessagingService n/a 0,1
INFO 15:42:13,728 Cache Type Size Capacity KeysToSave Provider
INFO 15:42:13,866 KeyCache 22862 43690 all
INFO 15:42:13,900 RowCache 0 0 all org.apache.cassandra.cache.SerializingCacheProvider
INFO 15:42:13,900 ColumnFamily Memtable ops,data
INFO 15:42:13,901 system.NodeIdInfo 0,0
INFO 15:42:13,901 system.IndexInfo 0,0
INFO 15:42:13,901 system.LocationInfo 0,0
INFO 15:42:13,901 system.Versions 3,111
INFO 15:42:13,901 system.schema_keyspaces 0,0
INFO 15:42:13,902 system.Migrations 0,0
INFO 15:42:13,902 system.schema_columnfamilies 0,0
INFO 15:42:13,902 system.schema_columns 0,0
INFO 15:42:13,902 system.HintsColumnFamily 0,0
INFO 15:42:13,902 system.Schema 0,0
INFO 15:42:13,903 L.FC 0,0
INFO 15:42:13,903 L.TI 0,0
INFO 15:42:13,903 L.TL 0,0
INFO 15:42:13,903 L.SI 0,0
INFO 15:42:13,903 L.Docs 0,0
INFO 15:42:19,900 GC for ParNew: 1873 ms for 4 collections, 375724096 used; max is 6433734656
INFO 15:42:19,943 GC for ConcurrentMarkSweep: 3505 ms for 1 collections, 421802184 used; max is 6433734656
INFO 15:42:19,943 Pool Name Active Pending Blocked
INFO 15:42:19,944 ReadStage 32 5614 0
INFO 15:42:19,945 RequestResponseStage 0 0 0
INFO 15:42:19,945 ReadRepairStage 0 0 0
INFO 15:42:19,945 MutationStage 0 0 0
INFO 15:42:19,945 ReplicateOnWriteStage 0 0 0
INFO 15:42:19,946 GossipStage 1 6 0
...

Your thrift size may be set too low, so Cassandra is dropping reads. I think default is 15 megabytes.

Thanks, Oleg. I was able to stop the dropped reads and get reasonable response times by running queries after each large commit (so-called warming). I will try your suggestion if I see the issue again.