Memory leak
hvm2hvm opened this issue · 9 comments
Hello,
My colleagues and I use the plugin you created to run some analysis on the data in our DB. As the DB grew, the cluster started to crash, and we found out it was because of memory swapping.
We think there is a memory leak somewhere: the GC keeps running but doesn't reclaim any memory. We checked and double-checked our code, and the only other component that could cause this is your plugin (or elasticsearch itself, which is less likely).
Is it possible that a memory leak occurs in facet-script?
I'll get back with more information if you need it.
Thanks,
Voicu Hodrea
The most likely reason for this error is simply running out of memory due to field cache growth. Were you monitoring the field cache size while your analysis requests were running, by any chance? Did the field cache grow? If not, could you restart the cluster, rerun the test, and check the size of the field caches? You can do that by running the following command:
curl -XGET 'http://localhost:9200/_cluster/nodes/stats?pretty=true'
By the way, which version of elasticsearch are you using, and which scripting language? Do you generate scripts on the fly or reuse the same script over and over?
Hi Igor,
I'm one of hvm2hvm's colleagues. Here's what the cache property looks like after running our scripts a bunch of times:
"cache" : {
"field_evictions" : 0,
"field_size" : "14.6gb",
"field_size_in_bytes" : 15756926302,
"filter_count" : 20,
"filter_evictions" : 0,
"filter_size" : "3.4mb",
"filter_size_in_bytes" : 3616096
}
We're currently running a 3-node cluster on Elasticsearch 0.19.10, and the scripts are native Java scripts that are run over and over again.
What's the elasticsearch heap size?
40G on each node; each server has 48G of RAM and ES is the only thing running on them. We had also tried allocating only 20G to ES and leaving the rest to the OS cache, but all that led to was ES crashing with an OutOfMemoryError as soon as we ran one of our scripts.
I've pasted the /stats response to http://pastebin.com/raw.php?i=ZVpZFtX9 if you would like to see it.
So, this is the number that I was curious about:
"field_size" : "14.6gb",
On a 40G node it shouldn't be a problem, but a field cache of this size can easily kill a 20G node. This cache is loaded for each field that you sort on, run a facet on, or access through DocLookup in your script. Do you know how big the field cache was on the node that ran out of memory? Also, if you have the OutOfMemoryError stack trace, it might tell us which operation caused the system to run out of memory. It's not always the culprit, but frequently it is.
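To make the DocLookup path concrete, here is a minimal sketch of a native script that reads a field this way (the class and field names are illustrative, not taken from the actual plugin, and it assumes the 0.19-era native script API):

import org.elasticsearch.script.AbstractSearchScript;

public class EngagementScript extends AbstractSearchScript {
    @Override
    public Object run() {
        // doc() goes through DocLookup: the first access to a field on a
        // segment loads all values of that field into the field cache,
        // where they stay resident across documents and requests.
        return doc().get("engagement");
    }
}

Every field touched this way stays cached, which is why the cache can keep growing even though the script itself holds no references.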
Seems like it crashes when it tries to load the "engagement" field (which is mapped as a not_analyzed string but is generally an array of strings, if that matters at all).
[2013-02-07 19:17:36,551][WARN ][index.cache.field.data.resident] [Zuras] [users] loading field [engagement] caused out of memory failure
java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.index.field.data.support.FieldDataLoader.load(FieldDataLoader.java:68)
at org.elasticsearch.index.field.data.strings.StringFieldData.load(StringFieldData.java:90)
at org.elasticsearch.index.field.data.strings.StringFieldDataType.load(StringFieldDataType.java:56)
at org.elasticsearch.index.field.data.strings.StringFieldDataType.load(StringFieldDataType.java:34)
at org.elasticsearch.index.field.data.FieldData.load(FieldData.java:111)
at org.elasticsearch.index.cache.field.data.support.AbstractConcurrentMapFieldDataCache.cache(AbstractConcurrentMapFieldDataCache.java:130)
at org.elasticsearch.search.lookup.DocLookup.get(DocLookup.java:119)
at io.eclipse.elastic.MapScript.getAs(MapScript.scala:149)
at io.eclipse.elastic.MapScript.run(MapScript.scala:54)
at org.elasticsearch.search.facet.script.ScriptFacetCollector.doCollect(ScriptFacetCollector.java:68)
at org.elasticsearch.search.facet.AbstractFacetCollector.collect(AbstractFacetCollector.java:89)
at org.elasticsearch.common.lucene.MultiCollector.collect(MultiCollector.java:59)
at org.apache.lucene.search.FilteredQuery$2.score(FilteredQuery.java:167)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:581)
at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:195)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:445)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:426)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:342)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:330)
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:178)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:234)
at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:140)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:205)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:192)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
If this field contains several values per document, it might be the culprit. The current implementation of the field cache for fields with a variable number of values per document is not optimized; a better solution is coming in 0.21. Meanwhile, such fields can consume a lot of memory. If this field is stored, you can try switching to FieldsLookup. It will be slower, but it won't use as much memory.
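For reference, here is a hedged sketch of the same kind of script switched over to FieldsLookup (again with illustrative names; it assumes the field is marked as stored in the mapping and the 0.19-era lookup API):

import org.elasticsearch.script.AbstractSearchScript;

public class EngagementScript extends AbstractSearchScript {
    @Override
    public Object run() {
        // fields() reads the stored values of the current document from
        // disk instead of materializing the whole field in the field
        // cache, trading per-document I/O for a bounded memory footprint.
        return fields().get("engagement").getValues();
    }
}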
Alright, I'll try switching to FieldsLookup for multi-value fields and see if that helps. Thanks a lot for your help!
You are welcome! I am going to close this issue for now, since it doesn't look like a facet-script-specific issue.