apache/drill

the mergejoin memery leak

weijunlu opened this issue · 5 comments

Drill version:
1.22.0-SNAPSHOT

Describe the bug
the mergejoin memery leak.

To Reproduce

  1. Configure direct memory, In the drill-env.sh file in the conf directory add:
    export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"1G"}
  2. drill hive storage plugin configuration:
    {
    "type": "hive",
    "configProps": {
    "hive.metastore.sasl.enabled": "false",
    "fs.default.name": "hdfs://xxx:9000",
    "datanucleus.schema.autoCreateAll": "true",
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://xxx:3306/tpch?createDatabaseIfNotExist=true&useUnicode=true&characterEncoding=utf8&useSSL=false",
    "javax.jdo.option.ConnectionUserName": "root",
    "javax.jdo.option.ConnectionPassword": "",
    "hive.metastore.uris": "thrift://xxx:9083",
    "hive.metastore.schema.verification": "false",
    "hive.metastore.warehouse.dir": "/user/hive/warehouse",
    "hive.jdbc.driver": "org.apache.hive.jdbc.HiveDriver",
    "hive.jdbc.url": "jdbc:hive2://xxx:10000",
    "hive.jdbc.user": "root",
    "hive.jdbc.password": "
    ",
    "hive.dml.mode": "jdbc",
    "writable": "true"
    },
    "enabled": true,
    "authMode": "SHARED_USER"
    }
  3. Create eight tpch tables, use tpch tool to generate 1s data, and import data into hive. See the attachment for details.
  4. Write the execution script, run the tpche sql statements in 5 concurrency:
    shell script:
    [root@jupiterdev memeryleak]# cat para_power_test.sh
    #!/bin/bash

drill_home=/opt/apache-drill-1.22.0-SNAPSHOT/bin

random_sql(){
while true
do
$drill_home/sqlline -u "jdbc:drill:zk=localhost:2181" -f tpch_sql8.sql >> log/tpch1s_sql8.log 2>&1
done
}

main(){
unset HADOOP_CLASSPATH
#sleep 2h
#TPCH power test
for i in seq 1 5
do
random_sql &
done
}
main $@
the sql as follows:
select
o_year,
sum(case when nation = 'CHINA' then volume else 0 end) / sum(volume) as mkt_share
from (
select
extract(year from o_orderdate) as o_year,
l_extendedprice * 1.0 as volume,
n2.n_name as nation
from hive.tpch.part, hive.tpch.supplier, hive.tpch.lineitem, hive.tpch.orders, hive.tpch.customer, hive.tpch.nation n1, hive.tpch.nation n2, hive.tpch.region
where
p_partkey = l_partkey
and s_suppkey = l_suppkey
and l_orderkey = o_orderkey
and o_custkey = c_custkey
and c_nationkey = n1.n_nationkey
and n1.n_regionkey = r_regionkey
and r_name = 'ASIA'
and s_nationkey = n2.n_nationkey
and o_orderdate between date '1995-01-01'
and date '1996-12-31'
and p_type = 'LARGE BRUSHED BRASS') as all_nations
group by o_year
order by o_year;
5. Execute the script and view the sqlline running status:
[root@jupiterdev memeryleak]# ./para_power_test.sh
[root@jupiterdev memeryleak]# jps | grep SqlLine
5638 SqlLine
5799 SqlLine
5705 SqlLine
5709 SqlLine
5584 SqlLine

  1. View drill's direct memory through the 8047 page:
1 2
  1. After running for a while, an sql statement error is detected, and then pkill -f para_power_test.sh
  2. Check again that there is no sqlline execution, bug the direct memery is not fully release.
4

Expected behavior
Expected that the direct memery can fully release.

Additional context
update the planner.enable_hashjoin option to false through the 8047 page
1705471433736
hive.txt

add the log and the sql execute plan:
`Caused by: org.apache.drill.exec.ops.QueryCancelledException: null
at org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl.checkContinue(FragmentExecutor.java:533)
at org.apache.drill.exec.record.AbstractRecordBatch.checkContinue(AbstractRecordBatch.java:246)
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:100)
at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:59)
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:160)
at org.apache.drill.exec.record.RecordIterator.clearInflightBatches(RecordIterator.java:359)
at org.apache.drill.exec.record.RecordIterator.close(RecordIterator.java:365)
at org.apache.drill.exec.physical.impl.join.MergeJoinBatch.close(MergeJoinBatch.java:300)
at org.apache.drill.common.DeferredException.suppressingClose(DeferredException.java:159)
at org.apache.drill.exec.physical.impl.BaseRootExec.close(BaseRootExec.java:169)
at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:407)
at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:239)
... 5 common frames omitted
Suppressed: org.apache.drill.exec.ops.QueryCancelledException: null
... 17 common frames omitted
Suppressed: java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (73728)
Allocator(op:2:0:11:MergeJoinPOP) 1000000/73728/4874240/10000000000 (res/actual/peak/limit)

            at org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:519)
            at org.apache.drill.exec.ops.BaseOperatorContext.close(BaseOperatorContext.java:159)
            at org.apache.drill.exec.ops.OperatorContextImpl.close(OperatorContextImpl.java:77)
            at org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.java:595)
            at org.apache.drill.exec.ops.FragmentContextImpl.close(FragmentContextImpl.java:585)
            at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:414)
            ... 6 common frames omitted
    Suppressed: java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (249856)

Allocator(op:2:0:5:MergeJoinPOP) 1000000/249856/9121792/10000000000 (res/actual/peak/limit)

            at org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:519)
            at org.apache.drill.exec.ops.BaseOperatorContext.close(BaseOperatorContext.java:159)
            at org.apache.drill.exec.ops.OperatorContextImpl.close(OperatorContextImpl.java:77)
            at org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.java:595)
            at org.apache.drill.exec.ops.FragmentContextImpl.close(FragmentContextImpl.java:585)
            at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:414)
            ... 6 common frames omitted
    Suppressed: java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (2000000)

Allocator(frag:2:0) 138000000/2000000/165842048/240000000000 (res/actual/peak/limit)

            at org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:519)
            at org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.java:595)
            at org.apache.drill.exec.ops.FragmentContextImpl.close(FragmentContextImpl.java:588)
            at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:414)
            ... 6 common frames omitted

`
1705472684713

@weijunlu, thanks for this report. You've encountered one of the two memory-related issues that confuse many new users of Drill (or Java).

Your report appears to say that you a) configure direct memory, b) run a query that uses quite a bit of memory, and c) find that Drill has not returned the memory to the OS after the query completes. As it turns out, this is by design.

Drill uses the Netty memory manager. Nettywill request memory from the OS as needed, but will not release it back. Once the memory is allocated, Netty adds that memory block to a free list inside Drill itself where Drill can reuse it for a later query.

You can see this. After you run your first batch of queries, check the direct memory level. Then, without restarting Drill, run the same batch of queries a second time. You should see that the amount of memory allocated to Drill stays about the same: the direct memory was used, released, and then reused -- all within Drill. Memory might grow a bit as explained below.

Now, it might be possible to release memory back to the OS, but only in a very lightly used system in which there are times when no queries are active. Why? Drill uses a "binary buddy" memory allocation starting with blocks of 16MB in size, then slicing memory up into smaller pieces as needed. If any portion of that 16MB is allocated, the 16MB block cannot be released back to the OS. Drill was designed for an environment with heavy usage, in which case every 16MB block will have at least some of its memory in use at all times. Given that environment, it did not make sense to try to free 16MB blocks only to immediately request them again: Drill would just thrash the memory subsystem.

Drill (actually Netty) will request more memory from the OS once the current set of 16MB blocks become allocated. So, in a busy system, memory will continue to grow. Many people see this and say, "memory leak!" But, it is not a leak, it is by design. Drill will continue to allocate memory until it reaches the limit you set in the configuration. If Drill still needs more memory after that, queries will fail with an out-of-memory (OOM) error.

Given that Drill is now most often used for smaller use cases, some very clever person might be able to find a way to ask the Netty memory allocator to return to the OS any 16MB block which is entirely free. That's a good enhancement project.

I mentioned that this is one of two issues that confuse people. The other is the normal Java heap. Java itself will continue to allocate more memory for the heap as Drill runs, up to the configured limit. Java never releases its memory either: the unused memory is simply available on the heap for later reuse within Java.

The summary is that Drill will eventually use all the heap and direct memory you allocate to it. Once allocated, the memory will not go back to the OS, even if Drill is idle. This is why it is important to configure the memory level to suite your use case: Drill won't grab and free OS memory based on load.

Finally, of course, Drill will release all its memory back to the OS when Drill exits. So, if you run a query only every once in a while, simply use a script to bounce the Drill server after it has been idle for a while.

Does this make sense?

I run the whole test process again, found that the heap memory reached 94%, no longer release, and then executed the above sql statement, immediately reported an error:
image
image

@weijunlu, you may have found a bug. The behavior you described is not expected.

Just to verify I understand: you ran your query once. Did the query run successfully the first time? Or, did your first query fail? If the query worked the first time, we have one situation. If the query failed the first time, we have another situation.

Then, you ran the same query a second time? This time you got an OOM error?

I think we may have up to three distinct issues.

First, I missed one important piece of information when I first read this report. You reported an exception: Memory was leaked by query. Memory leaked: (249856). This messages indicates an actual bug. As I explained, Drill won't release the memory back to the OS. But, the query should have released the memory back to Netty. The error message says that it did not. This would be a bug. Even if the query fails, Drill is required to return memory to Netty. Such errors are, unfortunately, hard to track down. Our unit tests have many such checks, but your query appears to have uncovered a case that the unit tests do not check.

The stack trace suggests that the memory leak was detected as Drill was trying to shut down a failed query. So, I suspect your query run may not have actually worked. If it did work, you should see in the UI a very detailed report on all the operators in the query. Did you get that detailed report?

Second, I wonder if Drill has sufficient memory for the query you are trying to run. The TPCH queries tend to be memory-hungry. This query has six concurrent sorts. Then, it has five merge joins, plus more sorts. While the query runs, all the needed data will be in memory. (It is in memory because Drill uses memory to buffer data to keep things fast.)

I do not recall the size of a SF1 TPCH data set. What is the total size of the files in your data set?

Back when we used to run TPCH tests regularly, we would use a cluster of 5 or 10 Drillbits, each with a generous amount of memory, so that Drill can hold the data in memory. I suspect that 2 GB of direct memory, on a single Drillbit, is probably not enough for this kind of query.

So, I suspect you need more direct memory. The UI says that you are using only 34% of a 4GB heap, but ~100% of 2GB direct memory. Try changing this allocation. Try allocating 3GB to heap. This will raise heap usage to 45%. Then, give as much memory as your machine has to direct memory. If you are running SqlLine on the same machine, please run it on another machine instead. If you have other services on your machine (a database, an IDE, etc.), I would recommend moving Drill to a dedicated machine. You may even consider using a cluster of machines to provide sufficient resources.

Once you get a query that succeeds, you can look at the detailed query profile to find out how much memory it required. Then, you can reduce the direct memory given to Drill to a lower amount, if the query allows.

Once you find a memory level that allows the query to run, you will probably avoid the memory leak error. The query will succeed and memory should be freed back to Netty as normal. TPCH SF1 queries used to pass. They should still work, unless something changed in the last couple of years in one of the operators.

Just to be clear, Drill works well on a laptop with the standard amount of memory if you run simple queries on small datasets. But, if you do classic TPCH "big data" queries, you need a cluster with enough capacity — that's why we use Hadoop.

The third issue is a minor point: the UI screenshot is a bit odd: it says you are at 94% of direct memory. I suspect you are actually at 100%, and the UI is dividing by the wrong number. (1000 vs. 1024 or some such).

@weijunlu, you may have found a bug. The behavior you described is not expected.

Just to verify I understand: you ran your query once. Did the query run successfully the first time? Or, did your first query fail? If the query worked the first time, we have one situation. If the query failed the first time, we have another situation.

Then, you ran the same query a second time? This time you got an OOM error?

I think we may have up to three distinct issues.

First, I missed one important piece of information when I first read this report. You reported an exception: Memory was leaked by query. Memory leaked: (249856). This messages indicates an actual bug. As I explained, Drill won't release the memory back to the OS. But, the query should have released the memory back to Netty. The error message says that it did not. This would be a bug. Even if the query fails, Drill is required to return memory to Netty. Such errors are, unfortunately, hard to track down. Our unit tests have many such checks, but your query appears to have uncovered a case that the unit tests do not check.

The stack trace suggests that the memory leak was detected as Drill was trying to shut down a failed query. So, I suspect your query run may not have actually worked. If it did work, you should see in the UI a very detailed report on all the operators in the query. Did you get that detailed report?

Second, I wonder if Drill has sufficient memory for the query you are trying to run. The TPCH queries tend to be memory-hungry. This query has six concurrent sorts. Then, it has five merge joins, plus more sorts. While the query runs, all the needed data will be in memory. (It is in memory because Drill uses memory to buffer data to keep things fast.)

I do not recall the size of a SF1 TPCH data set. What is the total size of the files in your data set?

Back when we used to run TPCH tests regularly, we would use a cluster of 5 or 10 Drillbits, each with a generous amount of memory, so that Drill can hold the data in memory. I suspect that 2 GB of direct memory, on a single Drillbit, is probably not enough for this kind of query.

So, I suspect you need more direct memory. The UI says that you are using only 34% of a 4GB heap, but ~100% of 2GB direct memory. Try changing this allocation. Try allocating 3GB to heap. This will raise heap usage to 45%. Then, give as much memory as your machine has to direct memory. If you are running SqlLine on the same machine, please run it on another machine instead. If you have other services on your machine (a database, an IDE, etc.), I would recommend moving Drill to a dedicated machine. You may even consider using a cluster of machines to provide sufficient resources.

Once you get a query that succeeds, you can look at the detailed query profile to find out how much memory it required. Then, you can reduce the direct memory given to Drill to a lower amount, if the query allows.

Once you find a memory level that allows the query to run, you will probably avoid the memory leak error. The query will succeed and memory should be freed back to Netty as normal. TPCH SF1 queries used to pass. They should still work, unless something changed in the last couple of years in one of the operators.

Just to be clear, Drill works well on a laptop with the standard amount of memory if you run simple queries on small datasets. But, if you do classic TPCH "big data" queries, you need a cluster with enough capacity — that's why we use Hadoop.

The third issue is a minor point: the UI screenshot is a bit odd: it says you are at 94% of direct memory. I suspect you are actually at 100%, and the UI is dividing by the wrong number. (1000 vs. 1024 or some such).

i think it leak ponit , when sql stop ,tmp memory should be clearup .