diennea/herddb

OutOfMemoryError when executing a large number of INSERT statement files extracted from a dump file

Closed this issue · 8 comments

BUG REPORT

When I followed the instructions to migrate a database from MySQL, some problems occurred.

I did the following on Windows:
I dumped the database from MySQL and got a 4.3 GB .sql file. Then I ran bin\herddb-cli -f dumpfile.sql --mysql, but got no feedback (I expected something like "Import completed, 1 ops, in 0 minutes").
So I executed the DDL statements first (by adding the -fi ddl option), and everything went well. When executing the INSERT statements, however, the same problem occurred (no feedback). To figure out what was going wrong, I split the dump file into about 4,000 files, each containing one INSERT statement (each file is close to 1 MB), and then executed the SQL files one by one (see the sketch after the stack trace below). At the beginning everything was fine, but it became slower after executing a few tens of the files, and finally I got:

Error:java.sql.SQLException: java.util.concurrent.TimeoutException: io-error while waiting for reply from null: java.io.IOException: comunication channel is closed. Cannot wait for pending messages, socket=[id: 0x7aa58328, L:0.0.0.0/0.0.0.0:57863 ! R:localhost/127.0.0.1:7000]

and in server:

NettyChannel{name=unnamed, id=149, socket=[id: 0x41715c90, L:/127.0.0.1:7000 - R:/127.0.0.1:57863] pending 0 msgs}: error java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3332)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
        at java.lang.StringBuilder.append(StringBuilder.java:136)
        at herddb.sql.CalcitePlanner.translate(CalcitePlanner.java:242)
        at herddb.server.ServerSideConnectionPeer.handleExecuteStatements(ServerSideConnectionPeer.java:628)
        at herddb.server.ServerSideConnectionPeer.requestReceived(ServerSideConnectionPeer.java:173)
        at herddb.network.netty.AbstractChannel.processRequest(AbstractChannel.java:97)
        at herddb.network.netty.AbstractChannel.lambda$handlePduRequest$0(AbstractChannel.java:106)
        at herddb.network.netty.AbstractChannel$$Lambda$190/1369169512.run(Unknown Source)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)
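
A minimal sketch of how the split files can be executed one by one from the Windows command prompt, reusing the same herddb-cli invocation as above (the parts\ directory name is only illustrative; in a .bat script the loop variable would be written %%f):

    for %f in (parts\*.sql) do bin\herddb-cli -f "%f" --mysql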

Hi @ouovowo
To help us understand the problem, can you share your JVM configuration (the bin/setenv.sh file) with us?
Note that the exception you posted indicates that the configured JVM memory is not enough.
One solution would be to increase the JVM memory by changing the -Xmx and -Xms settings.
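
For example, a minimal sketch of what that could look like in bin/setenv.sh (the exact variable name may differ in your version of the scripts, so please check your local copy; the 4 GB value is only an example):

    # Example only: give the JVM a 4 GB heap; the variable name may differ in your setenv.sh
    JAVA_OPTS="-Xms4g -Xmx4g"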

In theory, while restoring the database the CLI should convert the INSERT statements into statements with placeholders ('?'),
but it looks like the planner is parsing huge SQL statements instead.

You can try to do the dump with the --extended-insert mode in order to have smaller INSERT statements.

Oops, I didn't specify a maximum memory size, I just left it at the default. But it seems to be about 4 GB, according to my observations.
Thanks for the advice, I will try changing the JVM configuration. But --extended-insert mode seems to be enabled by default: it has already split the complete INSERT into parts, each about 1 MB.
By the way, should I comment out the CREATE DATABASE, SET, USE, LOCK, ... statements? It seems that even with --mysql these statements are not handled properly. To avoid unknown problems, I have commented them out.
Or is there any convenient way to load data from a .csv file, like LOAD DATA in MySQL?
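
As an illustration only, one way to drop such lines on Windows before importing, instead of commenting them out by hand (the statement prefixes and file names here are examples, not a complete list):

    findstr /v /b /c:"CREATE DATABASE" /c:"SET " /c:"USE " /c:"LOCK TABLES" /c:"UNLOCK TABLES" dumpfile.sql > cleaned.sql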

Are you able to provide a sample of the dump file?
Usually restoring MySQL dumps works well out of the box, but it has been quite some time since I last did it.

Maybe @aluccaroni can share some suggestions about the command to use to dump the MySQL files (I remember we tested importing databases for some widely used applications).

I have re-read the docs for mysqldump.
Actually, you would have to try to disable extended inserts,
and this seems to be possible by using --skip-opt,

because otherwise (as you say) it is enabled by default.
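
For example, something along these lines (user and database names are placeholders):

    mysqldump --skip-opt -u youruser -p yourdatabase > dumpfile.sql

or, to disable only the extended inserts while keeping the other default options:

    mysqldump --extended-insert=FALSE -u youruser -p yourdatabase > dumpfile.sql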

Cheers! It seems to work well now after adding --extended-insert=FALSE, though it's quite slow. Thanks so much.

@eolivelli I don't have on hand the procedure for how we exported and imported into herd the few production MySQL databases we used to have (in the last few years we have mainly created new herd installations without any migration), but using the "--skip-opt" option was pretty much the standard in our MySQL backup procedures, so it's very likely that we used it at the time.

It seems that --skip-opt is not the key. The problem occurs only when continuously inserting tuples into a large or GIANT table. Due to the limited performance of my machine, my only solution is to restart the server after every certain number of insertions to avoid catastrophic GC overhead. Troublesome, but simple and effective.