memgraph/mage

[BUG] Memory limit exceeded when adding distances from `distance_calculator.multiple()` as relationship properties

katarinasupe opened this issue ยท 17 comments

Describe the bug
It is significantly slower when you run distance_calculator.single() instead of distance_calculator.multiple(). Why is that, and what is the best approach in such a situation? Also, if I want to use those distances and set them as properties of an edge, I run into

Query failed: Memory limit exceeded! Atempting to allocate a chunk of 3.00GiB which would put the current use to 12.35GiB, while the maximum allowed size for allocation is set to 9.72GiB.

The query is not working even with a 32GB instance.

To Reproduce
Steps to reproduce the behavior:

  1. Import the data (save the below file as json, copy it to Memgraph and run
    CALL import_util.json("/usr/lib/memgraph/query_modules/export.json");.
    export.txt

  2. Run

MATCH (o) - [r:SHIPS_TO] -> (d)
WITH collect(o) as os, collect(d) as ds
CALL distance_calculator.multiple(os, ds, 'km') 
YIELD distances
RETURN distances;
  1. Run
MATCH (o) - [r:SHIPS_TO] -> (d)
CALL distance_calculator.single(o, d, 'km') 
YIELD distance
RETURN distance;
**Expected behavior**
A clear and concise description of what you expected to happen.

Compare the execution time of the two above queries. Why is the distance_calculator.single() significantly slower?
Screenshot 2022-12-22 at 16 25 46

Another issue appears when I run distance_calculator.multiple() and I want to add the distance to the relationship between the nodes:

MATCH (o:Origin) --> (d:Destination)
WITH collect(o) as os, collect(d) as ds
CALL distance_calculator.multiple(os, ds, 'km')
YIELD distances
WITH os, ds, distances, range(0, size(distances) - 1, 1) AS iterator
UNWIND iterator AS idx 
WITH os[idx] as o, ds[idx] as d, distances[idx] as distance
MERGE (o)-[:SHIPS_TO {distance: distance}]->(d)
RETURN o, d, distance;

I get:
Query failed: Memory limit exceeded! Atempting to allocate a chunk of 3.00GiB which would put the current use to 12.35GiB, while the maximum allowed size for allocation is set to 9.72GiB.

Is there any better way to write such a Cypher query?

Additional context
This question was raised on Memgraph's Discord server.

Q: It is significantly slower when you run distance_calculator.single() instead of distance_calculator.multiple(). Why is that, and what is the best approach in such a situation?

A: The issue lies in the number of executions the Python script is called for calculating distances. the distance_calculator.single() takes into account only the distance between 2 points, and therefore it is very time consuming since for every row of the MATCH (o) - [r:SHIPS_TO] -> (d) the query module is called. It might be more readable, however it shows on the performance. On the other hand, when we collect inputs and outputs into 2 lists of the same size, the distance_calculator.multiple() is only called once and everything is executed at one go.

As for the memory limit, I'll need to see further on what's happening.

Added task to backlog.

@Josipmrden thanks! makes sense that calling the query module many times hurts performance.

RE: the memory limit, I noticed that just calling:

MATCH (o:Origin) --> (d:Destination)
WITH collect(o) as os, collect(d) as ds
CALL distance_calculator.multiple(os, ds, 'km')
YIELD distances
WITH os, ds, distances, range(0, size(distances) - 1, 1) AS iterator
RETURN os, ds, distances, iterator;

works fine, without any memory issues (and is quite fast). However, adding the UNWIND calls with:

MATCH (o:Origin) --> (d:Destination)
WITH collect(o) as os, collect(d) as ds
CALL distance_calculator.multiple(os, ds, 'km')
YIELD distances
WITH os, ds, distances, range(0, size(distances) - 1, 1) AS iterator
UNWIND iterator AS idx 
WITH os[idx] as o, ds[idx] as d, distances[idx] as distance
RETURN o, d, distance;

runs into the memory limit issue - which seems strange, given that the data are computed without issues, just being reshaped.

Even more, unwinding just the iterator variable works without issues:

MATCH (o:Origin) --> (d:Destination)
WITH collect(o) as os, collect(d) as ds
CALL distance_calculator.multiple(os, ds, 'km')
YIELD distances
WITH os, ds, distances, range(0, size(distances) - 1, 1) AS iterator
UNWIND iterator AS idx 
RETURN idx;

whereas adding one of the other variables in the RETURN runs into the memory issue:

MATCH (o:Origin) --> (d:Destination)
WITH collect(o) as os, collect(d) as ds
CALL distance_calculator.multiple(os, ds, 'km')
YIELD distances
WITH os, ds, distances, range(0, size(distances) - 1, 1) AS iterator
UNWIND iterator AS idx 
RETURN distances[idx], idx;

Seems like the distances record gets copied for each value of idx. For context, running this on a 32GB instance.

Hi @hbot-ll, I will try to check what is happening here. Thanks for the detailed view of the bugs. Some might be MAGE issues, other bugs might be Memgraph issues, and there are a few questions that will help me uncover what is happening.

Which version of Memgraph are you using, on which system are you using the Docker version or Linux native instance, and are you maybe using Memgraph through the Memgraph platform? Thanks for answering! It will help in debugging.

Hi @antoniofilipovic, I'm running Memgraph through Docker using:

docker run -it -p 7687:7687 -p 7444:7444 \
 -v mg_lib:/var/lib/memgraph \
 -v mg_log:/var/log/memgraph \
 -v mg_etc:/etc/memgraph \
memgraph/memgraph-platform

so I assume it's the latest version, using the Memgraph platform.

Hi @horatiubota,
you can try running following query when you start memgraph platfrom:

SHOW VERSION;

In my terminal it looks as follows when I start memgraph-platform:

mgconsole 1.3
Connected to 'memgraph://127.0.0.1:7687'
Type :help for shell usage
Quit the shell by typing Ctrl-D(eof) or :quit
memgraph> show version;
+---------+
| version |
+---------+
| "2.5.0" |
+---------+
1 row in set (round trip in 0.000 sec)
memgraph> :quit
Bye

It should return latest version 2.5.0 if everything is okay. This was added in Memgraph 2.5.0, so if you are not running it, we can know for sure. I am asking you this, because we resolved some bug fixs with latest Memgraph 2.5.0 concerning memory issues we had before.

Considering other more important stuff, I checked distance_calculator and there is Python API related issue considering distance_calculator.single method and why is it so slow. I am inspecting that, but what I can say to you now is that I tried rewriting distance_calculator.single in C++ and it is faster than Python version of distance_calculator.multiple. This bugfix should be soon on MAGE. I am not sure how fast, because during rewriting code in C++, we discovered one bug on C++ API side, and we are resolving it now.

If this part is urgent for us to fix to you, let me know and I will push it on our side to be done as fast as possible.

Considering other bugs, we will investigate what happens with memory if you are running Memgraph 2.5.0. If you are not running it, please try with Memgraph 2.5.0.

Hi @antoniofilipovic,

Thank you for the update! I am indeed running 2.5.0.
Is the C++ version of distance_calculator.single anywhere in the repo?

Thank you!

Hi, it is not still in repo, I will notify you when we merge it inside. Is this really urgent, or you can currently work with distance multiple? We are working on it, just depending on urgency I can push more for code reviews to happen on our side.

@antoniofilipovic thank you again! Not urgent, but maybe you could share it in a gist? I know it's a very simple function, but maybe I can test it out on my setup and see how to use the C++ API.

Actually, we have a pull request on MAGE as a work in progress so feel free to check it out. But it probably won't work as there is a bug when getting Properties from Node we are fixing on the Memgraph side.

Hi @horatiubota,

I checked the bug you reported with the memory limit exceeded. I was wrong when I said it was fixed in 2.5.0. It is fixed, but it will be released in Memgraph 2.5.1.

I tested on the newest commit on the Memgraph database, and there are no more memory limits exceeded. Actually memory limit was never exceeded as the dataset sample @katarinasupe sent is small, but there were no considerable jumps in memory (to a few GBs as is the case in Memgraph 2.5.0). Memory consumption was on around 100MB more than at rest.

Considering what you said here ๐Ÿ‘‡ , you were right, copying happens behind the scenes, but handling memory was an issue and it should be fixed now.

whereas adding one of the other variables in the RETURN runs into the memory issue:

MATCH (o:Origin) --> (d:Destination)
WITH collect(o) as os, collect(d) as ds
CALL distance_calculator.multiple(os, ds, 'km')
YIELD distances
WITH os, ds, distances, range(0, size(distances) - 1, 1) AS iterator
UNWIND iterator AS idx
RETURN distances[idx], idx;
Seems like the distances record gets copied for each value of idx

I'm not sure when 2.5.1 will be released, but if this is urgent we can make something work earlier for you. Contact me please on Discord, and we can chat further.

I am afilipovic#8994 on Discord.

And considering the problem with Python, as @Josipmrden said this is due to the nature of Python calls and it seems it has a huge impact on performance. On the other hand, such performance hits did not happen on the C++ side, so that is great.

Hi @horatiubota, after researching why is Python execution row by row slower, we discovered that garbage collector is called after each Python execution which slows collect execution down significantly. Since this is currently by design this way to reduce memory footprint, we suggest to keep using C++ for performance, and if there will be additional problems, we can see what we can do as a tradeoff between memory footprint and performance.

Since this issue is fixed, I will close it for now. If you have any additional questions feel free to ask.

@antoniofilipovic tried the C++ implementation of distance_calculator on a larger dataset and am now getting:

Query failed: distance_calculator.multiple: Could not allocate memory!

The dataset is larger, but not that large: I'm trying to compute the distance between 200,000 pairs of location nodes. The query I'm running is:

MATCH path = (o:Origin) - [:SHIPS_TO] -> (d:Destination)
WITH collect(o) as os, collect(d) as ds
CALL distance_calculator.multiple(os, ds, "km") YIELD distances
RETURN distances;

distance_calculator.single works and is fast enough for my use-case, so just flagging this in case it comes up elsewhere.

This https://memgraph.com/docs/memgraph/reference-guide/memory-control#controlling-the-memory-usage-of-a-procedure (procedure memory limit) might help. In general, Memgraph tries to limit memory on multiple layers:

  1. globally, as a runtime settings
  2. per query
  3. per procedure call

Please check that first ๐Ÿ˜„

I see, thanks @gitbuda! Maybe you could add some info about controlling memory usage to the Query failed: Could not allocate memory! error message.

100%, there are actually more messages we have to improved, just created a new PR -> memgraph/memgraph#771, will try to get it done over the following days ๐Ÿ˜„