In-memory analytics benchmark run question
minkyuSnow opened this issue · 8 comments
Hello
I am running the in-memory analytics application on an Arm CPU with 4GB of memory.
Running the benchmark with the dataset set to 144MB produces the "Movies Recommended" and "Benchmark Execution Time" results.
However, while the result is being computed, I see what looks like an error message, so I wonder whether this benchmark ran normally.
Here is the command I used to run the benchmark.
I'm trying to run it on one node. Node spec = Arm CPU + 4GB memory.
$ docker create --name movielens-data cloudsuite/movielens-dataset
$ docker run -dP --net host --name spark-master cloudsuite/spark:3.3.2 master
$ docker run -dP --net host --volumes-from movielens-data --name spark-worker-01 cloudsuite/spark:3.3.2 worker spark://NODE_IP:7077
$ docker run --rm --net host --volumes-from movielens-data cloudsuite/in-memory-analytics /data/ml-latest /data/myratings.csv --driver-memory 2g --executor-memory 2g --master spark://NODE_IP:7077
Hello,
After looking at the first error log, I believe the cause is running out of memory (java.lang.OutOfMemoryError). If possible, give the node more memory. Alternatively, you may consider redistributing the memory allocation between the driver and the executor, e.g., 1GB for the driver and 3GB for the executor.
Please let me know if that helps!
Thank you for the reply.
$ docker create --name movielens-data cloudsuite/movielens-dataset
$ docker run -dP --net host --name spark-master cloudsuite/spark:3.3.2 master
$ docker run -dP --net host --volumes-from movielens-data --name spark-worker-01 cloudsuite/spark:3.3.2 worker spark://NODE_IP:7077
$ docker run --rm --net host --volumes-from movielens-data cloudsuite/in-memory-analytics /data/ml-latest /data/myratings.csv --driver-memory 1g --executor-memory 3g --master spark://NODE_IP:7077
As you suggested, I gave the driver 1GB and the executor 3GB, but unlike the run with 2GB each, the resources now appear to be insufficient.
Conversely, with 3GB for the driver and 1GB for the executor, it will not run either.
If there is nothing else that can be configured, should I regard this as insufficient physical memory?
Hello,
Thanks for running the test. Indeed, this means the memory is not enough for the workload.
There might be another way around it: you can try restricting the number of cores allocated to the container using --cpuset-cpus. Memory consumption may be reduced when the worker count becomes smaller. The trade-off is that it will take longer to finish.
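For example, a possible way to apply this, reusing the worker command from above (the core IDs "0,1" are illustrative; pick cores that exist on your machine):

```shell
# Pin the Spark worker container to cores 0 and 1 so fewer executor
# threads are spawned, which may lower peak memory consumption.
$ docker run -dP --net host --cpuset-cpus="0,1" \
    --volumes-from movielens-data --name spark-worker-01 \
    cloudsuite/spark:3.3.2 worker spark://NODE_IP:7077
```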
Thank you for the reply.
Are you saying the problem is that the actual memory size is small relative to the amount of data?
There was a problem when running with 2GB allocated, but since a result came out, can it be considered normal?
Yes. It is an indication that the physical memory is not enough. You can treat it as a normal run, but not a representative one.
Thank you for the reply.
I understand a little better now.
The result came out, but you're saying it's hard to consider the run normal because of the memory error?
Hello,
Yes. Even though you finally got the result and the workload finished successfully, my understanding is that it still cannot represent a real server: this workload is supposed to run on a server with a large amount of memory, so you should not see any out-of-memory errors during the run.
However, it is OK if your target case is not a server :)
Best,
Thank you for your kind reply. You have been very helpful.