ros-visualization/rviz

Rviz Consumes all Memory and Crashes in Docker

mathewp88 opened this issue · 5 comments

Your environment

  • Fedora 38 and Endeavour OS (Docker)
  • ROS Distro: Noetic
  • RViz, Qt, OGRE, OpenGL version as printed by rviz:
    [ INFO] [1691399013.201012112]: rviz version 1.14.20
    [ INFO] [1691399013.201089313]: compiled against Qt version 5.12.8
    [ INFO] [1691399013.201111209]: compiled against OGRE version 1.9.0 (Ghadamon)
    

When I run rviz in Docker, it uses all of the system memory and crashes the computer. After I limited Docker's maximum memory usage to 4 GB, rviz uses all 4 GB and is killed with the following output:

QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
[ INFO] [1691399013.201012112]: rviz version 1.14.20
[ INFO] [1691399013.201089313]: compiled against Qt version 5.12.8
[ INFO] [1691399013.201111209]: compiled against OGRE version 1.9.0 (Ghadamon)
Killed

I have reproduced the issue on two separate laptops, with both Noetic and Melodic. No other application seems to show similar behavior.

Running rviz in gdb gives the following output:

    Starting program: /opt/ros/noetic/bin/rviz
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    [New Thread 0x7fffee7ff700 (LWP 502)]
    QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
    [ INFO] [1691400227.945438217]: rviz version 1.14.20
    [ INFO] [1691400227.945513308]: compiled against Qt version 5.12.8
    [ INFO] [1691400227.945535778]: compiled against OGRE version 1.9.0 (Ghadamon)
    [Thread 0x7fffee7ff700 (LWP 502) exited]
    
    Program terminated with signal SIGKILL, Killed.
    The program no longer exists.

rviz is a highly extensible visualization framework. Its behavior is predominantly determined by the plugins that are running.
So, if you have (memory) issues, please first try to identify the offending plugin (e.g. by adding plugins one at a time), then report the problem to the plugin provider.
If you feed rviz ROS messages faster than it can process them, they may accumulate in memory. So also check the overall load of your CPU as well as the frequency of your incoming messages. Does memory usage increase faster if you feed rviz at a higher frequency?
Is the problem specific to Docker, i.e. does it disappear in a native environment? If so, how exactly do you run rviz in Docker, i.e. which method do you use to forward the X output to your native environment?
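
For the message-frequency check, `rostopic hz <topic>` reports the publishing rate of a topic. If only arrival timestamps are at hand (for example, from a log), the same figure can be estimated by hand. The snippet below is a minimal sketch, assuming timestamps in seconds in increasing order; `estimate_rate_hz` is an illustrative name, not part of ROS or rviz.

```python
from typing import Sequence


def estimate_rate_hz(stamps: Sequence[float]) -> float:
    """Estimate the average message rate from arrival times given in seconds."""
    if len(stamps) < 2:
        raise ValueError("need at least two timestamps")
    span = stamps[-1] - stamps[0]
    if span <= 0:
        raise ValueError("timestamps must be increasing")
    # N messages span N - 1 intervals.
    return (len(stamps) - 1) / span
```

Comparing this rate at two different publishing frequencies against the memory growth seen in each case would answer the question above.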

I ran rviz without any plugins: I just started roscore and opened rviz. Even this leads to the crash. I also tested it with a small project, which showed the same problem: both CPU and RAM usage were around 100% before it crashed.
To be more specific, rviz cannot open at all. No window is created, and no messages are accumulating (at least not to the point where they would fill 16 GB of RAM). The whole process, from starting rviz to RAM being full, takes at most 5 seconds.
I tested all of this in a native environment and everything works as expected, so the issue seems to be specific to Docker.
I used several different methods to forward the X output, and all of them show the same issue. I have yet to try SSH forwarding and VNC, which I will test as soon as I can.
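
To put a number on how quickly memory fills up, the resident set size of the rviz process can be sampled while it starts. The following is only a sketch, assuming a Linux container in which `/proc/<pid>/status` is readable; the helper names `parse_vm_rss_kb` and `sample_rss` are made up for illustration and are not part of rviz or ROS.

```python
import re
import time
from typing import List, Optional


def parse_vm_rss_kb(status_text: str) -> Optional[int]:
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def sample_rss(pid: int, samples: int = 10, interval_s: float = 0.5) -> List[int]:
    """Sample the resident memory of a process a few times (hypothetical helper)."""
    readings: List[int] = []
    for _ in range(samples):
        try:
            with open(f"/proc/{pid}/status") as status_file:
                rss_kb = parse_vm_rss_kb(status_file.read())
        except (FileNotFoundError, ProcessLookupError):
            break  # the process has exited or was killed
        if rss_kb is not None:
            readings.append(rss_kb)
        time.sleep(interval_s)
    return readings
```

Calling `sample_rss(pid)` with the PID of the freshly started rviz process returns a list of readings in kB. A steep rise within the first few seconds would match the behavior described above, while a flat curve would point at a different process.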

I tried the gui-docker script from MoveIt and this works perfectly for me.
In your case, rviz doesn't progress beyond this line:

RenderSystem::forceGlVersion(force_gl_version);

It would be very interesting to know where exactly rviz hangs. Could you build rviz with debug symbols and figure that out?

Yeah, I did a little bit of digging, and the whole thing seems to be a Docker issue. dockerd seems to use an excessive amount of memory (around 10 GB for something like roscore). rviz wasn't hanging at all; Docker was. Another thing to note is that this seems to occur on Fedora and Arch-based systems but not on Ubuntu. I ran the same Docker container on an Ubuntu Focal system, and everything runs perfectly. I have no idea why this Docker issue is present on Arch and Fedora, though.

OK. I'm closing the issue here, as this does not seem to be related to rviz.