ros-visualization/rviz

RViz black screen during launch

tim-gros opened this issue · 11 comments

During startup of rviz, rviz may show only a black screen. No error occurs and after some time (~2-3min), the GUI recovers and runs as normal. During the black screen time, some CPU cores show a load of 100%. Meanwhile the other ROS nodes run fine.
This happens after certain modifications, such as:

  • removing a custom panel, saving the settings on exit and then starting rviz again
  • retrieving a parameter from the parameter server at the beginning of the constructor of a custom panel. If the parameter is retrieved a few lines later, rviz does not go black
  • adjusting the text in a Qt window

When reverting the modifications, rviz runs again without high CPU load and no black screen. Overall, it seems as if the black screen occurs for random modifications.
We could reproduce this issue on multiple computers. However, when packaging a "failing" rviz modification, the packaged version then runs without problems.

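![Screenshot from 2022-05-23 14-13-53](https://user-images.githubusercontent.com/91877544/169817462-b7be8ded-1869-46c1-9d43-4151e0cc9d61.png)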

Your environment

  • OS Version: Ubuntu 20.04
  • ROS Distro: Noetic
  • RViz, Qt, OGRE, OpenGl version as printed by rviz:
[ INFO] [1653306171.303862674]: rviz version 1.14.14
[ INFO] [1653306171.303949889]: compiled against Qt version 5.12.8
[ INFO] [1653306171.303961123]: compiled against OGRE version 1.9.0 (Ghadamon)
[ INFO] [1653306171.309736794]: Forcing OpenGl version 0.
[ INFO] [1653306171.720886137]: Stereo is NOT SUPPORTED
[ INFO] [1653306171.720934235]: OpenGL device: Mesa Intel(R) UHD Graphics 630 (CML GT2)
[ INFO] [1653306171.720968223]: OpenGl version: 4.6 (GLSL 4.6) limited to GLSL 1.4 on Mesa system.
  • Graphics card:
Extended renderer info (GLX_MESA_query_renderer):

    Vendor: Intel (0x8086)
    Device: Mesa Intel(R) UHD Graphics 630 (CML GT2) (0x9bc8)
    Version: 21.2.6
    Accelerated: yes
    Video memory: 3072MB
    Unified memory: yes
    Preferred profile: core (0x1)
    Max core profile version: 4.6
    Max compat profile version: 4.6
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
server glx version string: 1.4
client glx version string: 1.4
GLX version: 1.4
    Max core profile version: 4.6
    Max compat profile version: 4.6
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
OpenGL core profile version string: 4.6 (Core Profile) Mesa 21.2.6
OpenGL core profile shading language version string: 4.60
OpenGL version string: 4.6 (Compatibility Profile) Mesa 21.2.6
OpenGL shading language version string: 4.60
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 21.2.6
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
    GL_EXT_shader_implicit_conversions, GL_EXT_shader_integer_mix

From your description, I understand that the undesired behavior stems from a custom panel. Did you introduce any race conditions making rviz wait for some event?
What exactly do you mean by packaging a failing rviz modification? If that works, you should search for differences between your test build and the "packaged build". What about debug vs. release build? Uninitialized variables?

Thanks a lot for your quick reply!
Indeed, the modification was made in a custom panel, but it seems kind of random where the modification is made. We noticed that after rviz has recovered from the black screen, a similar thing happens when dragging a panel to a different position. Rviz freezes for a short time and becomes unresponsive but works again afterwards.

We think the strange thing is that no error message is shown and that the black screen disappears after a few minutes and everything looks normal again.
Also, rviz itself seems to keep running in the background; only the rendering/GUI is not shown properly. We added several messages, which are printed to the terminal as expected while the screen is black.

By "packaging a failing rviz modification" I mean creating a catkin package from code that produces this issue when it is only built with catkin build.

Random behaviour is often the consequence of uninitialized variables. Also, using different versions of a library within the same process results in such errors. I still don't get what you mean by "packaging a failing rviz modification".
I guess your plugin defining the custom panel is part of some catkin package in your workspace, which - of course - is built with catkin build. What's the difference between this normal usage and your (working) packaging configuration?

Thank you very much for the hints. I have not yet found any uninitialized variables, but will continue looking for them. I am checking the libraries as well, but have not found conflicting versions so far.

I suppose my explanation regarding the package was a bit unclear. What I meant to say was that we build a binary .deb file with catkin and CPack, which is what I referred to as the package. This binary package is then installed on an identical setup. I mainly mentioned this to emphasize how random the issue seems. Sorry for the confusion.

If the strange behavior is not reproducible from a generated .deb (vs. your catkin workspace), maybe your workspace is screwed. Try to remove devel, build, and log and start over?

I completely cleaned the workspace and initialized it from scratch. However, this did not yield any improvement. Also, other ROS-nodes run without any problems.
I think the really strange thing is that it occurs as soon as the GUI is started or changed, for example when dragging a panel or resizing the RViz window when it is not in full screen. Unfortunately, I have still not figured out what truly causes this behaviour.

You observe this strange behavior only with your custom plugins loaded, don't you?

Yes, I was not able to reproduce it with the basic rviz. However, I have just found that if I launch the basic rviz, add all panels manually in the GUI (top left corner Panels->Add New Panel) and then save this configuration, rviz launches fine afterwards. So with this new default.rviz file it works fine, but with the old one, which only differs in the order of the parameters, it does not. Does the .rviz file require a specific order of the parameters, such that not following it could cause the behaviour described above?

No, the order of panels and displays in the config file shouldn't matter. I think you should first identify the offending panel plugin, e.g. by removing panels one by one from your broken config. Once you have identified the offender, you can dig deeper into its code.
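Removing panels one by one can be sketched roughly as below. This is only an illustration in Python: it assumes the .rviz file has already been parsed into a dict (it is YAML, so e.g. PyYAML could load and dump it) and that the panels sit in a top-level `Panels` list whose entries have `Class` and `Name` keys. The `my_plugins/...` class names and the helper `panel_removal_variants` are hypothetical. Other sections such as `Window Geometry` may also mention panels by name; this sketch leaves them untouched.

```python
import copy


def panel_removal_variants(config):
    """Yield (panel_label, variant) pairs; each variant lacks exactly one panel.

    `config` is a parsed .rviz file (assumed layout: top-level "Panels" list).
    The input dict is never modified.
    """
    panels = config.get("Panels", [])
    for index, panel in enumerate(panels):
        variant = copy.deepcopy(config)
        del variant["Panels"][index]
        label = panel.get("Name") or panel.get("Class", "panel %d" % index)
        yield label, variant


# Hypothetical stand-in for a parsed config; real files contain far more keys.
example_config = {
    "Panels": [
        {"Class": "rviz/Displays", "Name": "Displays"},
        {"Class": "my_plugins/StatusPanel", "Name": "StatusPanel"},
        {"Class": "my_plugins/ControlPanel", "Name": "ControlPanel"},
    ],
    "Visualization Manager": {"Class": ""},
}

variants = list(panel_removal_variants(example_config))
for name, variant in variants:
    remaining = [p["Name"] for p in variant["Panels"]]
    print("without %s: %s" % (name, remaining))
```

Each variant would be written to its own file and started with `rviz -d <file>`; the variant that launches without the black screen points to the panel that was removed from it.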

Update: I think we finally found the cause of this behavior. In one of the custom panels, there was a paint event from a Qt widget that was executed too often in some cases. This was responsible for the high CPU usage and slowed down rviz so much that it just showed a black screen.
Thanks @rhaschke for your help and tips.
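For anyone hunting a similar problem, a paint storm like this can be made visible by counting how often the paint handler runs per time window. The following is a minimal, language-neutral sketch in Python, not the code from our panel: `PaintRateMonitor`, its parameters, and the thresholds are hypothetical. In a real panel, the same logic would be called from the widget's `paintEvent` override with a monotonic clock value.

```python
from collections import deque


class PaintRateMonitor:
    """Sliding-window counter that flags too many events per time window."""

    def __init__(self, max_events, window_s):
        self.max_events = max_events  # allowed events per window
        self.window_s = window_s      # window length in seconds
        self._stamps = deque()        # timestamps still inside the window

    def tick(self, now):
        """Record one event at time `now` (seconds); return True if over the limit."""
        self._stamps.append(now)
        # Drop timestamps that have fallen out of the window.
        while self._stamps and now - self._stamps[0] > self.window_s:
            self._stamps.popleft()
        return len(self._stamps) > self.max_events


# Simulated timestamps instead of a real clock, so the example is deterministic.
monitor = PaintRateMonitor(max_events=60, window_s=1.0)
normal = [monitor.tick(t / 30.0) for t in range(90)]          # 30 paints per second

storm_monitor = PaintRateMonitor(max_events=60, window_s=1.0)
storm = [storm_monitor.tick(t / 1000.0) for t in range(200)]  # 1000 paints per second

print(any(normal), any(storm))
```

Logging a warning whenever `tick` returns True would have shown the runaway repaints in the terminal long before we found them by reading the code.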

Great that you found the culprit.