NVIDIA/libglvnd

v1.1.0 breaks Steam on bumblebee on Manjaro

michaldybczak opened this issue · 12 comments

After updating libglvnd to v1.1.0, Steam on Intel works and bumblebee alone works, but running primusrun steam segfaults:

 crash_20180815085810_1.dmp[5170]: Uploading dump (out-of-process)
/tmp/dumps/crash_20180815085810_1.dmp
/home/michaldybczak/.local/share/Steam/steam.sh: line 876:  4947 Segmentation fault   (core dumped)
$STEAM_DEBUGGER "$STEAMROOT/$STEAMEXEPATH" "$@"
crash_20180815085810_1.dmp[5170]: Finished uploading minidump (out-of-process): success = yes
crash_20180815085810_1.dmp[5170]: response: CrashID=bp-c100d0bc-33b3-40b2-91ea-ceb3b2180814
crash_20180815085810_1.dmp[5170]: file ''/tmp/dumps/crash_20180815085810_1.dmp'', upload yes: ''CrashID=bp-c100d0bc-33b3-40b2-91ea-ceb3b2180814''

Reinstalling v1.0.0 of libglvnd and lib32-libglvnd fixes the issue. Here are the related forum topics:

https://forum.manjaro.org/t/testing-update-2018-08-15-linux-4-19-libglvnd-python-haskell/55078/6

https://forum.manjaro.org/t/steam-games-stopped-working/55472

@michaldybczak -- How are you running Steam in this case? Are you setting LD_PRELOAD to override libstdc++.so.6, libgcc_s.so.1, and libxcb.so.1? Are you setting any other environment variables?

Okay, I've found several things that are likely to cause problems with primusrun and Steam.

primusrun works by setting LD_LIBRARY_PATH so that it can intercept calls to libGL.so.1, and redirect GLX calls to a secondary X server. On Arch and Manjaro, it sets LD_LIBRARY_PATH=/usr/$LIB/primus. The dynamic linker then expands the $LIB to either lib or lib64 depending on whether it's running a 32-bit or 64-bit executable. So far, so good.

But, Steam's startup scripts also muck with LD_LIBRARY_PATH. Steam will put a couple of its own directories first, then the default search directories, and then some more of its own directories.

To figure out the default search paths, it runs /sbin/ldconfig -XNv and parses the output. But, if LD_LIBRARY_PATH contains $LIB, then ldconfig segfaults and doesn't print anything. So, you're missing the normal system paths.
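To make the failure mode concrete, here is a rough sketch of that kind of parsing. It is not Steam's actual script: the function name parse_ldconfig_dirs is made up for illustration, and it assumes the usual ldconfig -v layout where directory header lines are unindented and end with a colon.

```shell
# Hypothetical helper (not taken from Steam's scripts): reads ldconfig -v
# style output on stdin and prints the directories as a colon-separated list.
parse_ldconfig_dirs() {
    # Directory headers are unindented; library entries start with a tab.
    # Keep the headers, cut each at its first colon, then join with ':'.
    grep -v -e '^[[:space:]]' -e '^$' | cut -d: -f1 | paste -sd: -
}

# Intended use (commented out so the sketch has no side effects):
#   system_dirs=$(/sbin/ldconfig -XNv 2>/dev/null | parse_ldconfig_dirs)
# If ldconfig crashes before printing anything, system_dirs ends up empty
# and the resulting LD_LIBRARY_PATH contains only Steam's own directories.
```

With empty input the helper yields an empty string, which is exactly the situation described above: nothing signals an error, the system paths are just silently missing.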

Without the system paths, you end up with Steam's own copies of a bunch of libraries that would otherwise come from the distro. This is the cause of the mismatch in Primus with libstdc++.so.6 -- it gets Steam's copy, but Primus was built against the version that the rest of Arch uses.

If you avoid that so that Primus's library can at least load, Primus will try to load the normal system libGL.so.1. In this case, that's libglvnd. Libglvnd then tries to load libGLX_mesa.so.0 for the regular (visible) X server. But, libGLX_mesa.so.0 fails to load, because of a mismatch with Steam's libxcb-dri3.so. Libglvnd is left without a vendor library for that screen, which eventually leads to calling glXQueryExtensionsString and getting NULL. Primus doesn't check for a NULL and segfaults when it tries to dereference it.

Now, if you set LD_LIBRARY_PATH=/usr/lib/primus:/usr/lib64/primus instead of relying on $LIB, then you avoid the ldconfig crash. That in turn avoids all of the library mismatches and resulting crashes.

However, if you do that, then running Steam through primusrun still won't work, because Steam puts the original LD_LIBRARY_PATH value at the end, after all of the default system paths. You'll end up with the normal libGL.so.1 instead of Primus, and so primusrun doesn't do anything. That's arguably a bug in Steam.
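The ordering problem can be illustrated with a small sketch. The helper first_dir_with below is hypothetical and only mimics first-match-wins lookup over a colon-separated path; the real dynamic linker also consults its cache, RPATH and so on, so treat this as an approximation.

```shell
# Hypothetical helper: print the first directory in a colon-separated
# search path that contains the named file, or fail if none does.
first_dir_with() {
    search_path=$1
    name=$2
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        if [ -e "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Example call (paths are illustrative):
#   first_dir_with "/usr/lib:/usr/lib/primus" libGL.so.1
# prints /usr/lib when both directories hold a libGL.so.1, because the
# system directory comes first and the Primus directory is never reached.
```

Swapping the order of the two directories makes the Primus copy win, which is what primusrun relies on and what Steam's reordering undoes.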

On top of all of that, Manjaro's NVIDIA driver packages include additional copies of the libglvnd libraries. I haven't figured out yet if those are the ones from the .run installer, but if so, they could also lead to conflicts with the distro's normal libglvnd libraries.

The extra libglvnd libraries in the NVIDIA package might actually make a difference between libglvnd 1.0.0 and 1.1.0. But, I can't find any setup where one of the other problems wouldn't break it first.

I am happy that you looked at this so closely!
I'm not technical enough to understand everything, but you described it in such an incredibly detailed and clear manner that I could comprehend about half of it, which is huge, believe me ;P.

Manjaro also has a steam-native package, which I use since it bypasses many Arch-related runtime issues, and I did run into one before. I had to launch Steam with a command that excluded one of Manjaro's libraries, but that broke after some update, so instead of figuring out how to fix it (I wouldn't know where to start), I switched to the steam-native package and all was good.
From what I read, that issue was typical for older Steam installs (from about 2 years ago), or maybe older system installs, and was fixed for new ones but persisted on existing installs (Steam or system, it's hard to test which). However, I don't think this is important, since new users with a recently installed Steam or system have also reported this libglvnd issue.
Anyway, without bumblebee, Steam launches fine with the new libglvnd, but that may be thanks to steam-native. On the other hand, other users who don't seem to know about that package can also run Steam fine without it when primusrun is not used, so maybe steam-native has nothing to do with it.

For now I'm keeping the previous libglvnd version so that everything works hassle-free.

Since a simple downgrade complained about dependency breakage, I went back to the old libglvnd with this command:

sudo pacman -U https://archive.archlinux.org/packages/l/libglvnd/libglvnd-1.0.0-1-x86_64.pkg.tar.xz https://archive.archlinux.org/packages/l/lib32-libglvnd/lib32-libglvnd-1.0.0-1-x86_64.pkg.tar.xz

Others also found that this (temporary) solution fixes the entire issue.

As you may have noticed in further responses, Manjaro modified its steam-manjaro package somehow after this and suggested not running Steam with bumblebee, but instead setting launch options for each game:

DRI_PRIME=1 %command%

This is a valid workaround but I'm not happy with it. I prefer to modify my Steam launcher and forget about it. I should be able to choose freely which apps will run with Nvidia and now I cannot do that with Steam, at least with newer libglvnd.

Also, when we come across those rare Steam system checks, our system will show up as Linux with an Intel GPU instead of an Nvidia one, which messes up the Linux stats ;). This is, of course, a very small thing but we like to present Linux from a good perspective nonetheless :P.

Anyway, those are the reasons why I'm not particularly happy with the current state. I can't hold on to libglvnd 1.0.0-1 indefinitely.

I can run some tests for you if you want to look into this, but you'll need to post the commands since my terminal skills are basic. I can run tests on libglvnd 1.0.0-1 (and its 32-bit version), then upgrade to the new ones, run the tests again and, if needed, revert to libglvnd 1.0.0-1 for the time being.
Again, old libglvnd works fine.

Unfortunately, Arch still doesn't have debug symbols, although it was promised a while ago :(. Not sure why the holdup.

Maybe some further Steam or other package update will straighten this issue out? Some things resolve themselves if you wait long enough ;). Still, it's not always good to sit idle and then complain ;P.

I found my old command that I needed to run Steam without steam-native package:

LD_PRELOAD='/usr/$LIB/libstdc++.so.6 /usr/$LIB/libgcc_s.so.1 /usr/$LIB/libxcb.so.1 /usr/$LIB/libgpg-error.so' steam

So it looks like you are right.

The steam-native package does avoid all of the LD_LIBRARY_PATH problems I found. With those out of the way, I'm pretty sure I know what the problem is.

@michaldybczak -- Can you try running this and see if it avoids the crash?

PRIMUS_libGLa=/usr/lib32/libGL.so.1 primusrun steam

Bingo! Using this command Steam starts with bumblebee correctly with updated libglvnd! :D

Now I can incorporate it into my launcher. This is a workaround I can live with :).

So now what? Do we just wait for some further Steam or other updates that may fix it, or is there a fix at the libglvnd level? I guess you have to think about a bigger picture than just Manjaro, so I'm not sure if there is anything to do?

The problem is that Manjaro's nvidia-utils package includes mismatched copies of a couple of the libglvnd libraries, and primusrun ends up trying to use a combination of them.
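If you want to check for this kind of duplication yourself, something along these lines should work. It is only a sketch: the function name list_glvnd_libs is made up, and the directories you pass depend on where your distro installs libraries.

```shell
# Hypothetical helper: list every copy of the libglvnd front-end libraries
# under the given directories, so duplicates from different packages stand
# out. Vendor libraries such as libGLX_mesa.so.0 are deliberately not matched.
list_glvnd_libs() {
    find "$@" \( -name 'libGL.so*' -o -name 'libGLX.so*' \
        -o -name 'libGLdispatch.so*' -o -name 'libOpenGL.so*' \
        -o -name 'libEGL.so*' \) -print 2>/dev/null | sort
}

# Example call (directories are illustrative):
#   list_glvnd_libs /usr/lib /usr/lib32
# On Arch-based systems, pacman -Qo <path> then reports which package owns
# each file that was listed.
```

Seeing the same library name under more than one directory, owned by different packages, would be a hint that a mixed set of libglvnd libraries can get loaded.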

I sent out a fix in manjaro/packages-extra#175.

Awesome! I linked this topic to philm as well so I think this will be sorted out eventually and till then I have a workaround. It's a better solution than keeping old libglvnd packages.

@kbrenneman: thanks for the fix. I also patched the nvidia-390xx driver series accordingly: lib32, x86_64

This same problem appears to have returned with version 1.3 of libglvnd.

How should I proceed to solve it?

Hi @zimudec . Development of libglvnd has moved to freedesktop.org. Please file an issue there if you're experiencing a problem.

@aaronp24 thanks!

I will do that.