chromium build issues
Opened this issue · 23 comments
Trying out the chromium patches, in the tcmalloc third party library I hit some of the usual sys/types.h silliness, followed by an mmap() redefinition. I can fix things up easily enough, but I am thinking I have probably dropped a stitch along the way in getting the environment right. I am using musl-1.1.15-r2.ebuild from the overlay. Any guesses on what I might have missed are appreciated.
[edit] It looks like I have something set up wrong for sure, because HAVE_SYS_CDEFS_H is ending up defined somehow, and of course including cdefs.h, which doesn't exist on musl, also fails.
In file included from ../../third_party/tcmalloc/chromium/src/malloc_hook_mmap_linux.h:51:0,
from ../../third_party/tcmalloc/chromium/src/malloc_hook.cc:698:
../../third_party/tcmalloc/chromium/src/base/linux_syscall_support.h:1932:37: error: ‘__off64_t’ has not been declared
[...]
malloc_hook_mmap_linux.h:194:18: error: redefinition of ‘void* mmap(void*, size_t, int, int, int, off_t)’
extern "C" void* mmap(void *start, size_t length, int prot, int flags,
^
In file included from ../../third_party/tcmalloc/chromium/src/malloc_hook.cc:40:0:
../../third_party/tcmalloc/chromium/src/malloc_hook_mmap_linux.h:180:18: note: ‘void* mmap(void*, size_t, int, int, int, __off64_t)’ previously defined here
extern "C" void* mmap64(void *start, size_t length, int prot, int flags,
^
http://www.openwall.com/lists/musl/2014/08/08/11
As far as I know, musl still doesn't support alternative malloc implementations, so you have to disable tcmalloc.
... which is why I didn't look into a patch to fix it.
Appreciate the link; that explains everything. I naively thought it would build out of the box like firefox. Go ahead and close this issue out if you like.
Great patches by the way. I hope some of this work eventually drifts all the way upstream.
They're actually mostly the same as what voidlinux and alpinelinux have in their patchsets -- I haven't started opening issues on the chromium side because I can't get chromium to run yet. There's some kind of segmentation fault going on which I think is still related to threading.
Okay thanks for the tip. I might install Alpine just to take a look at what they've done. I gather chromium runs (to some reasonable degree) on Alpine, including whatever dependent packages they might have patched differently? Or when you say "open issues on the chromium side" are you implying the thread related problem is (or could be) more fundamental? Chromium is typically statically linked, so with possible exception of some externalities like dbus and namespaces in kernel for sandbox, it shouldn't much care what musl distro it's running on. Is that about right?
I'm assuming that chromium runs on Voidlinux and Alpinelinux, but I'm also not 100% sure -- the main reason for that being that I'm using a similar patchset and it doesn't work, and they only recently added a patch (https://github.com/lluixhi/musl-extras/blob/master/www-client/chromium/files/musl/06_all_fix-stack.patch#L1-L12) that's critical in order to keep chromium from crashing after loading any page. (They added it as of chromium 53, but it has been an issue for a much longer period of time -- at least since chromium 45, probably earlier)
The above problem arises because musl makes the default thread stack size 80KiB (which is a good default in 99% of cases), while glibc sets the default thread stack size to 1MiB, which appears to be the convention in Solaris, OSX, and other libc implementations. Because chromium (as well as webkit) assumes the 1MiB size, we run out of stack space on new threads. This is also a problem in webkit, where I have a patch which prevents JavaScriptCore from crashing, but neither Alpine nor Void have picked it up yet, so I can only assume that it's broken on their end. (https://github.com/lluixhi/musl-extras/blob/master/dev-qt/qtwebkit/files/qtwebkit-5.5.1-fix-stack-size-musl.patch)
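For anyone following along, here is a minimal sketch of the general technique (not the actual chromium or qtwebkit patch): request the stack size explicitly through pthread attributes instead of relying on the libc default. The names `spawn_with_stack` and `touch_big_frame` are hypothetical, made up for this illustration, and the 1MiB value is simply the figure mentioned above.

```cpp
#include <pthread.h>
#include <cstddef>
#include <cstdint>

// Hypothetical worker: touches a 256 KiB stack buffer, which would not fit
// in an 80 KiB thread stack.
void* touch_big_frame(void*) {
    volatile char buf[256 * 1024];
    for (std::size_t i = 0; i < sizeof buf; i += 4096) {
        buf[i] = 1;  // touch one byte per 4 KiB so the whole frame is used
    }
    return reinterpret_cast<void*>(static_cast<std::intptr_t>(buf[0]));
}

// Hypothetical helper: start fn on a new thread with an explicit stack size.
// Returns 0 on success or a pthread error number.
int spawn_with_stack(pthread_t* out, std::size_t stack_bytes,
                     void* (*fn)(void*), void* arg) {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0) rc = pthread_create(out, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}
```

Calling `spawn_with_stack(&t, 1024 * 1024, touch_big_frame, nullptr)` and then joining the thread should behave the same on either libc, since the size no longer depends on the default.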
Anyway, I think the issue exists somewhere between chromium 45 and 49, because qtwebengine-5.6.x (based off of chromium 45) works but qtwebengine-5.7.x (based off of chromium 49) does not.
One thing that changed was that the seccomp-bpf syscall sandboxing was enabled or made stricter. Because musl uses different syscalls than glibc, without https://github.com/lluixhi/musl-extras/blob/master/dev-qt/qtwebengine/files/qtwebengine-5.7.0-musl-sandbox.patch we segfault because we use incompatible syscalls or syscalls with incompatible options.
There seems to be some other issue regarding threading (last time I checked we still crashed during pthread_clone) but it's kind of difficult to debug because chromium mixes green threads with posix threads and multiple processes (chrome zygote), and it crashes before I can connect gdb.
What I mean by opening issues on the chromium side is that I'm not comfortable submitting patches upstream to the chromium project if they don't work. I also need to build chromium with the patchset on glibc to make sure I don't break anything there.
Then there's the issue that GN, the new build system for chromium that replaces GYP in chromium 54+, needs to be patched because they use a memory allocator hack that's incompatible with musl.
And yes, because it's statically linked, chromium should run about the same on different musl-based linux distros. Alpine will probably be a better test because they also use grsecurity patches in their kernel.
IIRC @ncopa has great experience with solving Chromium issues on Alpine (musl), let's ask him.
I've got Alpine up and their binary seems stable enough in a short test. YouTube videos play, anyway. They are on 53.0.2785.143 currently. The thread stack problem has a small patch, and 53 is still on GYP. I'll look out for your sandbox issue and much appreciate the heads up.
Alright, so the current segmentation fault in qtwebengine is because of a SIGILL ILLOPN. I wonder whether this could be fixed by using an older version of sys-devel/gcc (I'm using gcc-6.2.0 right now, and there might be some kind of codegen bug.)
@lluixhi If you're still stuck on chromium's segfault, I'm sure that you need this to get it working (maybe qtwebengine needs this, too):
https://git.archlinux.org/svntogit/packages.git/tree/trunk/chromium-52.0.2743.116-unset-madv_free.patch?h=packages/chromium
I'm running chromium (53.0.2785.143) fine with your patches plus this patch and the stack_size patch (compiled with gcc 6.2.0).
EDIT: There's a dirty fix patch for ppapi plugins: https://raw.githubusercontent.com/xhebox/noname-linux/master/ports/chromium/musl-ppapi-nosandbox-and-fixdlopen.patch. I first made it because I wanted to use ppapi flash under musl. Then I found that all ppapi plugins (which need dlopen to load) are blocked by the sandbox. The first part of this patch (before 'flash' appears) should solve this; the second part is for closed source flash, not for this issue.
@xhebox Thanks!
qtwebengine appears to be working now, though I'm going to do some more testing.
Alright, it seems not everything is quite right, at least in qtwebengine. I'm still getting segmentation faults on some web pages, for instance when trying to open the chat on GMail/Hangouts, and in some other cases.
I think that this is possibly a JavaScript issue.
Did you apply this? I mean this cflag.
# Work around bug in blink in which GCC 6 optimizes away null pointer checks
# https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=833524
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68853#c2
sed -i '/config("compiler")/ a cflags_cc = [ "-fno-delete-null-pointer-checks" ]' \
build/config/linux/BUILD.gn
Locally, yes.
That's an issue that's not musl specific, though.
When using chromium, I get segfaults too, but they don't affect usage.
Received signal 11 SEGV_MAPERR 000000000208
r8: 0000562574cae5a8 r9: 0000000000004375 r10: 0000562574938898 r11: 0000000000000000
r12: 00005625780665e0 r13: 0000562578067ce0 r14: 0000000000000000 r15: 0000562578064738
di: 0000562578072288 si: 0000000000000032 bp: 0000562578068160 bx: 0000562578072288
dx: 0000000000000001 ax: 0000000000000000 cx: 0000000000000000 sp: 00007ffe0f1984c0
ip: 00007f19bb00ccef efl: 0000000000010246 cgf: 002b000000000033 erf: 0000000000000004
trp: 000000000000000e msk: 0000000000000000 cr2: 0000000000000208
[end of stack trace]
Also I get these two:
getrlimit(RLIMIT_NOFILE) failed
[1:14:1117/213721:ERROR:ffmpeg_demuxer.cc(1492)] OnReadFrameDone result=-541478725 IsMaxMemoryUsageReached=0
@xhebox
It seems to me that qtwebengine-5.7.0 is stable as long as you're using ffmpeg-2.x -- when using ffmpeg-3.x, there are segmentation faults when the page has video -- I think this is another non-musl bug.
@lluixhi I'll take a look at it. And good news: I got 540 compiled and working.
The main changes are:
- pthread_setname_np is available in my musl, since I'm using Alpine's patchset, so I removed chromium's stub patch. More here
- One RTLD flag that musl does not support, here.
- Most important, allocator_shim.patch. It makes all Glibc*** functions directly invoke the real malloc instead of __libc***, so that it compiles, and removes the overrides of malloc so there is no infinite loop (Glibc*** -> alloc -> Glibc).
- And... I found that it can't find the correct path from pkg-config's output; it always emits -I../../include/glib-2.0 (should be /include/***). I've got a xhebox.patch and use sed to correct it. But I think this is my own issue; maybe you can try 540 and find out more about this (whether you get the same result)? Thx.
Detailed build file here
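To make the allocator point concrete, here is a rough sketch of the idea as described above. It is not the contents of allocator_shim.patch, and `ShimDispatch`, `ShimMalloc`, `ShimRealloc`, `ShimFree`, and `kDefaultDispatch` are hypothetical names rather than chromium's real shim types. The assumption is that the libc does not export `__libc_malloc` and friends, so the dispatch entries forward to the public allocator functions instead.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical, simplified dispatch table (not chromium's real type).
struct ShimDispatch {
    void* (*alloc)(std::size_t);
    void* (*realloc_fn)(void*, std::size_t);
    void (*free_fn)(void*);
};

// Each entry forwards straight to the C library allocator.
// This is only safe while malloc/realloc/free themselves are NOT overridden
// to route back through this table; otherwise the calls would recurse forever
// (shim -> malloc -> shim -> ...).
void* ShimMalloc(std::size_t n) { return std::malloc(n); }
void* ShimRealloc(void* p, std::size_t n) { return std::realloc(p, n); }
void ShimFree(void* p) { std::free(p); }

const ShimDispatch kDefaultDispatch = {ShimMalloc, ShimRealloc, ShimFree};
```

The trade-off in this sketch is that the shim no longer intercepts plain malloc calls made elsewhere in the program; only callers that go through the table see it.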
Thanks! Some comments:
- We won't actually need the pthread_setname_np patch because https://github.com/lluixhi/musl-extras/blob/master/www-client/chromium/files/musl/09_all_no-pthread-setname.patch is actually from upstream. I think we can just wait until musl-1.1.16.
- I'll probably modify that to instead define RTLD_DEEPBIND to 0, which is what alpine and voidlinux appear to be doing in audacity and docker (it's also more portable).
- Yeah, I was attempting to make a similar patch and didn't get around to it. Thanks!
- Hmm. I'll look into that.
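The RTLD_DEEPBIND approach mentioned in the list above could look roughly like this. It is a sketch of the general pattern, not the patch from any of the distros named; `plugin_dlopen_flags` is a hypothetical helper, and the particular flag combination is an assumption chosen for illustration.

```cpp
#include <dlfcn.h>

// RTLD_DEEPBIND is a glibc extension. Where the libc headers do not define
// it, fall back to 0 so that OR-ing it into the flags is a no-op.
#ifndef RTLD_DEEPBIND
#define RTLD_DEEPBIND 0
#endif

// Hypothetical helper: the flag set a caller would pass to dlopen().
int plugin_dlopen_flags() {
    return RTLD_LAZY | RTLD_LOCAL | RTLD_DEEPBIND;
}
```

Defining the missing flag to 0 keeps the call sites unchanged, which is why it tends to be more portable than patching the flag out of each dlopen() call.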
UPDATE: CONFIRMED
--- chromium-54.0.2840.100/content/common/sandbox_linux/bpf_gpu_policy_linux.cc 2016-11-10 20:02:14.000000000 +0000
+++ chromium-54.0.2840.100/content/common/sandbox_linux/bpf_gpu_policy_linux.cc 2016-11-10 20:02:14.000000000 +0000
@@ -337,6 +337,7 @@
static const char kNvidiaParamsPath[] = "/proc/driver/nvidia/params";
static const char kDevShm[] = "/dev/shm/";
+ static const char kDevShm2[] = "/run/shm/";
CHECK(broker_process_ == NULL);
@@ -349,6 +350,8 @@
// For shared memory.
permissions.push_back(
BrokerFilePermission::ReadWriteCreateUnlinkRecursive(kDevShm));
+ permissions.push_back(
+ BrokerFilePermission::ReadWriteCreateUnlinkRecursive(kDevShm2));
// For multi-card DRI setups. NOTE: /dev/dri/card0 was already added above.
for (int i = 1; i <= 9; ++i) {
permissions.push_back(BrokerFilePermission::ReadWrite(
I've tracked down the segfaults: chromium failed to launch the GPU process. The syscall trace below shows that it tried to open /run/shm, which is not in the sandbox's whitelist. I've made a patch and am testing it. Does your chromium open /run/shm, too? This could be a portability patch for systems where chromium opens /run instead of /dev.
14:36:26.504754 memfd_create("xshmfence", MFD_CLOEXEC|MFD_ALLOW_SEALING) = -1 EPERM (Operation not permitted)
14:36:26.504780 open("/run/shm/shmfd-apMfAa", O_RDWR|O_CREAT|O_EXCL, 0600) = 2
14:36:26.504806 --- SIGSYS {si_signo=SIGSYS, si_code=SYS_SECCOMP, si_errno=ENOENT, si_call_addr=0x7fc9f59711bc, si_syscall=__NR_open, si_arch=AUDIT_ARCH_X86_64} ---
14:36:26.504822 rt_sigreturn({mask=[]}) = -1 EPERM (Operation not permitted)
Received signal 11 SEGV_MAPERR 000000000208
r8: 0000563d8c3c9fc8 r9: 00000000000061f9 r10: 0000563d8c209860 r11: 0000000000000005
r12: 0000563d8c584d00 r13: 0000563d8c5846c0 r14: 0000000000000000 r15: 0000563d8c5812a0
di: 0000563d8c597028 si: 0000000000000032 bp: 0000563d8c584dc0 bx: 0000563d8c597028
dx: 0000000000000001 ax: 0000000000000000 cx: 0000000000000000 sp: 00007ffde724e1f0
ip: 00007efc5f1fb94f efl: 0000000000010246 cgf: 002b000000000033 erf: 0000000000000004
trp: 000000000000000e msk: 0000000000000000 cr2: 0000000000000208
[end of stack trace]
Received signal 11 SEGV_MAPERR 000000000208
r8: 00005635db287948 r9: 0000000000006347 r10: 00005635db0d5860 r11: 000000000000000f
r12: 00005635dbe816a0 r13: 00005635dc29ca00 r14: 0000000000000000 r15: 00005635dc2a17c0
di: 00005635dc2a2fa8 si: 0000000000000032 bp: 00005635dc1ebec0 bx: 00005635dc2a2fa8
dx: 0000000000000001 ax: 0000000000000000 cx: 0000000000000000 sp: 00007fff0c1f3a10
ip: 00007fb9947ce94f efl: 0000000000010246 cgf: 002b000000000033 erf: 0000000000000004
trp: 000000000000000e msk: 0000000000000000 cr2: 0000000000000208
[end of stack trace]
Received signal 11 SEGV_MAPERR 000000000208
r8: 000055e1559dd948 r9: 0000000000006430 r10: 000055e15582b860 r11: 000000000000000f
r12: 000055e1564ea460 r13: 000055e1564e9d80 r14: 0000000000000000 r15: 000055e1564e7b80
di: 000055e1567d2028 si: 0000000000000032 bp: 000055e1564ea520 bx: 000055e1567d2028
dx: 0000000000000001 ax: 0000000000000000 cx: 0000000000000000 sp: 00007ffeface4db0
ip: 00007f66d168094f efl: 0000000000010246 cgf: 002b000000000033 erf: 0000000000000004
trp: 000000000000000e msk: 0000000000000000 cr2: 0000000000000208
[end of stack trace]
[24954:25042:1127/172614:ERROR:browser_gpu_channel_host_factory.cc(113)] Failed to launch GPU process.
getrlimit(RLIMIT_NOFILE) failed
[24954:25042:1127/172614:ERROR:browser_gpu_channel_host_factory.cc(113)] Failed to launch GPU process.
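As a toy model of the /run/shm whitelist problem described above: the open is rejected because the path is not under any allowed directory, and adding the second directory fixes it. This is only a sketch; `path_allowed` is a hypothetical function and does not reflect how chromium's broker actually matches paths.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a sandbox file allowlist: a path is accepted if
// it starts with one of the allowed directory prefixes.
// Note: naive prefix matching like this does not guard against ".." segments.
bool path_allowed(const std::string& path,
                  const std::vector<std::string>& allowed_dirs) {
    for (const std::string& dir : allowed_dirs) {
        if (path.compare(0, dir.size(), dir) == 0) {
            return true;
        }
    }
    return false;
}
```

With only "/dev/shm/" in the list, a path such as "/run/shm/shmfd-apMfAa" from the trace is rejected; with "/run/shm/" added, it is accepted.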
I found the ffmpeg error in three chromium bug reports, but it doesn't seem to have attracted the developers' attention. Maybe this is an 'expected' error for chromium?
About the ffmpeg bug:
I fixed it with https://github.com/lluixhi/gentoo/blob/5a2e2775c6ebcab54d8cb88700bc585eed7853f1/dev-qt/qtwebengine/files/qtwebengine-5.7.0-fix-system-ffmpeg.patch in qtwebengine, and it's fixed by the chromium-system-ffmpeg patches already in Gentoo.
I don't appear to have the /dev/shm /run/shm issue you're mentioning. I'm going to try chromium without a hardened kernel next to see if PaX is disabling something that is not covered by paxmarking.
Strange, I'm using the same patch for ffmpeg... Maybe it's because of the init bug I solved yesterday. And thanks for your feedback about shm; configure.ac says that xorg-server falls back to /run by default when built without a specific configure option.
Can you please give chromium a revbump? I would like to give it another try :)
@stefson @lluixhi chromium 60 needs a fix patch for the sandbox:
--- ./sandbox/linux/seccomp-bpf-helpers/syscall_sets.cc.orig
+++ ./sandbox/linux/seccomp-bpf-helpers/syscall_sets.cc
@@ -373,6 +373,7 @@
#if defined(__i386__)
case __NR_waitpid:
#endif
+ case __NR_set_tid_address:
return true;
case __NR_clone: // Should be parameter-restricted.
case __NR_setns: // Privileged.
@@ -385,7 +386,6 @@
#if defined(__i386__) || defined(__x86_64__) || defined(__mips__)
case __NR_set_thread_area:
#endif
- case __NR_set_tid_address:
case __NR_unshare:
#if !defined(__mips__) && !defined(__aarch64__)
case __NR_vfork:
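In simplified form, the effect of the hunk above is to move one syscall from the denied group to the allowed group of a switch. The sketch below is not chromium's code: `sketch_is_allowed` is a hypothetical name, and `SYS_exit_group` is included only as an assumed example of an already-allowed entry.

```cpp
#include <sys/syscall.h>

// Hypothetical, heavily simplified model of a syscall allow/deny switch.
bool sketch_is_allowed(long sysno) {
    switch (sysno) {
        case SYS_exit_group:       // assumed example of an allowed syscall
        case SYS_set_tid_address:  // moved into the allowed group by the patch
            return true;
        case SYS_clone:            // should be parameter-restricted
        case SYS_setns:            // privileged
        case SYS_unshare:
        default:
            return false;
    }
}
```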
If anyone needs patches and configuration that pass the build directly: https://github.com/xhebox/noname-linux/blob/master/ports/chromium/Pkgfile
Be careful:
- gn_fix_for_noname.patch is for my system only; skip that patch
- vaapi is not an essential patch