2.5 alpine thread stack size/level
Daniel-ltw opened this issue ยท 16 comments
https://bugs.ruby-lang.org/issues/14387#change-70914
This is for people who might encounter this issue.
Testcase:
# test.rb
n = 1000
res = {}
1.upto(n).to_a.inject(res) do |r, i|
r[i] = {}
end
def f(x)
x.each_value { |v| f(v) }
end
f(res)Problem is the way ruby calculate stacksize for main thread and what musl reports as available via the non-portable pthread_getattr_np() call. Musl will give you the memory size that is guaranteed by kernel while glibc gives you the number you will "hopefully" get.
@ncopa is there a config variable we could throw, during the make/compilation phase, to increase that size? I'm just wondering if this is a viable solution/workaround at this point.
No, that is sort of the problem. The stack size is big enough, (sort of) but it the way musl reports it. A possible workaround would be to implement ruby's own pthread_getattr_np() and use that when current thread (pthread_self()) is the same as main thread. (syscall(SYS_gettid) == getpid()) and fallback to libc's phread_getattr_np() when its not main thread.
The code there is already messy as it is and "fixing" it or adding a workaround in ruby will make it worse ๐ What ruby tries to do here is somewhat controversial in the first place. Kernel could still deny ruby the stack memory it believes it has available.
This is kind of frustrating as I have static code analysis failing on the stack size and I have fall back to ruby 2.4 for that.
The normal application test do run fine with 2.5.
Possible workaround:
diff --git a/thread_pthread.c b/thread_pthread.c
index 951885ffa0..e2d662143b 100644
--- a/thread_pthread.c
+++ b/thread_pthread.c
@@ -721,7 +721,7 @@ ruby_init_stack(volatile VALUE *addr
native_main_thread.register_stack_start = (VALUE*)bsp;
}
#endif
-#if MAINSTACKADDR_AVAILABLE
+#if MAINSTACKADDR_AVAILABLE && !(defined(__linux__) && !defined(__GLIBC__))
if (native_main_thread.stack_maxsize) return;
{
void* stackaddr;
@@ -1680,7 +1680,7 @@ ruby_stack_overflowed_p(const rb_thread_t *th, const void *addr)
#ifdef STACKADDR_AVAILABLE
if (get_stack(&base, &size) == 0) {
-# ifdef __APPLE__
+# if defined(__APPLE__) || (defined(__linux__) && !defined(__GLIBC__))
if (pthread_equal(th->thread_id, native_main_thread.id)) {
struct rlimit rlim;
if (getrlimit(RLIMIT_STACK, &rlim) == 0 && rlim.rlim_cur > size) {It may look like __APPLE__ systems have similar problem? I need to investigate why they do the check if (pthread_equal(th->thread_id, native_main_thread.id)). Thats the logic we need: if current thread is main thread, then use getrlimit(RLIMIT_STACK)
The workaround will omit the reserve_stack thing on linux. A proper fix based on http://www.openwall.com/lists/musl/2013/03/31/10:
diff --git a/thread_pthread.c b/thread_pthread.c
index 951885ffa0..d9814e789e 100644
--- a/thread_pthread.c
+++ b/thread_pthread.c
@@ -530,9 +530,6 @@ hpux_attr_getstackaddr(const pthread_attr_t *attr, void **addr)
# define MAINSTACKADDR_AVAILABLE 0
# endif
#endif
-#if MAINSTACKADDR_AVAILABLE && !defined(get_main_stack)
-# define get_main_stack(addr, size) get_stack(addr, size)
-#endif
#ifdef STACKADDR_AVAILABLE
/*
@@ -614,6 +611,54 @@ get_stack(void **addr, size_t *size)
return 0;
#undef CHECK_ERR
}
+
+#if defined(__linux__) && !defined(__GLIBC__) && defined(HAVE_GETRLIMIT)
+
+#ifndef PAGE_SIZE
+#include <unistd.h>
+#define PAGE_SIZE sysconf(_SC_PAGE_SIZE)
+#endif
+
+static int
+get_main_stack(void **addr, size_t *size)
+{
+ size_t start, end, limit, prevend = 0;
+ struct rlimit r;
+ FILE *f;
+ char buf[PATH_MAX+80], s[8];
+ int n;
+
+ f = fopen("/proc/self/maps", "re");
+ if (!f)
+ return -1;
+ n = 0;
+ while (fgets(buf, sizeof buf, f)) {
+ n = sscanf(buf, "%zx-%zx %*s %*s %*s %*s %7s", &start, &end, s);
+ if (n >= 2) {
+ if (n == 3 && strcmp(s, "[stack]") == 0)
+ break;
+ prevend = end;
+ }
+ n = 0;
+ }
+ fclose(f);
+ if (n == 0)
+ return -1;
+
+ limit = 100 << 20; /* 100MB stack limit */
+ if (getrlimit(RLIMIT_STACK, &r)==0 && r.rlim_cur < limit)
+ limit = r.rlim_cur & -PAGE_SIZE;
+ if (limit > end) limit = end;
+ if (prevend < end - limit) prevend = end - limit;
+ if (start > prevend) start = prevend;
+ *addr = (void *)end;
+ *size = end - start;
+ return 0;
+}
+#else
+# define get_main_stack(addr, size) get_stack(addr, size)
+#endif
+
#endif
static struct {Any updates on this? Will this be fixed upstream instead?
Thank you for working on this!
I did send the patch to https://bugs.ruby-lang.org/issues/14387#change-70914 but I haven't got any response. Maybe ask for response there?
The problem is not too complicated to understand, but understanding how to properly fix it is a bit complicated.
@ncopa Should we get this patch applied at this level for now as it will still affect ruby 2.5.0 and ruby 2.5.1?
This is so that downstream users get the fix for now and it will not affect other OS users from this level.
yes, i think this patch should be applied at this level.
As for, when the ruby lang team wants to apply the patch, let them ponder on that, as you have already release what you think is an appropriate patch.
I guess all contributor now have to be conscious that there is this patch for ruby 2.5 that is needed.
I'm still a little bit wary of applying the patch at this level, especially given that upstream doesn't seem keen on it. ๐
@tianon I guess this issue currently on affects alpine due to the use of musl c.
Upstream does not seem to be concern as majority of the users are on glibc which does not really affect them.
Based on @ncopa patch, this should work nicely with musl c as Yui Naruse said. As for other non glibc platforms, this might not be the right patch.
@ncopa https://bugs.ruby-lang.org/issues/14387#note-19
New reply in regards to your patch.
Is it possible to apply the patch from @ncopa within a Dockerfile using the ruby:2.5.1-alpine image? The larger debian images are compounding and making quite a difference in our CI builds.
When will this change get pushed to Docker Hub? I see here that the images on Docker Hub are still built using this commit from BEFORE the PR merge. The last time the official images were built was 11 days ago.