rapidquilt hang
The rapidquilt process was hanging on milos this morning with the following backtraces:
Id Target Id Frame
* 1 Thread 0x7f41455b6080 (LWP 6335) "rapidquilt" 0x00007f41451990ff in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
2 Thread 0x7f4143dff700 (LWP 6336) "rapidquilt" 0x00007f41451990ff in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
3 Thread 0x7f4143bfe700 (LWP 6337) "rapidquilt" 0x00007f41451990ff in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
4 Thread 0x7f41439fd700 (LWP 6338) "rapidquilt" 0x00007f41451990ff in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
5 Thread 0x7f41437fc700 (LWP 6339) "rapidquilt" 0x00007f41451990ff in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

Thread 1 (Thread 0x7f41455b6080 (LWP 6335))
#0 0x00007f41451990ff in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000556068060d19 in rayon_core::latch::LockLatch::wait ()
#2 0x0000556068091b01 in rayon_core::registry::Registry::in_worker_cold ()
#3 0x00005560680ab642 in rapidquilt::apply::parallel::apply_patches ()
#4 0x00005560680be51b in rapidquilt::main_result ()
#5 0x00005560680b9a3e in rapidquilt::main ()
#6 0x00005560680adc43 in std::rt::lang_start::{{closure}} ()
#7 0x00005560680b9698 in main ()

Thread 2 (Thread 0x7f4143dff700 (LWP 6336))
#0 0x00007f41451990ff in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00005560680801f4 in std::thread::park ()
#2 0x00005560680b50c1 in std::sync::mpsc::sync::wait ()
#3 0x00005560680a797f in rapidquilt::apply::parallel::apply_worker_task ()
#4 0x0000556068093d83 in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
#5 0x000055606809b225 in <rayon_core::job::HeapJob<BODY> as rayon_core::job::Job>::execute ()
#6 0x000055606805d3b6 in rayon_core::registry::WorkerThread::wait_until_cold ()
#7 0x00005560680938a0 in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
#8 0x000055606809bb45 in <rayon_core::job::StackJob<L, F, R> as rayon_core::job::Job>::execute ()
#9 0x000055606805d3b6 in rayon_core::registry::WorkerThread::wait_until_cold ()
#10 0x000055606805fb9b in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#11 0x000055606805d860 in <F as alloc::boxed::FnBox<A>>::call_box ()
#12 0x000055606807a03e in std::sys::unix::thread::Thread::new::thread_start ()
#13 0x00007f4145194724 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f41448afe8d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f4143bfe700 (LWP 6337))
#0 0x00007f41451990ff in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000556068060435 in rayon_core::sleep::Sleep::sleep ()
#2 0x000055606805d49a in rayon_core::registry::WorkerThread::wait_until_cold ()
#3 0x000055606805fb9b in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#4 0x000055606805d860 in <F as alloc::boxed::FnBox<A>>::call_box ()
#5 0x000055606807a03e in std::sys::unix::thread::Thread::new::thread_start ()
#6 0x00007f4145194724 in start_thread () from /lib64/libpthread.so.0
#7 0x00007f41448afe8d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f41439fd700 (LWP 6338))
#0 0x00007f41451990ff in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00005560680801f4 in std::thread::park ()
#2 0x0000556068099642 in <std::sync::mpsc::SyncSender<T>>::send ()
#3 0x00005560680a68bd in rapidquilt::apply::parallel::apply_worker_task ()
#4 0x0000556068093d83 in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
#5 0x000055606809b225 in <rayon_core::job::HeapJob<BODY> as rayon_core::job::Job>::execute ()
#6 0x000055606805d3b6 in rayon_core::registry::WorkerThread::wait_until_cold ()
#7 0x000055606805fb9b in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#8 0x000055606805d860 in <F as alloc::boxed::FnBox<A>>::call_box ()
#9 0x000055606807a03e in std::sys::unix::thread::Thread::new::thread_start ()
#10 0x00007f4145194724 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f41448afe8d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f41437fc700 (LWP 6339))
#0 0x00007f41451990ff in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000556068060435 in rayon_core::sleep::Sleep::sleep ()
#2 0x000055606805d49a in rayon_core::registry::WorkerThread::wait_until_cold ()
#3 0x000055606805fb9b in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#4 0x000055606805d860 in <F as alloc::boxed::FnBox<A>>::call_box ()
#5 0x000055606807a03e in std::sys::unix::thread::Thread::new::thread_start ()
#6 0x00007f4145194724 in start_thread () from /lib64/libpthread.so.0
#7 0x00007f41448afe8d in clone () from /lib64/libc.so.6
Note that the hang is not deterministic. I ran the same command again, and it finished:
kbuild@milos:~> rapidquilt push --dry-run -a -d /dev/shm/kbuild/linux.6295/job-0/linux-2.6.32 -p /dev/shm/kbuild/linux.6295/job-0/kernel-source -F0
Applying 2661 patches using 40 threads...
Patch patches.fixes/0001-KEYS-prevent-creating-a-different-user-s-keyrings.patch FAILED
File security/keys/internal.h FAILED
Hunk #1: FAILED @@ -137,7 +137,7 @@ extern key_ref_t keyring_search_aux(key_
hint: Comparison of the content of the file and the content expected by the hunk:
106: key_match_func_t match);
106: extern key_ref_t search_my_process_keyrings(struct keyring_search_context *ctx);
107:
108: extern key_ref_t search_process_keyrings(struct key_type *type,
108: extern key_ref_t search_process_keyrings(struct keyring_search_context *ctx);
109: const void *description,
110: key_match_func_t match,
111: const struct cred *cred);
112:
113: extern struct key *find_keyring_by_name(const char *name, bool skip_perm_check);
114:
115: extern int install_user_keyrings(void);
116: extern int install_thread_keyring_to_cred(struct cred *);
hint: Patch would apply on this file with fuzz 2
hint: No previous patches touched this file.
File security/keys/key.c FAILED
Hunk #1: FAILED @@ -302,6 +302,8 @@ struct key *key_alloc(struct key_type *t
hint: Comparison of the content of the file and the content expected by the hunk:
292: key->security = NULL;
292: key->flags |= 1 << KEY_FLAG_IN_QUOTA;
293:
294: if (!(flags & KEY_ALLOC_NOT_IN_QUOTA))
294: if (flags & KEY_ALLOC_TRUSTED)
295: key->flags |= 1 << KEY_FLAG_IN_QUOTA;
295: key->flags |= 1 << KEY_FLAG_TRUSTED;
296:
297: memset(&key->type_data, 0, sizeof(key->type_data));
298:
hint: Patch would apply on this file with fuzz 3
hint: 3 previous patches touched this file:
patches.fixes/0001-KEYS-close-race-between-key-lookup-and-freeing.patch
patches.fixes/0001-KEYS-Fix-race-between-key-destruction-and-finding-a-.patch
patches.fixes/0001-KEYS-Fix-crash-when-attempt-to-garbage-collect-an-un.patch
File include/linux/key.h FAILED
Hunk #1: FAILED @@ -170,6 +170,7 @@ struct key {
hint: Comparison of the content of the file and the content expected by the hunk:
154: #define KEY_FLAG_REVOKED 2 /* set if key had been revoked */
154: #define KEY_FLAG_INVALIDATED 7 /* set if key has been invalidated */
155: #define KEY_FLAG_IN_QUOTA 3 /* set if key consumes quota */
155: #define KEY_FLAG_TRUSTED 8 /* set if key is trusted */
156: #define KEY_FLAG_USER_CONSTRUCT 4 /* set if key is being constructed in userspace */
156: #define KEY_FLAG_TRUSTED_ONLY 9 /* set if keyring only accepts links to trusted keys */
157: #define KEY_FLAG_NEGATIVE 5 /* set if key is negative */
158:
159: /* the description string
159: /* the key type and key description string
160: * - this is used to match a key against search criteria
160: * - the desc is used to match a key against search criteria
Hunk #2: FAILED @@ -221,6 +222,7 @@ extern struct key *key_alloc(struct key_
hint: Comparison of the content of the file and the content expected by the hunk:
195: #define KEY_ALLOC_QUOTA_OVERRUN 0x0001 /* add to quota, permit even if overrun */
196: #define KEY_ALLOC_NOT_IN_QUOTA 0x0002 /* not in quota */
197:
197: #define KEY_ALLOC_TRUSTED 0x0004 /* Key should be flagged as trusted */
:
198: extern void key_revoke(struct key *key);
199: extern void key_put(struct key *key);
199: extern void key_invalidate(struct key *key);
hint: Patch would apply on this file with fuzz 3
hint: No previous patches touched this file.
File security/keys/process_keys.c FAILED
Hunk #1: FAILED @@ -76,7 +76,10 @@ int install_user_keyrings(void)
hint: Comparison of the content of the file and the content expected by the hunk:
75: if (IS_ERR(uid_keyring)) {
76: uid_keyring = keyring_alloc(buf, user->uid, (gid_t) -1,
76: uid_keyring = keyring_alloc(buf, user->uid, INVALID_GID,
77: cred, KEY_ALLOC_IN_QUOTA,
77: cred, user_keyring_perm,
78: NULL);
78: KEY_ALLOC_IN_QUOTA, NULL);
79: if (IS_ERR(uid_keyring)) {
80: ret = PTR_ERR(uid_keyring);
81: goto error;
Hunk #2: FAILED @@ -92,7 +95,9 @@ int install_user_keyrings(void)
hint: Comparison of the content of the file and the content expected by the hunk:
91: session_keyring =
92: keyring_alloc(buf, user->uid, (gid_t) -1,
92: keyring_alloc(buf, user->uid, INVALID_GID,
: cred, user_keyring_perm,
93: cred, KEY_ALLOC_IN_QUOTA, NULL);
93: KEY_ALLOC_IN_QUOTA, NULL);
94: if (IS_ERR(session_keyring)) {
95: ret = PTR_ERR(session_keyring);
96: goto error_release;
hint: Patch would not apply on this file with any fuzz
hint: 2 previous patches touched this file:
patches.kernel.org/patch-2.6.32.15-16
patches.fixes/0001-KEYS-fix-keyctl_set_reqkey_keyring-to-not-leak-threa.patch
File security/keys/keyring.c FAILED
Hunk #1: FAILED @@ -930,15 +930,15 @@ found:
hint: Comparison of the content of the file and the content expected by the hunk:
517: /*
518: * find a keyring with the specified name
518: * Find a keyring with the specified name.
: *
519: * - all named keyrings are searched
519: * All named keyrings in the current user namespace are searched, provided they
: * grant Search permission directly to the caller (unless this check is
: * skipped). Keyrings whose usage points have reached zero or who have been
: * revoked are skipped.
: *
520: * - normally only finds keyrings with search permission for the current process
520: * Returns a pointer to the keyring with the keyring's refcount having being
: * incremented on success. -ENOKEY is returned if a key could not be found.
521: */
522: struct key *find_keyring_by_name(const char *name, bool skip_perm_check)
523: {
524: struct key *keyring;
525: int bucket;
Hunk #2: OK with offset -419 @@ -966,10 +966,15 @@ struct key *find_keyring_by_name(const c
hint: Patch would not apply on this file with any fuzz
hint: 2 previous patches touched this file:
patches.kernel.org/patch-2.6.32.15-16
patches.fixes/0001-keys-Guard-against-null-match-function-in-keyring_se.patch
FWIW, I also saved a core dump (using the GDB gcore command) for potential later analysis. I'm not sure how useful it is, since the corresponding debuginfo packages were not installed and have since been rebuilt in the Build Service.
The hanging version was rapidquilt-0.6.0-1.1.x86_64. The source files can be fetched from the Build Service:
osc co obs://build.opensuse.org/Kernel:tools/SLE_12_SP3/5586dcc80945d75e299c2ec4daaeab9c-rapidquilt
Note: the internal kernel-source repo has some branches that reproduce rapidquilt hangs. Some may be system-specific, but some should be reliably reproducible, too.
git branch -a | grep rapid
I don't know how to get access to the relevant patches or how to try to reproduce this, but looking at the implementation of src/rapidquilt/apply/parallel.rs, I wonder if the issue is that the mpsc channel is limited to threads * 2 messages. Each thread can send more than two messages if multiple FilePatches fail, which leads to threads blocking.
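A minimal, self-contained sketch of the suspected failure mode. The Message enum and the messages_before_block helper are hypothetical stand-ins, not rapidquilt's code, and the example simplifies things to a single sender filling one receiver's queue. A std::sync::mpsc::sync_channel holds at most its capacity of undelivered messages, so once more than that are queued while nobody is receiving, a plain send() parks the thread. try_send is used here so the example reports the condition instead of hanging.

```rust
use std::sync::mpsc::{self, TrySendError};

// Hypothetical stand-in for rapidquilt's Message type; only the variant
// names mentioned in this thread are reused.
#[allow(dead_code)]
#[derive(Debug)]
enum Message {
    NewEarliestBrokenPatchIndex(usize),
    ThreadDoneApplying,
}

/// Queues one NewEarliestBrokenPatchIndex per failed FilePatch, then one
/// ThreadDoneApplying, into a sync_channel of the given capacity that nobody
/// is receiving from. Returns how many messages fit before a blocking
/// send() would have parked the thread.
fn messages_before_block(capacity: usize, failures: usize) -> usize {
    // Keep the receiver alive (but idle) so the channel stays connected.
    let (tx, _rx) = mpsc::sync_channel::<Message>(capacity);
    let outgoing = (0..failures)
        .map(Message::NewEarliestBrokenPatchIndex)
        .chain(std::iter::once(Message::ThreadDoneApplying));
    let mut sent = 0;
    for msg in outgoing {
        match tx.try_send(msg) {
            Ok(()) => sent += 1,
            // Here a plain send() would block, like the
            // SyncSender<T>::send frame in thread 4 above.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => unreachable!("receiver is still alive"),
        }
    }
    sent
}

fn main() {
    // Two threads => capacity threads * 2 = 4, as in the original code.
    let capacity = 2 * 2;
    println!("{}", messages_before_block(capacity, 1)); // 2: everything fits
    println!("{}", messages_before_block(capacity, 5)); // 4: the 5th message would block
}
```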
Does something like this fix the issue? (I'm not sure why sync_channel was used originally, but mpsc::channel doesn't have a baked-in limit on the number of messages in the queue.)
diff --git a/src/rapidquilt/apply/parallel.rs b/src/rapidquilt/apply/parallel.rs
index b374f34d9612..c793f81442dc 100644
--- a/src/rapidquilt/apply/parallel.rs
+++ b/src/rapidquilt/apply/parallel.rs
@@ -459,7 +459,7 @@ pub fn apply_patches<'config, 'arena>(config: &'config ApplyConfig, arena: &'are
// Prepare channels to send messages between applying threads.
let (senders, receivers): (Vec<_>, Vec<_>) = (0..threads).map(|_| {
- mpsc::sync_channel::<Message>(threads * 2) // At the moment every thread can send at most 2 messages, so lets use fixed size channel.
+ mpsc::channel::<Message>()
}).unzip();
// Create barrier for synchronization
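For comparison, a sketch of the same broadcast pattern using the unbounded mpsc::channel from the proposed diff. The Message type and the broadcast_unbounded helper are hypothetical, and it assumes each worker broadcasts every message to every receiver. Because send() on an unbounded channel never blocks, every worker can finish even though the receivers only drain their queues afterwards.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for rapidquilt's Message type.
#[allow(dead_code)]
#[derive(Debug)]
enum Message {
    NewEarliestBrokenPatchIndex(usize),
    ThreadDoneApplying,
}

/// Spawns `threads` workers; each broadcasts one message per failure plus a
/// final ThreadDoneApplying to every receiver. Returns how many messages
/// ended up queued for each receiver.
fn broadcast_unbounded(threads: usize, failures_per_thread: usize) -> Vec<usize> {
    let (senders, receivers): (Vec<_>, Vec<_>) =
        (0..threads).map(|_| mpsc::channel::<Message>()).unzip();

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let senders = senders.clone();
            thread::spawn(move || {
                for i in 0..failures_per_thread {
                    for tx in &senders {
                        // Never blocks: the queue simply grows.
                        tx.send(Message::NewEarliestBrokenPatchIndex(i)).unwrap();
                    }
                }
                for tx in &senders {
                    tx.send(Message::ThreadDoneApplying).unwrap();
                }
            })
        })
        .collect();

    // Nobody receives until all workers are done, yet none of them hangs.
    for h in handles {
        h.join().unwrap();
    }
    drop(senders); // close the channels so each rx.iter() terminates
    receivers.iter().map(|rx| rx.iter().count()).collect()
}

fn main() {
    // 4 threads with 10 failures each queue 44 messages per receiver,
    // well above the old fixed capacity of threads * 2 = 8.
    println!("{:?}", broadcast_unbounded(4, 10)); // [44, 44, 44, 44]
}
```

The trade-off is that queue memory is no longer bounded up front; it grows with the number of failed FilePatches, which is presumably small in practice.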
Also, I've noticed that the project isn't rustfmt-ed; I can send a patch to fix that if you like.
> I don't know how to get access to the relevant patches or to try to reproduce this, but looking at the implementation of src/rapidquilt/apply/parallel.rs, I wonder if the issue is that the mpsc channel is limited to threads * 2 messages. Each thread can send more than two messages if multiple FilePatches fail, leading to threads blocking.
You are right. The two-message limit made sense originally, because each thread used to stop applying after its first failure. A thread would therefore send at most one Message::NewEarliestBrokenPatchIndex followed by one Message::ThreadDoneApplying. Later, this was changed so that all FilePatches belonging to the same Patch are attempted, which lets us know whether more of them failed and report that to the user.
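To make the message counting concrete, here is a simplified model of one worker's loop. It is an assumption-laden sketch, not rapidquilt's implementation: each entry is a hypothetical (patch index, applied OK) pair for a FilePatch assigned to this thread, in patch order, and the worker is assumed to stop once it moves past the Patch that first failed.

```rust
/// Hypothetical model of one worker's loop. Each entry is
/// (patch_index, applied_ok) for a FilePatch assigned to this thread, in
/// patch order. Returns how many messages the worker sends to each receiver.
fn messages_sent(file_patches: &[(usize, bool)], stop_at_first_failure: bool) -> usize {
    let mut sent = 0;
    let mut failed_patch: Option<usize> = None;
    for &(patch, ok) in file_patches {
        // After a failure, only FilePatches of the same Patch are still tried.
        if let Some(p) = failed_patch {
            if patch != p {
                break;
            }
        }
        if !ok {
            sent += 1; // one NewEarliestBrokenPatchIndex broadcast per failure
            failed_patch = Some(patch);
            if stop_at_first_failure {
                break; // old behavior: give up right away
            }
        }
    }
    sent + 1 // final ThreadDoneApplying
}

fn main() {
    // Patch 7 touches three files handled by this thread, and all three fail.
    let work = [(5, true), (7, false), (7, false), (7, false), (8, true)];
    println!("old: {}", messages_sent(&work, true)); // old: 2
    println!("new: {}", messages_sent(&work, false)); // new: 4
}
```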
An alternative solution would be to skip calling broadcast_message(Message::NewEarliestBrokenPatchIndex) if earliest_broken_patch_index didn't change, i.e. if the FilePatch that just failed belongs to the same Patch as the previously failed FilePatch.
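A sketch of that alternative, assuming each thread keeps its own view of the earliest broken patch index (the EarliestBroken type is hypothetical, not rapidquilt's code). A failure only triggers a broadcast when it lowers the index, so repeated failures within the same Patch send nothing extra. Given that a thread processes its FilePatches in patch order, each thread would be back to at most one NewEarliestBrokenPatchIndex plus one ThreadDoneApplying.

```rust
/// Hypothetical per-thread view of earliest_broken_patch_index.
#[derive(Default)]
struct EarliestBroken {
    index: Option<usize>,
}

impl EarliestBroken {
    /// Records a failed FilePatch of the given patch. Returns true only if
    /// the failure lowered the earliest broken index, i.e. only then would
    /// NewEarliestBrokenPatchIndex need to be broadcast.
    fn record(&mut self, patch_index: usize) -> bool {
        if self.index.map_or(true, |cur| patch_index < cur) {
            self.index = Some(patch_index);
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut earliest = EarliestBroken::default();
    // Three FilePatches of patch 7 fail on this thread.
    let broadcast: Vec<bool> = [7, 7, 7].iter().map(|&i| earliest.record(i)).collect();
    println!("{:?}", broadcast); // [true, false, false]
    // A failure in an earlier patch lowers the index and would be announced.
    println!("{}", earliest.record(3)); // true
}
```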
The fix certainly helps quite a bit, although four of the branches still lock up:
branch | before fix | after fix |
---|---|---|
remotes/origin/users/msuchanek/SLE12-SP3/rapidquilt-lockup | LOCKUP | LOCKUP |
remotes/origin/users/msuchanek/SLE12-SP3/rapidquilt-lockup2 | LOCKUP | LOCKUP |
remotes/origin/users/msuchanek/SLE12-SP3/rapidquilt-lockup4 | LOCKUP | LOCKUP |
remotes/origin/users/msuchanek/SLE12-SP3/rapidquilt-lockup5 | LOCKUP | LOCKUP |
rapidquilt-lockup | LOCKUP | OK |
rapidquilt-lockup2 | LOCKUP | OK |
remotes/origin/users/msuchanek/SLE12-SP3/rapidquilt-lockup6-early | LOCKUP | OK |
remotes/origin/users/msuchanek/SLE12-SP3/rapidquilt-lockup7-early | LOCKUP | OK |
remotes/origin/users/msuchanek/SLE12-SP3/rapidquilt-lockup8-early | LOCKUP | OK |
remotes/origin/users/msuchanek/SLE12-SP3/rapidquilt-lockup9-early | LOCKUP | OK |
remotes/origin/users/msuchanek/SLE12-SP3/rapidquilt-lockupA-early | LOCKUP | OK |
remotes/origin/users/msuchanek/SLE15/rapidquilt-lockup2 | LOCKUP | OK |
remotes/origin/users/msuchanek/fixes/linux-4.12/rapidquilt-lockup | LOCKUP | OK |
remotes/origin/users/msuchanek/fixes/linux-4.12/rapidquilt-lockup-2 | LOCKUP | OK |
remotes/origin/users/msuchanek/stable/rapidquilt-lockup | LOCKUP | OK |
rapidquilt-lockup3 | OK | OK |
remotes/origin/users/mbrugger/SLE15-SP1/rapid | OK | OK |
remotes/origin/users/msuchanek/SLE12-SP3/rapidquilt-lockup3 | OK | OK |
remotes/origin/users/msuchanek/SLE15-SP3/rapidquilt-EISDIR | OK | OK |
remotes/origin/users/msuchanek/SLE15/rapidquilt-lockup | OK | OK |