openSUSE/rapidquilt

rapidquilt hang


The rapidquilt process was hanging on milos this morning with the following backtraces:

  Id   Target Id                                     Frame
* 1    Thread 0x7f41455b6080 (LWP 6335) "rapidquilt" 0x00007f41451990ff in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2    Thread 0x7f4143dff700 (LWP 6336) "rapidquilt" 0x00007f41451990ff in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3    Thread 0x7f4143bfe700 (LWP 6337) "rapidquilt" 0x00007f41451990ff in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4    Thread 0x7f41439fd700 (LWP 6338) "rapidquilt" 0x00007f41451990ff in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5    Thread 0x7f41437fc700 (LWP 6339) "rapidquilt" 0x00007f41451990ff in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  • Thread 1 (Thread 0x7f41455b6080 (LWP 6335))

    #0  0x00007f41451990ff in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    #1  0x0000556068060d19 in rayon_core::latch::LockLatch::wait ()
    #2  0x0000556068091b01 in rayon_core::registry::Registry::in_worker_cold ()
    #3  0x00005560680ab642 in rapidquilt::apply::parallel::apply_patches ()
    #4  0x00005560680be51b in rapidquilt::main_result ()
    #5  0x00005560680b9a3e in rapidquilt::main ()
    #6  0x00005560680adc43 in std::rt::lang_start::{{closure}} ()
    #7  0x00005560680b9698 in main ()
    
  • Thread 2 (Thread 0x7f4143dff700 (LWP 6336))

    #0  0x00007f41451990ff in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    #1  0x00005560680801f4 in std::thread::park ()
    #2  0x00005560680b50c1 in std::sync::mpsc::sync::wait ()
    #3  0x00005560680a797f in rapidquilt::apply::parallel::apply_worker_task ()
    #4  0x0000556068093d83 in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
    #5  0x000055606809b225 in <rayon_core::job::HeapJob<BODY> as rayon_core::job::Job>::execute ()
    #6  0x000055606805d3b6 in rayon_core::registry::WorkerThread::wait_until_cold ()
    #7  0x00005560680938a0 in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
    #8  0x000055606809bb45 in <rayon_core::job::StackJob<L, F, R> as rayon_core::job::Job>::execute ()
    #9  0x000055606805d3b6 in rayon_core::registry::WorkerThread::wait_until_cold ()
    #10 0x000055606805fb9b in std::sys_common::backtrace::__rust_begin_short_backtrace ()
    #11 0x000055606805d860 in <F as alloc::boxed::FnBox<A>>::call_box ()
    #12 0x000055606807a03e in std::sys::unix::thread::Thread::new::thread_start ()
    #13 0x00007f4145194724 in start_thread () from /lib64/libpthread.so.0
    #14 0x00007f41448afe8d in clone () from /lib64/libc.so.6
    
  • Thread 3 (Thread 0x7f4143bfe700 (LWP 6337))

    #0  0x00007f41451990ff in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    #1  0x0000556068060435 in rayon_core::sleep::Sleep::sleep ()
    #2  0x000055606805d49a in rayon_core::registry::WorkerThread::wait_until_cold ()
    #3  0x000055606805fb9b in std::sys_common::backtrace::__rust_begin_short_backtrace ()
    #4  0x000055606805d860 in <F as alloc::boxed::FnBox<A>>::call_box ()
    #5  0x000055606807a03e in std::sys::unix::thread::Thread::new::thread_start ()
    #6  0x00007f4145194724 in start_thread () from /lib64/libpthread.so.0
    #7  0x00007f41448afe8d in clone () from /lib64/libc.so.6
    
  • Thread 4 (Thread 0x7f41439fd700 (LWP 6338))

    #0  0x00007f41451990ff in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    #1  0x00005560680801f4 in std::thread::park ()
    #2  0x0000556068099642 in <std::sync::mpsc::SyncSender<T>>::send ()
    #3  0x00005560680a68bd in rapidquilt::apply::parallel::apply_worker_task ()
    #4  0x0000556068093d83 in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
    #5  0x000055606809b225 in <rayon_core::job::HeapJob<BODY> as rayon_core::job::Job>::execute ()
    #6  0x000055606805d3b6 in rayon_core::registry::WorkerThread::wait_until_cold ()
    #7  0x000055606805fb9b in std::sys_common::backtrace::__rust_begin_short_backtrace ()
    #8  0x000055606805d860 in <F as alloc::boxed::FnBox<A>>::call_box ()
    #9  0x000055606807a03e in std::sys::unix::thread::Thread::new::thread_start ()
    #10 0x00007f4145194724 in start_thread () from /lib64/libpthread.so.0
    #11 0x00007f41448afe8d in clone () from /lib64/libc.so.6
    
  • Thread 5 (Thread 0x7f41437fc700 (LWP 6339))

    #0  0x00007f41451990ff in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    #1  0x0000556068060435 in rayon_core::sleep::Sleep::sleep ()
    #2  0x000055606805d49a in rayon_core::registry::WorkerThread::wait_until_cold ()
    #3  0x000055606805fb9b in std::sys_common::backtrace::__rust_begin_short_backtrace ()
    #4  0x000055606805d860 in <F as alloc::boxed::FnBox<A>>::call_box ()
    #5  0x000055606807a03e in std::sys::unix::thread::Thread::new::thread_start ()
    #6  0x00007f4145194724 in start_thread () from /lib64/libpthread.so.0
    #7  0x00007f41448afe8d in clone () from /lib64/libc.so.6
    

Note that the hang is not deterministic. I ran the same command again, and it finished:

kbuild@milos:~> rapidquilt push --dry-run -a -d /dev/shm/kbuild/linux.6295/job-0/linux-2.6.32 -p /dev/shm/kbuild/linux.6295/job-0/kernel-source -F0
Applying 2661 patches using 40 threads...
Patch patches.fixes/0001-KEYS-prevent-creating-a-different-user-s-keyrings.patch FAILED
  File security/keys/internal.h FAILED
    Hunk #1: FAILED     @@ -137,7 +137,7 @@ extern key_ref_t keyring_search_aux(key_

      hint: Comparison of the content of the file and the content expected by the hunk:

        106:                                key_match_func_t match);
        106: extern key_ref_t search_my_process_keyrings(struct keyring_search_context *ctx);
        107: 
        108: extern key_ref_t search_process_keyrings(struct key_type *type,                                                                                           
        108: extern key_ref_t search_process_keyrings(struct keyring_search_context *ctx);
        109:                                     const void *description,
        110:                                     key_match_func_t match,                                                                                               
        111:                                     const struct cred *cred);                                                                                             
        112: 
        113: extern struct key *find_keyring_by_name(const char *name, bool skip_perm_check);                                                                          
        114:                                                                                                                                                           
        115: extern int install_user_keyrings(void);                                                                                                                   
        116: extern int install_thread_keyring_to_cred(struct cred *);                                                                                                 


    hint: Patch would apply on this file with fuzz 2

    hint: No previous patches touched this file.

  File security/keys/key.c FAILED
    Hunk #1: FAILED     @@ -302,6 +302,8 @@ struct key *key_alloc(struct key_type *t

      hint: Comparison of the content of the file and the content expected by the hunk:

        292:    key->security = NULL;
        292:            key->flags |= 1 << KEY_FLAG_IN_QUOTA;
        293: 
        294:    if (!(flags & KEY_ALLOC_NOT_IN_QUOTA))                                                                                                                 
        294:    if (flags & KEY_ALLOC_TRUSTED)
        295:            key->flags |= 1 << KEY_FLAG_IN_QUOTA;
        295:            key->flags |= 1 << KEY_FLAG_TRUSTED;
        296: 
        297:    memset(&key->type_data, 0, sizeof(key->type_data));                                                                                                    
        298:                                                                                                                                                           


    hint: Patch would apply on this file with fuzz 3

    hint: 3 previous patches touched this file:
      patches.fixes/0001-KEYS-close-race-between-key-lookup-and-freeing.patch
      patches.fixes/0001-KEYS-Fix-race-between-key-destruction-and-finding-a-.patch
      patches.fixes/0001-KEYS-Fix-crash-when-attempt-to-garbage-collect-an-un.patch

  File include/linux/key.h FAILED
    Hunk #1: FAILED     @@ -170,6 +170,7 @@ struct key {

      hint: Comparison of the content of the file and the content expected by the hunk:

        154: #define KEY_FLAG_REVOKED   2       /* set if key had been revoked */
        154: #define KEY_FLAG_INVALIDATED       7       /* set if key has been invalidated */
        155: #define KEY_FLAG_IN_QUOTA  3       /* set if key consumes quota */
        155: #define KEY_FLAG_TRUSTED   8       /* set if key is trusted */
        156: #define KEY_FLAG_USER_CONSTRUCT    4       /* set if key is being constructed in userspace */
        156: #define KEY_FLAG_TRUSTED_ONLY      9       /* set if keyring only accepts links to trusted keys */
        157: #define KEY_FLAG_NEGATIVE  5       /* set if key is negative */
        158: 
        159:    /* the description string
        159:    /* the key type and key description string
        160:     * - this is used to match a key against search criteria
        160:     * - the desc is used to match a key against search criteria

    Hunk #2: FAILED     @@ -221,6 +222,7 @@ extern struct key *key_alloc(struct key_

      hint: Comparison of the content of the file and the content expected by the hunk:

        195: #define KEY_ALLOC_QUOTA_OVERRUN    0x0001  /* add to quota, permit even if overrun */
        196: #define KEY_ALLOC_NOT_IN_QUOTA     0x0002  /* not in quota */                                                                                             
        197: 
        197: #define KEY_ALLOC_TRUSTED  0x0004  /* Key should be flagged as trusted */
           :                                                                                                                                                           
        198: extern void key_revoke(struct key *key);
        199: extern void key_put(struct key *key);
        199: extern void key_invalidate(struct key *key);


    hint: Patch would apply on this file with fuzz 3

    hint: No previous patches touched this file.

  File security/keys/process_keys.c FAILED
    Hunk #1: FAILED     @@ -76,7 +76,10 @@ int install_user_keyrings(void)

      hint: Comparison of the content of the file and the content expected by the hunk:

         75:            if (IS_ERR(uid_keyring)) {
         76:                    uid_keyring = keyring_alloc(buf, user->uid, (gid_t) -1,
         76:                    uid_keyring = keyring_alloc(buf, user->uid, INVALID_GID,
         77:                                                cred, KEY_ALLOC_IN_QUOTA,
         77:                                                cred, user_keyring_perm,
         78:                                                NULL);
         78:                                                KEY_ALLOC_IN_QUOTA, NULL);
         79:                    if (IS_ERR(uid_keyring)) {
         80:                            ret = PTR_ERR(uid_keyring);                                                                                                    
         81:                            goto error;                                                                                                                    

    Hunk #2: FAILED     @@ -92,7 +95,9 @@ int install_user_keyrings(void)

      hint: Comparison of the content of the file and the content expected by the hunk:

         91:                    session_keyring =
         92:                            keyring_alloc(buf, user->uid, (gid_t) -1,
         92:                            keyring_alloc(buf, user->uid, INVALID_GID,
           :                                          cred, user_keyring_perm,                                                                                         
         93:                                          cred, KEY_ALLOC_IN_QUOTA, NULL);
         93:                                          KEY_ALLOC_IN_QUOTA, NULL);
         94:                    if (IS_ERR(session_keyring)) {
         95:                            ret = PTR_ERR(session_keyring);                                                                                                
         96:                            goto error_release;                                                                                                            


    hint: Patch would not apply on this file with any fuzz

    hint: 2 previous patches touched this file:
      patches.kernel.org/patch-2.6.32.15-16
      patches.fixes/0001-KEYS-fix-keyctl_set_reqkey_keyring-to-not-leak-threa.patch

  File security/keys/keyring.c FAILED
    Hunk #1: FAILED     @@ -930,15 +930,15 @@ found:

      hint: Comparison of the content of the file and the content expected by the hunk:

        517: /*
        518:  * find a keyring with the specified name
        518:  * Find a keyring with the specified name.
           :  *                                                                                                                                                        
        519:  * - all named keyrings are searched
        519:  * All named keyrings in the current user namespace are searched, provided they
           :  * grant Search permission directly to the caller (unless this check is                                                                                   
           :  * skipped).  Keyrings whose usage points have reached zero or who have been                                                                              
           :  * revoked are skipped.                                                                                                                                   
           :  *                                                                                                                                                        
        520:  * - normally only finds keyrings with search permission for the current process
        520:  * Returns a pointer to the keyring with the keyring's refcount having being
           :  * incremented on success.  -ENOKEY is returned if a key could not be found.                                                                              
        521:  */
        522: struct key *find_keyring_by_name(const char *name, bool skip_perm_check)                                                                                  
        523: {                                                                                                                                                         
        524:    struct key *keyring;                                                                                                                                   
        525:    int bucket;                                                                                                                                            

    Hunk #2: OK with offset -419        @@ -966,10 +966,15 @@ struct key *find_keyring_by_name(const c

    hint: Patch would not apply on this file with any fuzz

    hint: 2 previous patches touched this file:
      patches.kernel.org/patch-2.6.32.15-16
      patches.fixes/0001-keys-Guard-against-null-match-function-in-keyring_se.patch

FWIW I also saved a core dump (using the GDB gcore command) for potential later analysis. I'm not sure how useful it is, since the corresponding debuginfo packages were not installed and have since been rebuilt in the Build Service.

The hanging version was rapidquilt-0.6.0-1.1.x86_64. The source files can be fetched from the Build Service:

osc co obs://build.opensuse.org/Kernel:tools/SLE_12_SP3/5586dcc80945d75e299c2ec4daaeab9c-rapidquilt

Note: the internal kernel-source repo has some branches that reproduce rapidquilt hangs. Some may be system-specific, but some should be reliably reproducible, too.

git branch -a | grep rapid

I don't know how to get access to the relevant patches or how to reproduce this, but looking at the implementation in src/rapidquilt/apply/parallel.rs, I wonder if the issue is that each mpsc channel is limited to threads * 2 messages. Each thread can send more than two messages if multiple FilePatches fail, so a sender can block on a full channel.

Does something like this fix the issue? I'm not sure why sync_channel was used originally, but mpsc::channel has no built-in limit on the number of messages in the queue:

diff --git a/src/rapidquilt/apply/parallel.rs b/src/rapidquilt/apply/parallel.rs
index b374f34d9612..c793f81442dc 100644
--- a/src/rapidquilt/apply/parallel.rs
+++ b/src/rapidquilt/apply/parallel.rs
@@ -459,7 +459,7 @@ pub fn apply_patches<'config, 'arena>(config: &'config ApplyConfig, arena: &'are
 
     // Prepare channels to send messages between applying threads.
     let (senders, receivers): (Vec<_>, Vec<_>) = (0..threads).map(|_| {
-        mpsc::sync_channel::<Message>(threads * 2) // At the moment every thread can send at most 2 messages, so lets use fixed size channel.
+        mpsc::channel::<Message>()
     }).unzip();
 
     // Create barrier for synchronization
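
To illustrate the difference between the two channel types in the diff above, here is a minimal standalone sketch (not rapidquilt code; the helper names and the u32 payload are made up for the example). It uses try_send so that the example reports a full queue instead of hanging the way a blocking send would:

```rust
use std::sync::mpsc::{self, TrySendError};

/// Counts how many messages a bounded channel accepts when nobody is receiving.
/// (Hypothetical helper for illustration only.)
fn accepted_before_full(capacity: usize, attempts: usize) -> usize {
    // Keep the receiver alive but never read from it.
    let (tx, _rx) = mpsc::sync_channel::<u32>(capacity);
    let mut accepted = 0;
    for i in 0..attempts {
        match tx.try_send(i as u32) {
            Ok(()) => accepted += 1,
            // A blocking `send` would park the calling thread at this point.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}

/// Same experiment with an unbounded channel: `send` never blocks.
fn accepted_unbounded(attempts: usize) -> usize {
    let (tx, _rx) = mpsc::channel::<u32>();
    (0..attempts).filter(|&i| tx.send(i as u32).is_ok()).count()
}

fn main() {
    println!("bounded(4) accepted {} of 10", accepted_before_full(4, 10));
    println!("unbounded accepted {} of 10", accepted_unbounded(10));
}
```

The unbounded variant trades the possibility of blocking for a queue that can grow without limit, which is probably acceptable here if the number of messages per run is small.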

Also, I've noticed that the project isn't formatted with rustfmt. I can send a patch to fix that if you like.

> I don't know how to get access to the relevant patches or how to reproduce this, but looking at the implementation in src/rapidquilt/apply/parallel.rs, I wonder if the issue is that each mpsc channel is limited to threads * 2 messages. Each thread can send more than two messages if multiple FilePatches fail, so a sender can block on a full channel.

You are right. The limit of at most two messages held originally because each thread used to stop applying after the first failure, so it would send at most one Message::NewEarliestBrokenPatchIndex followed by one Message::ThreadDoneApplying. That was later changed so that all FilePatches belonging to the same Patch attempt to apply, which lets us detect further failures and report them to the user.
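
The capacity argument can be written out as a small calculation. This is only a sketch: it assumes every worker broadcasts each message to every receiver and that a receiver may not drain its queue in the meantime; the function names are hypothetical.

```rust
/// Worst-case number of undrained messages in one receiver's queue.
/// Assumption: each of `threads` workers sends one NewEarliestBrokenPatchIndex
/// per failure plus one final ThreadDoneApplying to this receiver.
fn worst_case_queue_len(threads: usize, failures_per_worker: usize) -> usize {
    threads * (failures_per_worker + 1)
}

/// True if the worst case still fits into sync_channel(threads * 2).
fn fits_in_channel(threads: usize, failures_per_worker: usize) -> bool {
    worst_case_queue_len(threads, failures_per_worker) <= threads * 2
}

fn main() {
    // Original behaviour: stop after the first failure.
    println!("1 failure per worker fits: {}", fits_in_channel(40, 1));
    // Later behaviour: keep applying, so several FilePatches can fail.
    println!("3 failures per worker fit: {}", fits_in_channel(40, 3));
}
```

With one failure per worker the bound of threads * 2 is exactly met; any additional failure message can exceed it.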

An alternative solution would be to not call broadcast_message(Message::NewEarliestBrokenPatchIndex) if earliest_broken_patch_index didn't change, i.e. if the FilePatch that just failed belongs to the same Patch as the previously failed FilePatch.
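
A rough sketch of that alternative, outside of the real code: BrokenPatchTracker and record_failure are hypothetical names, and a plain usize stands in for the patch index. The caller would broadcast only when record_failure returns true.

```rust
/// Remembers the earliest broken patch index seen so far by one worker.
/// (Hypothetical type for illustration; not part of rapidquilt.)
struct BrokenPatchTracker {
    earliest: Option<usize>,
}

impl BrokenPatchTracker {
    fn new() -> Self {
        BrokenPatchTracker { earliest: None }
    }

    /// Records a failed FilePatch that belongs to patch `patch_index`.
    /// Returns true only if the earliest broken index changed, i.e. only
    /// when a NewEarliestBrokenPatchIndex broadcast would be needed.
    fn record_failure(&mut self, patch_index: usize) -> bool {
        match self.earliest {
            Some(current) if patch_index >= current => false,
            _ => {
                self.earliest = Some(patch_index);
                true
            }
        }
    }
}

fn main() {
    let mut tracker = BrokenPatchTracker::new();
    // Three FilePatches of patch 7 fail, then one FilePatch of patch 5.
    let broadcasts = [7, 7, 7, 5]
        .iter()
        .filter(|&&index| tracker.record_failure(index))
        .count();
    println!("broadcasts needed: {}", broadcasts);
}
```

Note that in this sketch the number of broadcasts equals the number of times the index changes, so whether a fixed channel capacity is still safe depends on how often that can happen per thread.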

The fix certainly helps quite a bit, although four of the test branches still lock up:

| branch | before | after |
|---|---|---|
| remotes/origin/users/msuchanek/SLE12-SP3/rapidquilt-lockup | LOCKUP | LOCKUP |
| remotes/origin/users/msuchanek/SLE12-SP3/rapidquilt-lockup2 | LOCKUP | LOCKUP |
| remotes/origin/users/msuchanek/SLE12-SP3/rapidquilt-lockup4 | LOCKUP | LOCKUP |
| remotes/origin/users/msuchanek/SLE12-SP3/rapidquilt-lockup5 | LOCKUP | LOCKUP |
| rapidquilt-lockup | LOCKUP | OK |
| rapidquilt-lockup2 | LOCKUP | OK |
| remotes/origin/users/msuchanek/SLE12-SP3/rapidquilt-lockup6-early | LOCKUP | OK |
| remotes/origin/users/msuchanek/SLE12-SP3/rapidquilt-lockup7-early | LOCKUP | OK |
| remotes/origin/users/msuchanek/SLE12-SP3/rapidquilt-lockup8-early | LOCKUP | OK |
| remotes/origin/users/msuchanek/SLE12-SP3/rapidquilt-lockup9-early | LOCKUP | OK |
| remotes/origin/users/msuchanek/SLE12-SP3/rapidquilt-lockupA-early | LOCKUP | OK |
| remotes/origin/users/msuchanek/SLE15/rapidquilt-lockup2 | LOCKUP | OK |
| remotes/origin/users/msuchanek/fixes/linux-4.12/rapidquilt-lockup | LOCKUP | OK |
| remotes/origin/users/msuchanek/fixes/linux-4.12/rapidquilt-lockup-2 | LOCKUP | OK |
| remotes/origin/users/msuchanek/stable/rapidquilt-lockup | LOCKUP | OK |
| rapidquilt-lockup3 | OK | OK |
| remotes/origin/users/mbrugger/SLE15-SP1/rapid | OK | OK |
| remotes/origin/users/msuchanek/SLE12-SP3/rapidquilt-lockup3 | OK | OK |
| remotes/origin/users/msuchanek/SLE15-SP3/rapidquilt-EISDIR | OK | OK |
| remotes/origin/users/msuchanek/SLE15/rapidquilt-lockup | OK | OK |