brenns10/sos

Race condition in block()


When a process needs to go to sleep, it calls block(), which stores context information into current->context and then calls schedule() to select a new process. If the process (either executing in SVC mode as a system call on its own behalf, or a kthread executing in SYS mode) is interrupted after the context has been stored but before the switch completes, the interrupt handler dumps context into current->context again, overwriting the context that block() just saved. On return from the IRQ, the process continues into the scheduler. When the process is next scheduled, the corrupted context (usually) results in a fault.
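
The sequence above can be modeled in a few lines. This is only a toy sketch in Python, not the kernel's code: the `Process` class, the `save_context` helper, the `irq_fires` parameter, and the register values are all hypothetical stand-ins chosen for illustration. The one property it tries to capture is that block() and the IRQ handler write to the same slot.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    pid: int
    context: dict = field(default_factory=dict)  # stands in for current->context


def save_context(proc, registers):
    """Overwrite the process's single context slot with a register snapshot."""
    proc.context = dict(registers)


def block(current, registers, irq_fires=False):
    """Toy block(): save context, then optionally get interrupted before schedule()."""
    # Step 1: block() records where the process should resume.
    save_context(current, registers)

    # Race window: an IRQ arriving here saves the *interrupted* state
    # (now somewhere inside block()/schedule()) into the same slot.
    if irq_fires:
        interrupted = {"pc": "inside_schedule", "sp": "kernel_scratch"}
        save_context(current, interrupted)

    # Step 2: schedule() would pick another process here. When `current` is
    # later resumed, it is restored from whatever the slot holds now.
    return current.context


resume_point = {"pc": "after_block", "sp": "process_stack"}

clean = block(Process(pid=3), resume_point)
raced = block(Process(pid=3), resume_point, irq_fires=True)

print(clean == resume_point)  # True: the saved context survives
print(raced == resume_point)  # False: the IRQ clobbered the saved context
```

In the model, the process that raced would later be resumed at "inside_schedule" rather than "after_block", which corresponds to the corrupted context described above.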

Here is an example failure from the integration tests:

stephen at pride in ~/repos/sos (master %)
$ make integrationtest
========================================================= test session starts ==========================================================
platform linux -- Python 3.8.3, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /home/stephen/repos/sos
collected 4 items

integrationtests/test_simple.py ...                                                                                              [ 75%]
integrationtests/test_udp.py F                                                                                                   [100%]

=============================================================== FAILURES ===============================================================
______________________________________________________________ test_full _______________________________________________________________

vm = <conftest.SosVirtualMachine object at 0x7f7a626e9a90>
sk = <socket.socket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_DGRAM, proto=17, laddr=('0.0.0.0', 5051)>

    def test_full(vm, sk):
        res = vm.cmd('socket')
        fildes = int(re.search(r'socket\(\) = (\d+)', res).group(1))

        vm.cmd(f'connect {fildes} 10.0.2.2 {sk.getsockname()[1]}')
>       vm.cmd(f'send {fildes} ABAB_CDCD_EFEF')

integrationtests/test_udp.py:26:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
integrationtests/conftest.py:99: in cmd
    return self.read_until(pattern, timeout=timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <conftest.SosVirtualMachine object at 0x7f7a626e9a90>, pattern = re.compile('[uk]sh>'), timeout = 2

    def read_until(self, pattern, timeout=None):
        if not isinstance(pattern, re.Pattern):
            pattern = re.compile(pattern)
        if timeout is None:
            timeout = self.timeout

        wait_until = time.time() + timeout
        result = ''
        while True:
            time_left = wait_until - time.time()
            if time_left <= 0:
                break
            try:
                data = self.stdout_queue.get(timeout=time_left)
                result += data.decode('utf-8')
            except queue.Empty:
                pass

            found = pattern.search(result)
            if found:
                return result

            found = self.abort.search(result)
            if found and self.debug:
                time.sleep(0.1)  # bad sleep sync to let stdout thread print
                print('\n[sos test] Fault detected! Hit enter when done')
                print('[sos test] debugging to exit the test.')
                input()
            if found:
>               raise Exception(f'Fault encountered waiting:\n{result}')
E               Exception: Fault encountered waiting:
E
E               netif is configured with ip 10.0.2.15, gateway 10.0.2.2, subnet 255.255.255.0, dns 10.0.2.3
E               Uh-oh... data abort! DFSR=7 DFAR=15af90 LR=39b4
E               Fault occurred with PSR: 0x2000001f
E               Translation fault (page): 0x15af90
E               Context history:
E               kernel initialized
E               schedule process 3 (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                  block (x1)
E               schedule process 2 (x1)
E                  IRQ 30 "timer" interrupted 0x9590 (x1)
E               schedule process 1 (x1)
E                  IRQ 33 "uart" interrupted 0x38a8 (x1)
E                  IRQ 78 "virtio-net" interrupted 0x38a8 (x1)
E                  IRQ 30 "timer" interrupted 0x38bc (x1)
E               schedule process 3 (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                  block (x1)
E               schedule process 2 (x1)
E                  block (x1)
E               schedule process 2 (x1)
E                  IRQ 33 "uart" interrupted 0xa418 (x1)
E                  block (x1)
E               schedule process 3 (x1)
E                  IRQ 78 "virtio-net" interrupted 0x694 (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                 syscall return (x1)
E                 syscall (x1)
E                  block (x1)
E                  IRQ 33 "uart" interrupted 0x3314 (x1)
E               schedule process 2 (x1)
E               schedule process 3 (x1)
E               End of context history
E               END OF FAULT REPORT

integrationtests/conftest.py:91: Exception
======================================================= short test summary info ========================================================
FAILED integrationtests/test_udp.py::test_full - Exception: Fault encountered waiting:
===================================================== 1 failed, 3 passed in 0.43s ======================================================
make: *** [Makefile:146: integrationtest] Error 1
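
The PSR value printed in the fault report (0x2000001f) can be decoded to check which mode the faulting code was running in. The sketch below assumes the standard ARMv7-A CPSR layout (mode in bits 4:0, T/F/I in bits 5/6/7, condition flags in bits 31:28); the `decode_psr` helper is hypothetical and is not part of sos or its test suite.

```python
# Mode field encodings from the ARMv7-A CPSR layout (assumed target).
ARM_MODES = {
    0x10: "USR", 0x11: "FIQ", 0x12: "IRQ", 0x13: "SVC",
    0x17: "ABT", 0x1B: "UND", 0x1F: "SYS",
}

CONDITION_FLAGS = (("N", 31), ("Z", 30), ("C", 29), ("V", 28))


def decode_psr(psr):
    """Split a 32-bit PSR value into mode, mask bits, and condition flags."""
    return {
        "mode": ARM_MODES.get(psr & 0x1F, "unknown"),
        "irq_masked": bool(psr & (1 << 7)),  # I bit
        "fiq_masked": bool(psr & (1 << 6)),  # F bit
        "thumb": bool(psr & (1 << 5)),       # T bit
        "flags": "".join(n for n, bit in CONDITION_FLAGS if psr & (1 << bit)),
    }


decoded = decode_psr(0x2000001F)
print(decoded["mode"])        # SYS
print(decoded["irq_masked"])  # False
```

Under that assumption, the fault was taken in SYS mode with IRQs unmasked, which fits the kthread case described at the top of this issue.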

I fixed this with a range of protections:

  1. Interrupts now dump context to the IRQ-mode stack rather than current->context.
  2. Further race conditions for scheduling outside of SVC mode were addressed by adding the .nopreempt section and the preempt_enabled flag. Preemption can now happen in SVC mode!
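
The two protections can be sketched with the same kind of toy model. This is an illustration under stated assumptions, not the actual change: the `Kernel` class, `irq_entry`, `want_resched`, and the register values are hypothetical, and the model collapses the .nopreempt section and the preempt_enabled flag into a single boolean, since how they interact in the real kernel is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    pid: int
    context: dict = field(default_factory=dict)  # stands in for current->context


@dataclass
class Kernel:
    current: Process
    preempt_enabled: bool = True
    irq_stack: list = field(default_factory=list)

    def irq_entry(self, interrupted_regs, want_resched=False):
        """Handle an IRQ; return True if the handler preempted `current`."""
        # Protection 1: the interrupted state goes onto the IRQ stack, so
        # current.context is not touched merely because an IRQ arrived.
        self.irq_stack.append(dict(interrupted_regs))

        # ... the device would be serviced here ...

        frame = self.irq_stack.pop()
        # Protection 2: only when preemption is allowed may the handler commit
        # the frame as the process's context and switch to another process.
        if want_resched and self.preempt_enabled:
            self.current.context = frame
            return True
        return False  # otherwise just return to the interrupted code

    def block(self, regs, irq_regs=None):
        """Toy block() with preemption disabled across the old race window."""
        self.preempt_enabled = False  # plays the role of the no-preempt region
        self.current.context = dict(regs)
        if irq_regs is not None:  # an IRQ lands inside the old race window
            self.irq_entry(irq_regs, want_resched=True)
        saved = self.current.context  # what schedule() would later restore
        self.preempt_enabled = True
        return saved


resume_point = {"pc": "after_block", "sp": "process_stack"}
kernel = Kernel(current=Process(pid=3))
saved = kernel.block(resume_point, irq_regs={"pc": "inside_schedule", "sp": "kernel_scratch"})

print(saved == resume_point)  # True: the IRQ no longer clobbers the context
```

Outside of block(), the same `irq_entry` call with `want_resched=True` does commit the interrupted frame and reports a preemption, which is the model's version of preemption still being possible elsewhere.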