Race condition in block()
Closed this issue · 1 comments
brenns10 commented
When a process needs to go to sleep, it calls `block()`, which stores context information into `current->context` and then calls `schedule()` to select a new process. If the process is interrupted at this point (while executing in SVC mode as a system call on behalf of the process, or as a kthread executing in SYS mode), the interrupt handler again dumps context into `current->context`, corrupting the context that `block()` just saved. On return from the IRQ, the process continues into the scheduler. When the process is next rescheduled, the corrupted context (usually) results in a fault.
Here is an example:
stephen at pride in ~/repos/sos (master %)
$ make integrationtest
========================================================= test session starts ==========================================================
platform linux -- Python 3.8.3, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /home/stephen/repos/sos
collected 4 items
integrationtests/test_simple.py ... [ 75%]
integrationtests/test_udp.py F [100%]
=============================================================== FAILURES ===============================================================
______________________________________________________________ test_full _______________________________________________________________
vm = <conftest.SosVirtualMachine object at 0x7f7a626e9a90>
sk = <socket.socket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_DGRAM, proto=17, laddr=('0.0.0.0', 5051)>
    def test_full(vm, sk):
        res = vm.cmd('socket')
        fildes = int(re.search(r'socket\(\) = (\d+)', res).group(1))
        vm.cmd(f'connect {fildes} 10.0.2.2 {sk.getsockname()[1]}')
>       vm.cmd(f'send {fildes} ABAB_CDCD_EFEF')
integrationtests/test_udp.py:26:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
integrationtests/conftest.py:99: in cmd
return self.read_until(pattern, timeout=timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <conftest.SosVirtualMachine object at 0x7f7a626e9a90>, pattern = re.compile('[uk]sh>'), timeout = 2
    def read_until(self, pattern, timeout=None):
        if not isinstance(pattern, re.Pattern):
            pattern = re.compile(pattern)
        if timeout is None:
            timeout = self.timeout
        wait_until = time.time() + timeout
        result = ''
        while True:
            time_left = wait_until - time.time()
            if time_left <= 0:
                break
            try:
                data = self.stdout_queue.get(timeout=time_left)
                result += data.decode('utf-8')
            except queue.Empty:
                pass
            found = pattern.search(result)
            if found:
                return result
            found = self.abort.search(result)
            if found and self.debug:
                time.sleep(0.1)  # bad sleep sync to let stdout thread print
                print('\n[sos test] Fault detected! Hit enter when done')
                print('[sos test] debugging to exit the test.')
                input()
            if found:
>               raise Exception(f'Fault encountered waiting:\n{result}')
E Exception: Fault encountered waiting:
E
E netif is configured with ip 10.0.2.15, gateway 10.0.2.2, subnet 255.255.255.0, dns 10.0.2.3
E Uh-oh... data abort! DFSR=7 DFAR=15af90 LR=39b4
E Fault occurred with PSR: 0x2000001f
E Translation fault (page): 0x15af90
E Context history:
E kernel initialized
E schedule process 3 (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E block (x1)
E schedule process 2 (x1)
E IRQ 30 "timer" interrupted 0x9590 (x1)
E schedule process 1 (x1)
E IRQ 33 "uart" interrupted 0x38a8 (x1)
E IRQ 78 "virtio-net" interrupted 0x38a8 (x1)
E IRQ 30 "timer" interrupted 0x38bc (x1)
E schedule process 3 (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E block (x1)
E schedule process 2 (x1)
E block (x1)
E schedule process 2 (x1)
E IRQ 33 "uart" interrupted 0xa418 (x1)
E block (x1)
E schedule process 3 (x1)
E IRQ 78 "virtio-net" interrupted 0x694 (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E syscall return (x1)
E syscall (x1)
E block (x1)
E IRQ 33 "uart" interrupted 0x3314 (x1)
E schedule process 2 (x1)
E schedule process 3 (x1)
E End of context history
E END OF FAULT REPORT
integrationtests/conftest.py:91: Exception
======================================================= short test summary info ========================================================
FAILED integrationtests/test_udp.py::test_full - Exception: Fault encountered waiting:
===================================================== 1 failed, 3 passed in 0.43s ======================================================
make: *** [Makefile:146: integrationtest] Error 1
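The tail of the context history above (a `block`, then IRQ 33 arriving before the next `schedule`) matches the race described at the top of this issue. As a rough illustration only, here is a toy Python model of that sequence. It is not the kernel code: `Process`, `save_context`, and `block_with_race` are hypothetical names, and every address except 0x3314 (taken from the log) is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    pid: int
    # Stands in for current->context: a single save slot per process.
    context: dict = field(default_factory=dict)


def save_context(proc, pc, sp):
    """Model of dumping registers into the process's only context slot."""
    proc.context = {"pc": pc, "sp": sp}


def block_with_race(proc, irq_fires):
    """Toy model of block(): save the resume point, then reach schedule()."""
    # block() records where the process should resume when it wakes up.
    # (0x1000 / 0x8000 are made-up values.)
    save_context(proc, pc=0x1000, sp=0x8000)

    if irq_fires:
        # An IRQ that arrives before schedule() dumps the interrupted state
        # into the same slot, overwriting what block() just stored.
        save_context(proc, pc=0x3314, sp=0x7F00)

    # schedule() would run here; later the process is resumed from this slot.
    return proc.context


clean = block_with_race(Process(pid=3), irq_fires=False)
raced = block_with_race(Process(pid=3), irq_fires=True)
print(hex(clean["pc"]), hex(raced["pc"]))
```

In this model the process resumes from the address `block()` chose only when no interrupt lands in the window; otherwise it resumes from whatever the interrupt handler wrote, which is the situation that (usually) faults in the real system.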
brenns10 commented
I fixed this with a range of protections:
- Interrupts dump context to the IRQ-mode stack rather than `current->context`
- Further race conditions for scheduling outside of SVC mode were addressed by adding the `.nopreempt` section and `preempt_enabled` flag. Preemption can now happen in SVC mode!
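
To make the two protections concrete, here is a toy Python model of the idea, not the actual fix. The `Cpu` class and its methods are hypothetical. How the real kernel combines the `.nopreempt` section with the `preempt_enabled` flag is not described in this issue, so a plain boolean stands in for both here.

```python
class Cpu:
    """Toy model: IRQs save state on their own stack and honor a preempt flag."""

    def __init__(self):
        self.current_context = None  # stands in for current->context
        self.irq_stack = []          # stands in for the IRQ-mode stack
        self.preempt_enabled = True  # stand-in for the preempt_enabled flag
        self.reschedules = 0         # how many times an IRQ switched processes

    def irq(self, interrupted_pc):
        # Save the interrupted state on the IRQ stack, not in current_context.
        self.irq_stack.append({"pc": interrupted_pc})
        if self.preempt_enabled:
            # Only outside a no-preempt region may the handler reschedule.
            self.reschedules += 1
        # Restore the interrupted code and return from the interrupt.
        self.irq_stack.pop()

    def block(self, resume_pc, irq_during_block=False):
        # Assumption: block() runs as a region that must not be preempted.
        self.preempt_enabled = False
        self.current_context = {"pc": resume_pc}
        if irq_during_block:
            self.irq(interrupted_pc=0x3314)  # address borrowed from the log
        self.preempt_enabled = True
        return self.current_context  # what schedule() would later resume


cpu = Cpu()
saved = cpu.block(resume_pc=0x1000, irq_during_block=True)
print(saved, cpu.reschedules)
```

In this sketch an interrupt during `block()` leaves the saved context untouched and does not trigger a reschedule, while an interrupt taken with preemption enabled is free to switch processes.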