Crash in sctp_common_input_processing: race between usrsctp_conninput and usrsctp_accept
Closed this issue · 1 comments
I have a crash in sctp_common_input_processing
, where it is attempting to call SOCK_LOCK
on a NULL pointer.
TLDR: sctp_common_input_processing
needs to lock the accept lock before retrieving the value of stcb->sctp_socket->so_head
from it.
The crash occurs in revision c1d6cb3 of usrsctp, but the same problem appears to be present in the latest code.
The crashing code is in netinet/sctp_input.c
, with stack
#10 <signal handler called>
#11 0x0000ffffb2c40610 in ___pthread_mutex_lock (mutex=0xe8) at ./nptl/pthread_mutex_lock.c:80
#12 0x0000ffff2c0e19a0 in sctp_common_input_processing (mm=0xfffe4b99df38, iphlen=0, offset=12, length=564, src=0xfffe4b99df58,
dst=0xfffe4b99df68, sh=0xffff5451cca0, ch=0xffff5451ccac, compute_crc=1 '\001', ecn_bits=0 '\000', vrf_id=0, port=0)
at netinet/sctp_input.c:5927
#13 0x0000ffff2c0cb358 in usrsctp_conninput (addr=0x2ef, buffer=0xffff5402bcf0, length=564, ecn_bits=0 '\000') at user_socket.c:3336
This code is:
5918 #if defined(__Userspace__)
5919 if ((stcb != NULL) &&
5920 ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_SOCKET_GONE) == 0) &&
5921 (stcb->sctp_socket != NULL)) {
5922 if (stcb->sctp_socket->so_head != NULL) {
5923 upcall_socket = stcb->sctp_socket->so_head;
5924 } else {
5925 upcall_socket = stcb->sctp_socket;
5926 }
5927 SOCK_LOCK(upcall_socket);
5928 soref(upcall_socket);
5929 SOCK_UNLOCK(upcall_socket);
5930 }
5931 #endif
The crash is in line 5927, where upcall_socket
is null. Inspecting in a debugger also reveals that stcb->sctp_socket
is not null, but stcb->sctp_socket->so_head
is null.
This implies that the value of stcb->sctp_socket->so_head
was set to null between lines 5922 and 5923.
In another thread in the core dump, I see that another thread is waiting for a lock on the same socket, with the stack
#3 ___pthread_mutex_lock (mutex=0xffff5473c228) at ./nptl/pthread_mutex_lock.c:93
#4 0x0000ffff2c163730 in sctp_accept (so=0xffff6c3818f0, addr=0xfffe44f9df38) at netinet/sctp_usrreq.c:8540
#5 0x0000ffff2c0c6964 in soaccept (so=0xffff6c3818f0, nam=0xfffe44f9df38) at user_socket.c:1580
#6 0x0000ffff2c0c7230 in user_accept (head=0xfffecc0b8660, name=0x0, namelen=0x0, ptr_accept_ret_sock=0xfffe44f9dfe0)
at user_socket.c:1662
#7 0x0000ffff2c0c73f4 in accept1 (so=0xfffecc0b8660, aname=0x0, anamelen=0x0, ptr_accept_ret_sock=0xfffe44f9dfe0) at user_socket.c:1740
#8 0x0000ffff2c0c755c in usrsctp_accept (so=0xfffecc0b8660, aname=0x0, anamelen=0x0) at user_socket.c:1777
I see in user_accept
, shortly before this call to soaccept
, the following code:
1645 SOCK_LOCK(so); /* soref() and so_state update */
1646 soref(so); /* file descriptor reference */
1647
1648 TAILQ_REMOVE(&head->so_comp, so, so_list);
1649 head->so_qlen--;
1650 so->so_state |= (head->so_state & SS_NBIO);
1651 so->so_qstate &= ~SQ_COMP;
1652 so->so_head = NULL;
1653 SOCK_UNLOCK(so);
1654 ACCEPT_UNLOCK();
Notice the clearing of so->so_head
at line 1652. The value of so
in this thread is the same as the value of stcb->sctp_socket
in the sctp_common_input_processing
thread.
My conclusion is that because sctp_common_input_processing
did not lock stcb->sctp_socket
before checking the value of stcb->sctp_socket->so_head
, user_accept
was able to change its value asynchronously.
The solution is presumably to lock stcb->sctp_socket
in sctp_common_input_processing
. I can try preparing a patch.
A correction -- so_head
is protected by ACCEPT_LOCK()
, so sctp_common_input_processing
needs to hold ACCEPT_LOCK()
(not the socket lock) while checking stcb->sctp_socket->so_head
.