Crash in sctp_common_input_processing: race between usrsctp_conninput and usrsctp_accept

Question

Crash in sctp_common_input_processing: race between usrsctp_conninput and usrsctp_accept

Closed this issue 3 months ago · 1 comments

I have a crash in sctp_common_input_processing, where it is attempting to call SOCK_LOCK on a NULL pointer.

TLDR: sctp_common_input_processing needs to lock the accept lock before retrieving the value of stcb->sctp_socket->so_head from it.

The crash occurs in revision c1d6cb3 of usrsctp, but the same problem appears to be present in the latest code.

The crashing code is in netinet/sctp_input.c, with stack

#10 <signal handler called>
#11 0x0000ffffb2c40610 in ___pthread_mutex_lock (mutex=0xe8) at ./nptl/pthread_mutex_lock.c:80
#12 0x0000ffff2c0e19a0 in sctp_common_input_processing (mm=0xfffe4b99df38, iphlen=0, offset=12, length=564, src=0xfffe4b99df58, 
    dst=0xfffe4b99df68, sh=0xffff5451cca0, ch=0xffff5451ccac, compute_crc=1 '\001', ecn_bits=0 '\000', vrf_id=0, port=0)
    at netinet/sctp_input.c:5927
#13 0x0000ffff2c0cb358 in usrsctp_conninput (addr=0x2ef, buffer=0xffff5402bcf0, length=564, ecn_bits=0 '\000') at user_socket.c:3336

This code is:

5918	#if defined(__Userspace__)
5919		if ((stcb != NULL) &&
5920		    ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_SOCKET_GONE) == 0) &&
5921		    (stcb->sctp_socket != NULL)) {
5922			if (stcb->sctp_socket->so_head != NULL) {
5923				upcall_socket = stcb->sctp_socket->so_head;
5924			} else {
5925				upcall_socket = stcb->sctp_socket;
5926			}
5927			SOCK_LOCK(upcall_socket);
5928			soref(upcall_socket);
5929			SOCK_UNLOCK(upcall_socket);
5930		}
5931	#endif

The crash is in line 5927, where upcall_socket is null. Inspecting in a debugger also reveals that stcb->sctp_socket is not null, but stcb->sctp_socket->so_head is null.

This implies that the value of stcb->sctp_socket->so_head was set to null between lines 5922 and 5923.

In another thread in the core dump, I see that another thread is waiting for a lock on the same socket, with the stack

#3  ___pthread_mutex_lock (mutex=0xffff5473c228) at ./nptl/pthread_mutex_lock.c:93
#4  0x0000ffff2c163730 in sctp_accept (so=0xffff6c3818f0, addr=0xfffe44f9df38) at netinet/sctp_usrreq.c:8540
#5  0x0000ffff2c0c6964 in soaccept (so=0xffff6c3818f0, nam=0xfffe44f9df38) at user_socket.c:1580
#6  0x0000ffff2c0c7230 in user_accept (head=0xfffecc0b8660, name=0x0, namelen=0x0, ptr_accept_ret_sock=0xfffe44f9dfe0)
    at user_socket.c:1662
#7  0x0000ffff2c0c73f4 in accept1 (so=0xfffecc0b8660, aname=0x0, anamelen=0x0, ptr_accept_ret_sock=0xfffe44f9dfe0) at user_socket.c:1740
#8  0x0000ffff2c0c755c in usrsctp_accept (so=0xfffecc0b8660, aname=0x0, anamelen=0x0) at user_socket.c:1777

I see in user_accept, shortly before this call to soaccept, the following code:

1645         SOCK_LOCK(so);                  /* soref() and so_state update */
1646         soref(so);                      /* file descriptor reference */
1647 
1648         TAILQ_REMOVE(&head->so_comp, so, so_list);
1649         head->so_qlen--;
1650         so->so_state |= (head->so_state & SS_NBIO);
1651         so->so_qstate &= ~SQ_COMP;
1652         so->so_head = NULL;
1653         SOCK_UNLOCK(so);
1654         ACCEPT_UNLOCK();

Notice the clearing of so->so_head at line 1652. The value of so in this thread is the same as the value of stcb->sctp_socket in the sctp_common_input_processing thread.

My conclusion is that because sctp_common_input_processing did not lock stcb->sctp_socket before checking the value of stcb->sctp_socket->so_head, user_accept was able to change its value asynchronously.

The solution is presumably to lock stcb->sctp_socket in sctp_common_input_processing. I can try preparing a patch.

Answer 1 · 2024-01-25T23:15:37.000Z

A correction -- so_head is protected by ACCEPT_LOCK(), so sctp_common_input_processing needs to hold ACCEPT_LOCK() (not the socket lock) while checking stcb->sctp_socket->so_head.