sctplab/usrsctp

Can't communicate with kernel SCTP stack program.

asterwyx opened this issue · 10 comments

I've written a simple SCTP client using the socket interface provided by Linux. Part of my code is below:

#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <stdarg.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <netinet/sctp.h>

static const uint16_t CONN_PORT   = 7780;
enum { BUF_LEN = 4096 };                /* compile-time constant, so msg_buf is not a VLA */
static const char     CONN_ADDR[] = "127.0.0.1";

int main(int argc, char *argv[])
{
    int sock_fd;
    int error;
    char msg_buf[BUF_LEN];
    struct sctp_sndrcvinfo info;
    struct sctp_event_subscribe sub;
    memset(&sub, 0, sizeof(struct sctp_event_subscribe));

    /* One-to-one style SCTP socket, handled by the Linux kernel stack. */
    sock_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
    if (sock_fd < 0)
    {
        fprintf(stderr, "socket: errno: %d, %s\n", errno, strerror(errno));
        exit(EXIT_FAILURE);
    }

    /* Receive sctp_sndrcvinfo with data and association change notifications. */
    sub.sctp_data_io_event = 1;
    sub.sctp_association_event = 1;
    error = setsockopt(sock_fd, SOL_SCTP, SCTP_EVENTS, (char *)&sub, sizeof(sub));
    if (0 != error)
    {
        /* setsockopt() returns -1; the reason is in errno. */
        fprintf(stderr, "SCTP_EVENTS: errno: %d, %s\n", errno, strerror(errno));
    }
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));     /* clear sin_zero and padding */
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = inet_addr(CONN_ADDR);
    addr.sin_port = htons(CONN_PORT);
    error = connect(sock_fd, (struct sockaddr *)&addr, sizeof(addr));
    if (0 != error)
    {
        fprintf(stderr, "Can't connect to %s, port %d, errno: %d, %s\n", inet_ntoa(addr.sin_addr), ntohs(addr.sin_port), errno, strerror(errno));
        close(sock_fd);
        exit(EXIT_FAILURE);
    }
......
}

I'm confused: when I run the echo_server provided by usrsctp together with this simple SCTP client, the connection fails with:

Can't connect to 127.0.0.1, port 7780, errno: 111, Connection refused

To see what happened, I captured packets on the loopback interface with Wireshark and got these records:

7406 1526.417963540    127.0.0.1 -> 127.0.0.1    SCTP 122 INIT
7407 1526.417984216    127.0.0.1 -> 127.0.0.1    SCTP 50 ABORT

Why does this happen? I've also written a simple server using the Linux socket API and verified that my server and client can communicate normally: they complete the INIT -> INIT_ACK -> COOKIE_ECHO -> COOKIE_ACK handshake.

You cannot run the kernel stack and the userland stack at the same time using SCTP/IP. The packet containing the ABORT chunk comes from the kernel stack handling the packet containing the INIT chunk.

Either use two different hosts (and no kernel stack on the host running the program with the userland stack) or use UDP encapsulation, but I think the Linux kernel does not support this yet.

Why does the packet containing the ABORT chunk come from the kernel stack handling the packet containing the INIT chunk? I also tried running the usrsctp programs on two different hosts. Here is what I did:

  1. Run echo_server on a host whose IP address is 192.168.131.1:
     ./echo_server
  2. Run client on another host whose IP address is 192.168.131.2:
     ./client 192.168.131.1 7780

In the meantime, I ran tshark on both NICs (192.168.131.1 and 192.168.131.2) to monitor the traffic and got the following records:

306 14018.662563313 192.168.131.2 -> 192.168.131.1 SCTP 186 INIT
307 14018.662722705 192.168.131.1 -> 192.168.131.2 SCTP 60 ABORT
308 14018.662999216 192.168.32.1 -> 192.168.131.2 SCTP 682 INIT_ACK 

What confuses me is that 192.168.32.1 is just a virtual NIC, and there is no route between it and 192.168.131.2. Why did echo_server use a different source address for the INIT_ACK packet, and why was an ABORT packet sent from the INIT packet's destination address back to its source address?
PS: I have changed echo_server's listening port to 7780.

When you have a kernel stack, packets get delivered to that stack. Since the kernel stack has no state for associations handled in userland, it considers these packets out of the blue (OOTB) and replies with a packet containing an ABORT chunk.
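
The out-of-the-blue handling is specified in RFC 4960, Section 8.4. As a rough sketch, a stack decides what to do with an OOTB packet based on its first chunk, as below. This is a simplification: the enum names and the `has_listener` flag are illustrative, and the non-unicast-source and Stale Cookie rules are left out. On 192.168.131.1 the kernel has no socket listening on port 7780, so the INIT falls into the ABORT branch. usrsctp does listen on that port, so it answers the same INIT with an INIT_ACK, which is why both show up in the capture.

```c
#include <stdbool.h>
#include <stdint.h>

/* SCTP chunk type codes from RFC 4960, Section 3.2 (subset). */
enum chunk_type {
    CHUNK_DATA              = 0,
    CHUNK_INIT              = 1,
    CHUNK_ABORT             = 6,
    CHUNK_SHUTDOWN_ACK      = 8,
    CHUNK_COOKIE_ECHO       = 10,
    CHUNK_COOKIE_ACK        = 11,
    CHUNK_SHUTDOWN_COMPLETE = 14
};

enum ootb_action {
    OOTB_DISCARD,                /* drop silently */
    OOTB_PROCESS_AS_NEW,         /* handle as association setup (Section 5.1) */
    OOTB_SEND_ABORT,             /* reply with an ABORT chunk */
    OOTB_SEND_SHUTDOWN_COMPLETE  /* reply with SHUTDOWN COMPLETE */
};

/*
 * Simplified OOTB decision keyed on the packet's first chunk.
 * has_listener: whether THIS stack has an endpoint on the destination port.
 */
enum ootb_action ootb_decide(uint8_t first_chunk, bool has_listener)
{
    switch (first_chunk) {
    case CHUNK_ABORT:
    case CHUNK_SHUTDOWN_COMPLETE:
    case CHUNK_COOKIE_ACK:
        return OOTB_DISCARD;
    case CHUNK_SHUTDOWN_ACK:
        return OOTB_SEND_SHUTDOWN_COMPLETE;
    case CHUNK_INIT:
    case CHUNK_COOKIE_ECHO:
        /* Set up an association if someone listens, otherwise it cannot be processed. */
        return has_listener ? OOTB_PROCESS_AS_NEW : OOTB_SEND_ABORT;
    default:
        return OOTB_SEND_ABORT;
    }
}
```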

Two issues:

  1. Either a second SCTP stack is active on the host owning 192.168.131.1 and sends the packet with the ABORT chunk, or a middlebox between 192.168.131.2 and 192.168.131.1 sends it.

  2. The server chooses the first address it thinks it can use. This is a limitation of the userland code: it doesn't know the kernel's routing table...
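
To make point 2 concrete, here is a hypothetical, heavily simplified C model. The function name `initack_source` and the "first non-loopback address" rule are assumptions for illustration, not usrsctp's actual selection code. If 192.168.32.1 is listed before 192.168.131.1 (also an assumption), ignoring routes reproduces the 192.168.32.1 source seen in the capture. Reflecting the INIT's destination address yields 192.168.131.1 instead.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * HYPOTHETICAL model, not usrsctp's real logic: choose the source address
 * of an INIT_ACK from the host's local IPv4 addresses.
 */
struct in_addr initack_source(bool reflect_init_dst,
                              struct in_addr init_dst,
                              const struct in_addr *locals, size_t n)
{
    if (reflect_init_dst)
        return init_dst;                  /* answer from the address the peer used */

    /* Otherwise take the first non-loopback address, with no route lookup. */
    for (size_t i = 0; i < n; i++)
        if ((ntohl(locals[i].s_addr) >> 24) != 127)
            return locals[i];

    return init_dst;                      /* fallback: no candidate found */
}
```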

What does "have a kernel stack" mean? I installed lksctp-tools and lksctp-tools-devel on my server. Does this mean that I have installed a kernel stack? But I didn't run any binary after installation or insert any module into the kernel. I guess the Linux kernel supports SCTP by default? In other words, how can I remove the kernel SCTP stack? By simply uninstalling lksctp-tools and lksctp-tools-devel?

Regarding the two issues:

  1. As you say, I guess the Linux kernel SCTP stack received the INIT packet and replied with an ABORT packet. How can I prove this?
  2. What does "it thinks it can use" mean? Why doesn't the server simply use the destination address of the INIT packet as the source address of the INIT_ACK packet?

Thanks!
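
One way to check whether the kernel stack is the one answering is to read a counter before and after a connection attempt. This relies on an assumption: that Linux exposes SCTP MIB counters in /proc/net/sctp/snmp, including SctpOutOfBlues, which its INIT handling increments when it replies with an ABORT because no endpoint exists. Below is a minimal parser sketch; `snmp_counter` and `read_file` are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the value of counter `name` from text in the format of
 * /proc/net/sctp/snmp ("<Name> <value>" per line), or -1 if absent.
 * ASSUMPTION: the kernel provides an "SctpOutOfBlues" line there.
 */
long long snmp_counter(const char *text, const char *name)
{
    char key[64];
    long long value;
    const char *line = text;

    while (line && *line) {
        if (sscanf(line, "%63s %lld", key, &value) == 2 && strcmp(key, name) == 0)
            return value;
        line = strchr(line, '\n');
        if (line)
            line++;                       /* move to the start of the next line */
    }
    return -1;
}

/* Read a whole file into buf as a NUL-terminated string; 0 on success. */
int read_file(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, len - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}
```

Usage: read /proc/net/sctp/snmp and note SctpOutOfBlues, start the client once, then read the counter again. If it went up by the number of INITs sent, the kernel stack treated them as out of the blue.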

If you are using Linux, lsmod lists the kernel modules that are loaded. Does a module named sctp show up?

Yes, I've just run this command:

lsmod | grep sctp

I got the following output:

sctp                  279238  2
libcrc32c              12644  4 xfs,sctp,nf_nat,nf_conntrack

So disabling the kernel SCTP stack means removing the sctp module? I tried it just now but got this error:

rmmod: ERROR: Module sctp is in use

I've checked that my kernel-stack client and server weren't running at the time.
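
lsmod only formats /proc/modules. The third column there is the module's reference count, which is the number lsmod shows under "Used by". rmmod refuses to unload a module whose count is not zero unless forced, and that is the "in use" error above. Below is a small sketch that reads the same field; `module_refcount` is a hypothetical helper. Given the contents of /proc/modules on the host above, it would return 2 for sctp.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse text in /proc/modules format ("name size refcount deps state addr")
 * and return the reference count of module `name`, or -1 if not loaded.
 */
int module_refcount(const char *text, const char *name)
{
    char mod[64];
    unsigned long size;
    int refs;
    const char *line = text;

    while (line && *line) {
        if (sscanf(line, "%63s %lu %d", mod, &size, &refs) == 3 && strcmp(mod, name) == 0)
            return refs;
        line = strchr(line, '\n');
        if (line)
            line++;                       /* next line */
    }
    return -1;
}
```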

I don't have much experience with Linux, but I think you can't unload the sctp module once it is loaded. At least this was true in the past...

Thanks. I've looked up some references and found that the -f option is needed to remove the sctp module. That still didn't work on my server, so I rebooted it.

Why don't we just use the destination address of the INIT packet as the source address of the INIT_ACK packet? Looking through the code, I found that the INIT packet's destination address is used as the INIT_ACK packet's source address only in the loopback scope. In all other cases, usrsctp re-chooses a source address for the INIT_ACK packet. I don't see why this is better.
My problem still exists: I can't run echo_server and client on two separate servers. They can't even set up an association. So I modified the code myself, in usrsctplib\netinet\sctp_output.c, lines 6673 to 6677:

if (stc.loopback_scope) {
	over_addr = (union sctp_sockstore *)dst;
} else {
	over_addr = NULL;
}

I changed it to assign dst to over_addr unconditionally:

over_addr = (union sctp_sockstore *)dst;

I rebuilt everything and tested it again. Now the COOKIE_ECHO packet is sent, and the COOKIE_ACK is sent too. But no data was exchanged between the two, and I saw consecutive COOKIE_ECHO packets being sent. It seems that the client didn't recognize the COOKIE_ACK packet and kept retransmitting COOKIE_ECHO until the maximum number of retries was reached. What might the root cause be? I guess my modification is incomplete. Does this have something to do with the state cookie? Thanks!
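
For scale, here is a sketch of how long a client keeps retransmitting an unanswered COOKIE ECHO before giving up. It assumes the RFC 4960 defaults (RTO.Initial = 3 s, RTO.Max = 60 s, Max.Init.Retransmits = 8) and plain doubling of the timeout on each expiry. usrsctp's configured values may differ, an earlier RTT measurement could lower the starting RTO, and the function name is illustrative.

```c
#include <stddef.h>

/*
 * Successive T1-cookie timeouts (seconds) for a COOKIE ECHO that is never
 * acknowledged. ASSUMPTION: exponential backoff capped at rto_max, one
 * timeout for the first transmission plus one per retransmission.
 * Fills out[0..n-1] and returns the total waiting time.
 */
unsigned cookie_echo_schedule(unsigned rto_initial, unsigned rto_max,
                              unsigned max_init_retransmits,
                              unsigned *out, size_t n)
{
    unsigned rto = rto_initial, total = 0;

    for (size_t i = 0; i <= max_init_retransmits && i < n; i++) {
        out[i] = rto;
        total += rto;
        rto = (rto * 2 > rto_max) ? rto_max : rto * 2;   /* back off, capped */
    }
    return total;
}
```

With these defaults the timeouts are 3, 6, 12, 24, 48, 60, 60, 60 and 60 seconds, 333 seconds in total. A long run of COOKIE_ECHO packets is therefore what an association whose COOKIE_ACK is never accepted would look like.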

The FreeBSD kernel stack uses the IP layer to determine the source address (based on the routing table). The userland stack does not have this functionality.
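
A userland program can still borrow the kernel's routing decision without maintaining its own routing table. Connecting a UDP socket performs the route lookup without sending anything, and getsockname() then reports the source address the kernel chose. This is a general technique, not something this thread says usrsctp does. `kernel_source_for` is a hypothetical helper and handles IPv4 only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Ask the kernel which local IPv4 address it would use to reach `dst`.
 * connect() on a UDP socket sends no packet; it only does the route lookup
 * and binds the socket to the selected source address.
 * Writes the dotted-quad result to `out`; returns 0 on success, -1 on error.
 */
int kernel_source_for(const char *dst, char *out, socklen_t out_len)
{
    struct sockaddr_in peer, local;
    socklen_t local_len = sizeof(local);
    int fd, rc = -1;

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(9);             /* any nonzero port; nothing is sent */
    if (inet_pton(AF_INET, dst, &peer.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) == 0 &&
        getsockname(fd, (struct sockaddr *)&local, &local_len) == 0 &&
        inet_ntop(AF_INET, &local.sin_addr, out, out_len) != NULL)
        rc = 0;
    close(fd);
    return rc;
}
```

On the server host, calling it with "192.168.131.2" would report the address the kernel's routing table picks for replies to the client.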

Regarding the COOKIE-ACK: Can you enable the debug output to get an idea of why it is not accepted? Which IP addresses are used for the handshake?