mos-stack/mOS-networking-stack

midstat inline mode MAC address reversed and not translated to server MAC address

vincentmli opened this issue · 15 comments

Hi Asim

I think mOS is a very cool project, so I followed http://mos.kaist.edu/guide/config/01_inline.html#configuration-steps to test the midstat sample app, but I ran into the issue below.

I have a client, the mOS middlebox, and a server connected with direct cables as below:

client (p1p1 10.0.0.6)<--->(dpdk0 10.0.0.7 midstat dpdk1 10.0.1.7) <---> eth0 (10.0.1.8) server

1. I ran curl on the client to send an HTTP request to the server, to see if midstat would forward the request.

2. I configured the client to use dpdk0 (10.0.0.7) as the gateway to server 10.0.1.8, and configured the server to use dpdk1 (10.0.1.7) as the gateway to client 10.0.0.6.

3. I manually added an ARP entry for 10.0.0.7 on the client and one for 10.0.1.7 on the server (the route/ARP commands are sketched after this list).

4. I did not manually add any ARP entries in mOS.
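
For reference, the gateway routes and static ARP entries were set with commands along these lines (a sketch reconstructed from the interface/route/ARP dumps below; the MACs belong to dpdk0 and dpdk1 respectively):

# on the client (p1p1)
$ sudo ip route add 10.0.1.8 via 10.0.0.7 dev p1p1
$ sudo arp -s 10.0.0.7 a0:36:9f:a1:4d:6c

# on the server (eth0)
$ sudo ip route add 10.0.0.6 via 10.0.1.7 dev eth0
$ sudo arp -s 10.0.1.7 a0:36:9f:a1:4d:6d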

Here is my detailed configuration:

---client:

ip addr show p1p1

14: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:1b:21:50:bc:38 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.6/24 scope global p1p1
valid_lft forever preferred_lft forever
inet6 fe80::21b:21ff:fe50:bc38/64 scope link
valid_lft forever preferred_lft forever

cat /proc/net/arp

IP address HW type Flags HW address Mask Device

10.0.0.7 0x1 0x6 a0:36:9f:a1:4d:6c * p1p1

ip route show

10.0.1.8 via 10.0.0.7 dev p1p1

curl http://10.0.1.8/

----mOS midstat

ip addr show dpdk0

27: dpdk0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
link/ether a0:36:9f:a1:4d:6c brd ff:ff:ff:ff:ff:ff
inet 10.0.0.7/24 brd 10.0.0.255 scope global dpdk0
valid_lft forever preferred_lft forever
inet6 fe80::a236:9fff:fea1:4d6c/64 scope link
valid_lft forever preferred_lft forever

ip addr show dpdk1

28: dpdk1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
link/ether a0:36:9f:a1:4d:6d brd ff:ff:ff:ff:ff:ff
inet 10.0.1.7/24 brd 10.0.1.255 scope global dpdk1
valid_lft forever preferred_lft forever
inet6 fe80::a236:9fff:fea1:4d6d/64 scope link
valid_lft forever preferred_lft forever

ip route show

10.0.0.0/24 dev dpdk0 proto kernel scope link src 10.0.0.7
10.0.1.0/24 dev dpdk1 proto kernel scope link src 10.0.1.7

cat config/mos.conf

######### MOS configuration file

## MOS-RELATED OPTIONS ##

mos {
    forward = 1

    #######################
    ##### I/O OPTIONS #####
    #######################
    # number of memory channels per socket [mandatory for DPDK]
    nb_mem_channels = 4

    # devices used for MOS applications [mandatory]
    netdev {
            dpdk0 0x00FF
            dpdk1 0x00FF
    }

.....................CUT..................

    ########################
    ## NETWORK PARAMETERS ##
    ########################
    # This to configure static arp table
    # (Destination IP address) (Destination MAC address)
    arp_table {
    }

    # This is to configure static routing table
    # (Destination address)/(Prefix) (Device name)
    route_table {
    }

    # This is to configure static bump-in-the-wire NIC forwarding table
    # DEVNIC_A DEVNIC_B ## (e.g. dpdk0 dpdk1)
    nic_forward_table {
            dpdk0 dpdk1
    }

   ..............CUT...........

}

EAL: PCI device 0000:01:00.0 on NUMA socket -1
EAL: probe driver: 8086:1521 rte_igb_pmd
EAL: PCI memory mapped at 0x7f5532c00000
EAL: PCI memory mapped at 0x7f5532d00000
EAL: PCI device 0000:01:00.1 on NUMA socket -1
EAL: probe driver: 8086:1521 rte_igb_pmd
...................

load_module(): 0x86f500
Initializing port 0... done:
Initializing port 1... done:

Checking link status.....................................done
Port 0 Link Up - speed 1000 Mbps - full-duplex
Port 1 Link Up - speed 1000 Mbps - full-duplex
===== MOS configuration =====
| num_cores: 8
| nb_mem_channels: 4
| max_concurrency: 100000
| rmem_size: 8192
| wmem_size: 8192
| tcp_tw_interval: 0
| tcp_timeout: 30000
| multiprocess: false
| mos_log: logs/
| stat_print: dpdk0 dpdk1
| forward: forward
|
+===== Netdev configuration (2 entries) =====
| dpdk0(idx: 0, HADDR: A0:36:9F:A1:4D:6C) maps to CPU 0x00000000000000FF
| dpdk1(idx: 1, HADDR: A0:36:9F:A1:4D:6D) maps to CPU 0x00000000000000FF
|
+===== Static ARP table configuration (0 entries) =====
|
+===== Routing table configuration (4 entries) =====
| IP: 0x00000000, NETMASK: 0x00000000, INTERFACE: br0(idx: 0)
| IP: 0x0A000000, NETMASK: 0xFFFFFF00, INTERFACE: dpdk0(idx: 0)
| IP: 0x0A000100, NETMASK: 0xFFFFFF00, INTERFACE: dpdk1(idx: 1)
| IP: 0xC0A80100, NETMASK: 0xFFFFFF00, INTERFACE: br0(idx: 0)
|
+===== NIC Forwarding table configuration (1 entries) =====
| NIC Forwarding Entry: dpdk0 <---> dpdk1 |
| NIC Forwarding Index Table: |
| 0 --> 1 |
| 1 --> 0 |
| 2 --> -1 |
| 3 --> -1 |
| 4 --> -1 |
| 5 --> -1 |
| 6 --> -1 |
| 7 --> -1 |
| 8 --> -1 |
| 9 --> -1 |
| 10 --> -1 |
| 11 --> -1 |
| 12 --> -1 |
| 13 --> -1 |
| 14 --> -1 |
| 15 --> -1 |

Proto CPU Client Address Client State Server Address Server State
tcp 7 10.0.0.6:54783 SYN_SENT 10.0.1.8:80 SYN_RCVD

----server

ip addr show eth0

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:15:60:0e:3d:0a brd ff:ff:ff:ff:ff:ff
inet 10.0.1.8/24 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::215:60ff:fe0e:3d0a/64 scope link
valid_lft forever preferred_lft forever

ip route show

10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.1.8
10.0.0.6 via 10.0.1.7 dev eth0
10.0.1.0/24 dev eth0 proto kernel scope link src 10.0.1.8

cat /proc/net/arp

IP address HW type Flags HW address Mask Device

10.0.1.7 0x1 0x6 a0:36:9f:a1:4d:6d * eth0

tcpdump -nn -e -i eth0

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes

10:06:24.367009 a0:36:9f:a1:4d:6d > a0:36:9f:a1:4d:6c, ethertype IPv4 (0x0800), length 74: 10.0.0.6.54783 > 10.0.1.8.80: Flags [S], seq 539663012, win 29200, options [mss 1460,sackOK,TS val 19265926 ecr 0,nop,wscale 7], length 0
10:06:25.366176 a0:36:9f:a1:4d:6d > a0:36:9f:a1:4d:6c, ethertype IPv4 (0x0800), length 74: 10.0.0.6.54783 > 10.0.1.8.80: Flags [S], seq 539663012, win 29200, options [mss 1460,sackOK,TS val 19266176 ecr 0,nop,wscale 7], length 0

As you can see from the tcpdump on the server above, two things seem to be wrong:

1. The SYN packet from client 10.0.0.6 to server 10.0.1.8 is supposed to be forwarded dpdk0 ---> dpdk1, but the source MAC and destination MAC are reversed:

dpdk1 (a0:36:9f:a1:4d:6d) > dpdk0 (a0:36:9f:a1:4d:6c)

2. The SYN packet's destination MAC should be translated to the server MAC 00:15:60:0e:3d:0a, but it is the dpdk0 MAC a0:36:9f:a1:4d:6c. Thus, when the server receives the SYN packet, the destination MAC does not match any of its interfaces, so the server silently drops it and never responds with SYN+ACK.

I think I may have missed some configuration, so any clue would be helpful.

Hi,

I see a couple of mistakes in the configuration. But before that, for debugging purposes, can you please temporarily adjust the client and the server IP addresses so that they belong to the same subnet?

For the example below, I am assuming the client's IP address is 10.0.0.7 and the server's IP address is 10.0.0.8. The IP address values of the mOS middlebox interfaces should not matter, as they are set to promiscuous mode once the middlebox application is instantiated (the mOS node acts as a bump-in-the-wire). All traffic entering dpdk0 is forwarded straight out of dpdk1 and vice versa.

You should store the server's static ARP entry on the client machine and the client's static ARP entry on the server.

Client:
$ sudo arp -s 10.0.0.8 <SRV_MAC_ADDR>

Server:
$ sudo arp -s 10.0.0.7 <CLI_MAC_ADDR>

Please delete all other static ARP entries and type curl http://10.0.0.8/ to talk to the server. Please let me know if you still see any issues. I will also test this setup on my testbed when I get back to work. Thanks!
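
For example, on the client (a minimal sketch, assuming the re-addressed setup above where the server is 10.0.0.8; the entry for 10.0.0.7 is the old one pointing at dpdk0):

$ sudo arp -d 10.0.0.7                  # drop the stale static entry
$ sudo arp -s 10.0.0.8 <SRV_MAC_ADDR>   # keep only the server's entry
$ curl http://10.0.0.8/

and the mirror image on the server (arp -d 10.0.1.7, then arp -s 10.0.0.7 <CLI_MAC_ADDR>).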

Thanks for the quick reply; it worked after following your advice. I misunderstood the use case of midstat inline mode: I thought mOS acted as some kind of router between two subnets, but that seems not to be the case. I need to spend more time to understand it :)

By the way, I tried to debug the code. In mTCP, I can edit core/src/Makefile directly, but mOS now uses setup.sh --compile-dpdk to compile DPDK and mTCP, which rewrites core/src/Makefile each time. Could you let me know how to easily turn on debugging in core/src/Makefile? I would like to be able to trace packets if I run into any issue. Thanks!
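
The workaround I have in mind is roughly the following (just a sketch; I don't know whether this is the intended way):

# keep the debug edits to core/src/Makefile as a patch and re-apply them
$ cp core/src/Makefile core/src/Makefile.orig                 # pristine copy
$ vi core/src/Makefile                                        # enable the debug flags
$ diff -u core/src/Makefile.orig core/src/Makefile > debug.patch
$ ./setup.sh --compile-dpdk                                   # regenerates core/src/Makefile
$ patch core/src/Makefile < debug.patch                       # restore the flags, then rebuild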

@ajamshed Can't mOS work as a middlebox between two different subnets?

@IurmanJ ,

It should work. An mOS-based middlebox works as a bump-in-the-wire, which means packets are forwarded without making any routing decisions. Given that you have correctly set up nic_forward_table in the mos.conf file, the middlebox should work correctly. Please let me know if you have any follow-up questions.

@ajamshed

Thanks, that's what I thought. However, despite having a good-looking configuration and setup, I still have issues. Here is the topology:

Client (.2) <----------> (.1) MiddleBox mOS (.1) <----------> (.2) Server
            10.0.0.0/24                          10.0.1.0/24

Static ARP entries and routes are correctly set on both the client and the server. Here is part of the mos.conf file used for both the midstat and firewall samples:

mos {
        forward = 1

        #######################
        ##### I/O OPTIONS #####
        #######################
        # number of memory channels per socket [mandatory for DPDK]
        nb_mem_channels =  4

        # devices used for MOS applications [mandatory]
        netdev {
                dpdk0 0x0002
                dpdk1 0x0004
        }

        #######################
        ### LOGGING OPTIONS ###
        #######################
        # NICs to print network statistics per second
        # if enabled, mTCP will print xx Gbps and xx pps for RX and TX
        stat_print = dpdk0 dpdk1

        # A directory contains MOS system log files
        mos_log = logs/

        ########################
        ## NETWORK PARAMETERS ##
        ########################
        # This to configure static arp table
        # (Destination IP address) (Destination MAC address)
        arp_table {
                #10.0.0.2 3c:fd:fe:9e:5b:60
                #10.0.1.2 3c:fd:fe:9e:5c:40
        }

        # This is to configure static routing table
        # (Destination address)/(Prefix) (Device name)
        route_table {
               #10.0.0.0/24 dpdk0
               #10.0.1.0/24 dpdk1
        }

        # This is to configure static bump-in-the-wire NIC forwarding table
        # DEVNIC_A DEVNIC_B ## (e.g. dpdk0 dpdk1) 
        nic_forward_table {
                dpdk0 dpdk1
        }

Note that I also tried uncommenting my arp/route entries, and inserting them manually with the ip command (but that shouldn't be necessary; correct me if I'm wrong).

The problem is, I can't see packets on the server side since they're not forwarded by the MB. Or, when I do see them, they seem not to be forwarded back (no response on the client side).

Funny fact: I can ping both MB interfaces from the client/server, but not client->server or server->client. The arping command works for client->server as long as tcpdump is sniffing the traffic on the server side. It is somewhat random behavior that looks like sorcery (due to promiscuous mode?). Also, I'm able to see packets on the server side when I send a forged TCP packet (python/scapy) from the client, or with iperf, which makes me say that the traffic never comes back.

Can you spot something wrong here?

Apologies for the delayed response. Can you please do 2 small tests?

1- Directly attach the client to the server and see if they work fine. This is to make sure that there are no iptables rules blocking any packets on either endpoint (a quick check is sketched below, after point 2).

2- You don't have to assign any IP address to dpdk0 or dpdk1. Please see this note: http://mos.kaist.edu/guide/config/01_inline.html#compile-and-build-mos-library-and-applications.
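
For the iptables check in (1), something along these lines on each endpoint should be enough (a sketch; flush the rules only if it is acceptable to temporarily drop your firewall configuration):

$ sudo iptables -L -n -v        # look for DROP/REJECT rules with non-zero packet counters
$ sudo iptables -P INPUT ACCEPT && sudo iptables -P OUTPUT ACCEPT
$ sudo iptables -F              # temporarily clear all rules (test only)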

No worries.

1- Actually, the testbed is also used for other tests where only the middlebox software is changed (same machine, same setup, same configuration), and everything works as expected when running those other setups, before and/or after the mOS test case. So I guess that's not where the problem lies, even though the machines are not directly linked together; they are linked through a switch.

2- My bad, indeed it's not required. However, I don't think it could cause the issue, do you? Anyway, I already tried without setting IP addresses on dpdk0 and dpdk1. No change, as far as I remember.

I'm completely stuck, and I feel like I have already tried and tested every possibility. Right now, I don't know what more I could do. Do you have any other ideas?

@ajamshed

I'm coming back to you with a small update. As I said in my previous comment, all 3 machines were linked together through a switch. The topology was the following (switch omitted):

  Client <-----------------> (dpdk0) MiddleBox mOS (dpdk1) <-----------------> Server

 10.0.0.2/24                                                                10.0.1.2/24
3c:fd:fe:9e:5b:60       3c:fd:fe:af:e1:48     3c:fd:fe:af:e1:49         3c:fd:fe:9e:5c:40

Problem found: mOS (the middlebox) does not replace the destination MAC address with the server's. Is that intended behavior? That would mean mOS must act as a transparent inline forwarder only. It would also explain why the client MAC and the server MAC are required on the opposite sides, respectively. This technique wouldn't work with a switch, as the middlebox would be bypassed.

So, I tried your solution by removing the switch. Now the 3 machines are directly linked (same topology as above). I assumed mOS only works as a transparent inline forwarder (does it?) and configured ARP entries on both the client and the server. The client now sends packets to the server with the server's MAC as destination. Still no response from mOS: I can see packets arriving on dpdk0, but nothing is forwarded to dpdk1 (see below):

[ ALL ] dpdk0, RX: 1(pps) (err: 0), 0.00(Gbps), TX: 0(pps), 0.00(Gbps)
[ ALL ] dpdk1, RX: 0(pps) (err: 0), 0.00(Gbps), TX: 0(pps), 0.00(Gbps)

I mean, mOS does not forward anything, nor does it respond to the client. I don't see anything but the sent packets on the client side. What's going wrong? I used the setup you suggested, and still I can't make it work as expected.

Note that I tried the following on the middlebox machine running mOS:

  • with/without kernel forwarding enabled (toggled as sketched after this list)
  • with/without commenting arp/route entries in mos.conf
  • with/without arp/route entries directly with commands
  • with midstat and simple_firewall samples
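
(For reference, kernel IP forwarding on the middlebox was toggled with the usual sysctl; shown only as a sketch, since the DPDK-bound interfaces bypass the kernel anyway:)

$ sudo sysctl -w net.ipv4.ip_forward=1    # enable kernel forwarding
$ sudo sysctl -w net.ipv4.ip_forward=0    # disable it again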

I also noticed that if an ARP request is sent to mOS (e.g., when the static entry is not configured on the client), it does not respond. Is mOS just a blind/transparent forwarder?

Machine running mOS: Intel Xeon E5-2630 v3 @ 2.40 GHz (8 cores, 16 threads), 16 GB RAM, running Ubuntu Server 18.04.1 with a 4.15 kernel, with an Intel XL710 2x40GbE QSFP+ NIC with Receive-Side Scaling (RSS) enabled.

Hi @IurmanJ,

The default mOS setup is that of a transparent inline forwarder (without a switch). If you are interested in using a switch, then I suggest that you refer to this link: http://mos.kaist.edu/guide/release/01_micro_inline.html.

Please make sure that the nic_forward_table is correctly set up in the mos.conf file. Also, make sure that both interfaces (dpdk0 & dpdk1) are up. I am not sure exactly what the underlying reason is for the issue that you are reporting; it is most likely a configuration problem. I would like to talk about your setup in more detail. Can you please send an email to me (asim.jamshed@gmail.com)? We can probably set up a Hangouts session and talk this through.

@ajamshed

Indeed it was a configuration problem. Now, it's working fine for both setups (with and without the switch). Thanks a million for your time.

NB: is it possible to run the firewall sample (for instance) in the background? I tried with "&" and with nohup; both seem to stop the running script or make it unavailable. For now, it only works when run in the foreground.

I am glad to know that your setup is now working.

I don't think there is anything special you need to do to get the application running as a background process. You may want to refer to this link: https://unix.stackexchange.com/questions/420594/why-process-killed-with-nohup

@ajamshed

Well, I thought the same at first glance. Anyway, running "nohup sudo ./simple_firewall &" makes the program abort, and I get the following error in the nohup output:

PANIC in rte_eal_remote_launch():
cannot read on configuration pipe
7: [./simple_firewall(_start+0x2a) [0x5624afba3e9a]]
6: [/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7fa80b9b8b97]]
5: [./simple_firewall(main+0x959) [0x5624afba02c9]]
4: [./simple_firewall(mtcp_create_context+0x1de) [0x5624afba71ce]]
3: [./simple_firewall(+0x10ff49) [0x5624afc45f49]]
2: [./simple_firewall(__rte_panic+0xc5) [0x5624afb99067]]
1: [./simple_firewall(rte_dump_stack+0x2e) [0x5624afc4847e]]

It looks more like an issue on the DPDK side when run in the background. Any idea?

@IurmanJ ,

I tried running testpmd with the nohup command. It also aborted after a few seconds. This looks like a DPDK-specific issue. You may want to contact the DPDK maintainers about this.

@ajamshed

Indeed, you're right. I tried that too and came to the same conclusion as yours. That's why I came up with another solution: daemonize the script. An easy way is to simulate it by using "screen". Now it runs well, as expected. Still, when the application (e.g. the firewall) is killed (and so is the "screen" session), I can't unbind/rebind the interfaces back to the kernel. There's a sort of loop about the dpdk driver or something like that, I don't remember exactly. I'm not sure whether it is mOS-related or dpdk-related. Maybe you can give it a try and tell me if you get the same problem.

screen -d -m sudo ${MOS_TARGET}/samples/simple_firewall/simple_firewall

Then, kill it and unbind interfaces.

@IurmanJ,

I tried running the screen command and later killed the process. I was able to unbind the dpdk-bound interfaces afterwards. Sometimes a dpdk process, when sent a kill signal, needs a few seconds to be completely destroyed (& cleaned up by the OS). I suggest that you wait for the process to completely die off before trying to unbind the interfaces (the ps command may help; a rough sketch follows).
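
Something along these lines, for example (the script path inside the DPDK tree, the PCI addresses, and the kernel driver name are placeholders that depend on your DPDK version and NIC):

# wait until the DPDK process is really gone
$ while pgrep -x simple_firewall > /dev/null; do sleep 1; done
# then unbind from the DPDK driver and rebind to the kernel driver
$ sudo <dpdk>/usertools/dpdk-devbind.py --status
$ sudo <dpdk>/usertools/dpdk-devbind.py -u <pci_addr_0> <pci_addr_1>
$ sudo <dpdk>/usertools/dpdk-devbind.py -b <kernel_driver> <pci_addr_0> <pci_addr_1>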