net: TestFilePacketConn fails on Scaleway
bradfitz opened this issue · 15 comments
On a Scaleway ARM host (where we're trying to move the ARM builders), the net package fails with:
--- FAIL: TestFilePacketConn (0.00s)
file_test.go:113: write ip 127.0.0.1->127.0.0.1: sendto: bad address
Debug:
root@scw-105acb:~/go/src# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.10
DISTRIB_CODENAME=utopic
DISTRIB_DESCRIPTION="Ubuntu 14.10"
root@scw-105acb:~/go/src# ifconfig
docker0 Link encap:Ethernet HWaddr 56:84:7a:fe:97:99
inet addr:172.17.42.1 Bcast:0.0.0.0 Mask:255.255.0.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
eth0 Link encap:Ethernet HWaddr 00:07:cb:03:76:44
inet addr:10.1.34.160 Bcast:10.1.35.255 Mask:255.255.254.0
inet6 addr: fe80::207:cbff:fe03:7644/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:357998 errors:0 dropped:0 overruns:0 frame:0
TX packets:108129 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:532
RX bytes:352772865 (352.7 MB) TX bytes:2078718437 (2.0 GB)
Interrupt:24
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:20563 errors:0 dropped:0 overruns:0 frame:0
TX packets:20563 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:66891220 (66.8 MB) TX bytes:66891220 (66.8 MB)
root@scw-105acb:~/go/src# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.1.34.1 0.0.0.0 UG 0 0 0 eth0
10.1.34.0 0.0.0.0 255.255.254.0 U 0 0 0 eth0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
Note that this machine has a Docker daemon running, but I'm not yet running the build inside a container. This failure was from running on the host machine, as part of evaluating the speed of these machines.
/cc @mikioh, @davecheney, @crawshaw, @adg
And the strace:
[pid 15756] socket(PF_INET, SOCK_RAW|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_ICMP) = 3
[pid 15756] setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
[pid 15756] bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
[pid 15756] epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=3058795512, u64=3058795512}}) = 0
[pid 15756] getsockname(3, {sa_family=AF_INET, sin_port=htons(1), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
[pid 15756] getpeername(3, 0x10649bac, [112]) = -1 ENOTCONN (Transport endpoint is not connected)
[pid 15756] fcntl(3, F_DUPFD_CLOEXEC, 0) = 5
[pid 15756] fcntl(5, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
[pid 15756] fcntl(5, F_SETFL, O_RDWR) = 0
[pid 15756] fcntl(5, F_DUPFD_CLOEXEC, 0) = 6
[pid 15756] fcntl(6, F_GETFL) = 0x2 (flags O_RDWR)
[pid 15756] fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 15756] getsockopt(6, SOL_SOCKET, SO_TYPE, [3], [4]) = 0
[pid 15756] getsockname(6, {sa_family=AF_INET, sin_port=htons(1), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
[pid 15756] getpeername(6, 0x10649bdc, [112]) = -1 ENOTCONN (Transport endpoint is not connected)
[pid 15756] epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=3058795392, u64=3058795392}}) = 0
[pid 15756] sendto(6, "", 0, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EFAULT (Bad address)
[pid 15756] clock_gettime(CLOCK_REALTIME, {1430954968, 213392054}) = 0
[pid 15756] write(1, "--- FAIL: TestFilePacketConn (0."..., 529--- FAIL: TestFilePacketConn (0.04s)
file_test.go:113: write ip 127.0.0.1->127.0.0.1: sendto: bad address
) = 529
[pid 15756] write(1, "FAIL\n", 5FAIL
) = 5
[pid 15756] close(3) = 0
[pid 15756] exit_group(1) = ?
[pid 15758] +++ exited with 1 +++
[pid 15757] +++ exited with 1 +++
+++ exited with 1 +++
The sendto EFAULT seems wrong.
EFAULT An invalid user space address was specified for an argument.
The man page says:
ssize_t sendto(int sockfd, const void *buf, size_t len, int flags,
const struct sockaddr *dest_addr, socklen_t addrlen);
Is a NULL buf (a void *) okay, even with len 0?
Actually, the syscall package already tries hard to avoid passing a NULL void *:
// Single-word zero for use when we need a valid pointer to 0 bytes.
// See mksyscall.pl.
var _zero uintptr
func sendto(s int, buf []byte, flags int, to unsafe.Pointer, addrlen _Socklen) (err error) {
var _p0 unsafe.Pointer
if len(buf) > 0 {
_p0 = unsafe.Pointer(&buf[0])
} else {
_p0 = unsafe.Pointer(&_zero)
}
_, _, e1 := Syscall6(SYS_SENDTO, uintptr(s), uintptr(_p0), uintptr(len(buf)), uintptr(flags), uintptr(to), uintptr(addrlen))
if e1 != 0 {
err = errnoErr(e1)
}
return
}
... yet &_zero (which should be non-nil) ends up as zero according to the strace.
Is Syscall6 doing the right thing?
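To make the question concrete, here is a minimal sketch of the same empty-buffer pattern outside the syscall package. The names `zero` and `bufPointer` are hypothetical stand-ins for `_zero` and the branch inside the generated `sendto`; this only shows what pointer Go computes, not what the kernel receives.

```go
package main

import (
	"fmt"
	"unsafe"
)

// zero is a hypothetical stand-in for the syscall package's _zero:
// a single word whose address is a valid pointer to zero bytes.
var zero uintptr

// bufPointer (hypothetical name) returns the pointer that would be passed
// as the buf argument, following the same branch as the generated sendto.
func bufPointer(buf []byte) unsafe.Pointer {
	if len(buf) > 0 {
		return unsafe.Pointer(&buf[0])
	}
	return unsafe.Pointer(&zero)
}

func main() {
	// An empty or nil slice still yields a non-nil pointer.
	fmt.Println(bufPointer(nil) != nil)
	fmt.Println(bufPointer([]byte{}) != nil)
}
```

If this pattern is what runs, a zero-length write should never hand the kernel a NULL buf, so a NULL in the trace would point either at the syscall stub or at a misreading of the trace.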
This machine FWIW has 4 of these:
# cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 2 (v7l)
BogoMIPS : 1332.01
Features : half thumb fastmult vfp edsp thumbee vfpv3 tls idiva idivt vfpd32 lpae
CPU implementer : 0x56
CPU architecture: 7
CPU variant : 0x2
CPU part : 0x584
CPU revision : 2
No, I just can't read. The buf pointer is indeed non-zero. I was off by one reading all the empty values. And strace in raw mode (as well as some printlns in the syscall package) confirms:
[pid 16133] write(2, "sendto zero ", 12sendto zero ) = 12
[pid 16133] write(2, " ", 1 ) = 1
[pid 16133] write(2, "0x3af964", 80x3af964) = 8
[pid 16133] write(2, " ", 1 ) = 1
[pid 16133] write(2, "0x3af964", 80x3af964) = 8
[pid 16133] write(2, "\n", 1
) = 1
[pid 16133] write(2, "sendto ", 7sendto ) = 7
[pid 16133] write(2, " ", 1 ) = 1
[pid 16133] write(2, "290", 3290) = 3
[pid 16133] write(2, " ", 1 ) = 1
[pid 16133] write(2, "6", 16) = 1
[pid 16133] write(2, " ", 1 ) = 1
[pid 16133] write(2, "3864932", 73864932) = 7
[pid 16133] write(2, " ", 1 ) = 1
[pid 16133] write(2, "0", 10) = 1
[pid 16133] write(2, " ", 1 ) = 1
[pid 16133] write(2, "0", 10) = 1
[pid 16133] write(2, " ", 1 ) = 1
[pid 16133] write(2, "275330280", 9275330280) = 9
[pid 16133] write(2, " ", 1 ) = 1
[pid 16133] write(2, "16", 216) = 2
[pid 16133] write(2, "\n", 1
) = 1
[pid 16133] sendto(0x6, 0x3af964, 0, 0, 0x106934e8, 0x10) = -1 (errno 14)
So it's only len and flags which are zero.
Still no clue about the EFAULT, though.
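As a cross-check, the decimal values printed by the debugging printlns can be converted to hex and compared with the raw strace line. This is a small sketch; the argument order (fd, buf, len, flags, to, addrlen) is an assumption inferred from the raw `sendto` line, and the leading 290 in the println output is left out because it appears to be the syscall number rather than an argument.

```go
package main

import "fmt"

// hexArgs formats each value the way strace's raw mode prints pointers
// and integers, e.g. 16 becomes "0x10".
func hexArgs(vals []uint64) []string {
	out := make([]string, len(vals))
	for i, v := range vals {
		out[i] = fmt.Sprintf("%#x", v)
	}
	return out
}

func main() {
	// Values copied from the println output above, in the assumed
	// order: fd, buf, len, flags, to, addrlen.
	printed := []uint64{6, 3864932, 0, 0, 275330280, 16}
	fmt.Println(hexArgs(printed))
}
```

The buf value 3864932 is 0x3af964 and the to value 275330280 is 0x106934e8, which matches the pointers in the raw strace line, so the arguments reach the kernel as Go computed them.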
Ah, if the error you are seeing is only
write ip 127.0.0.1->127.0.0.1: sendto: bad address
I'll take this issue. It seems to happen only in the top/middle half of the ICMP stack.
Not sure what that means but happy for a fix. (ICMP has three halves? :))
As a matter of convenience, I usually think of it as consisting of a socket-interface adaptation layer (or service access point layer), a protocol layer, and a transport (in this case IP) adaptation layer. I believe the root cause of this issue is simply that the test passes a corrupted ICMP packet to the kernel. The 4-year-old test cases certainly need to be updated for recent, more restrictive kernels.
In addition, as of Go 1.5, the full-stack test cases for IPConn have moved to the following packages:
golang.org/x/net/ipv4
golang.org/x/net/ipv6
golang.org/x/net/icmp
I'd be happy if the buildbots could eventually run the x/net tests with administrator privileges.
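For contrast with the zero-length write in the strace, here is a minimal sketch of building a well-formed ICMPv4 echo request by hand using only the standard library. It is not the fix applied to the test; `checksum` and `echoRequest` are hypothetical helpers, and the id, sequence number and payload are arbitrary.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// checksum computes the Internet checksum (RFC 1071) over b.
func checksum(b []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(b); i += 2 {
		sum += uint32(binary.BigEndian.Uint16(b[i:]))
	}
	if len(b)%2 == 1 {
		// Pad the trailing odd byte with a zero low byte.
		sum += uint32(b[len(b)-1]) << 8
	}
	// Fold the carries back into the low 16 bits.
	for sum>>16 != 0 {
		sum = sum&0xffff + sum>>16
	}
	return ^uint16(sum)
}

// echoRequest builds an ICMPv4 echo request: type 8, code 0,
// checksum, identifier, sequence number, then the payload.
func echoRequest(id, seq uint16, payload []byte) []byte {
	b := make([]byte, 8+len(payload))
	b[0] = 8 // type: echo request
	b[1] = 0 // code
	binary.BigEndian.PutUint16(b[4:], id)
	binary.BigEndian.PutUint16(b[6:], seq)
	copy(b[8:], payload)
	// The checksum is computed with its own field still zero.
	binary.BigEndian.PutUint16(b[2:], checksum(b))
	return b
}

func main() {
	pkt := echoRequest(1, 1, []byte("HELLO-R-U-THERE"))
	fmt.Printf("% x\n", pkt)
	// Re-running the checksum over a valid packet yields zero.
	fmt.Println(checksum(pkt) == 0)
}
```

A packet built this way carries at least the 8-byte ICMP header, whereas the failing test wrote zero bytes to the raw socket.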
I'm going to just delete that test for now, then. You can re-enable it later when you identify how the test is broken.
Kernel is 3.19.1-181, FWIW.
Subscribing, I'm from the Scaleway team
CL https://golang.org/cl/10090 mentions this issue.
CL https://golang.org/cl/10134 mentions this issue.
CL https://golang.org/cl/17476 mentions this issue.