DHCP page fault
MagnusS opened this issue · 13 comments
When I run the static-website or stackv4 examples from mirage-skeleton under Xen they page fault with DHCP. Static IP seems to work fine. I use Xen 4.4 in Virtualbox (with Ubuntu 14.10 Server) and Mirage 2.0 from the main opam repo.
$ sudo xl create www.xl -c
Parsing config from www.xl
Xen Minimal OS!
start_info: 0000000000322000(VA)
nr_pages: 0x10000
shared_inf: 0x40a67000(MA)
pt_base: 0000000000325000(VA)
nr_pt_frames: 0x5
mfn_list: 00000000002a2000(VA)
mod_start: 0x0(VA)
mod_len: 0
flags: 0x0
cmd_line:
stack: 0000000000260800-0000000000280800
Mirage: start_kernel
MM: Init
_text: 0000000000000000(VA)
_etext: 0000000000151fde(VA)
_erodata: 0000000000190000(VA)
_edata: 0000000000247a10(VA)
stack start: 0000000000260800(VA)
_end: 00000000002a12dc(VA)
start_pfn: 32d
max_pfn: 10000
Mapping memory range 0x400000 - 0x10000000
setting 0000000000000000-0000000000190000 readonly
skipped 1000
MM: Initialise page allocator for 3ab000(3ab000)-10000000(10000000)
MM: done
Demand map pfns at 10001000-0000002010001000.
Initialising timer interface
Initialising console ... done.
gnttab_table mapped at 0000000010001000.
xencaml: app_main_thread
getenv(OCAMLRUNPARAM) -> null
getenv(CAMLRUNPARAM) -> null
Unsupported function lseek called in Mini-OS kernel
Unsupported function lseek called in Mini-OS kernel
Unsupported function lseek called in Mini-OS kernel
getenv(OCAMLRUNPARAM) -> null
getenv(CAMLRUNPARAM) -> null
getenv(TMPDIR) -> null
getenv(TEMP) -> null
Netif: add resume hook
Netif.connect 0
Netfront.create: id=0 domid=0
MAC: c0:ff:ee:c0:ff:ee
Manager: connect
Attempt to open(/dev/urandom)!
Manager: configuring
DHCP: start discovery
Sending DHCP broadcast len 552
Page fault at linear address 28, rip 151b17, regs 000000000027fc48, sp 27fcf0, our_sp 000000000027fc10, code 0
Page fault in pagetable walk (access to invalid memory?).
What's the output of opam list -i?
$ opam list -i
# Installed packages for system:
base-bigarray base Bigarray library distributed with the OCaml compiler
base-bytes legacy Bytes compatibility library distributed with ocamlfind
base-no-ppx base A pseudo-library to indicate lack of extension points support
base-threads base Threads library distributed with the OCaml compiler
base-unix base Unix library distributed with the OCaml compiler
base64 1.0.0 Base64 encoding and decoding library
camlp4 4.01.0 Camlp4 is a system for writing extensible parsers for programming languages
cmdliner 0.9.5 Declarative definition of command line interfaces for OCaml
cohttp 0.12.0 HTTP library for Lwt, Async and Mirage
conduit 0.6.1 Network connection library for TCP and SSL
conf-pkg-config 1.0 Virtual package relying on pkg-config installation.
crunch 1.3.0 Convert a filesystem into a static OCaml module
cstruct 1.4.0 access C structures via a camlp4 extension
dns 0.11.0 DNS client and server implementation
fieldslib 109.20.03 Syntax extension to define first class values representing record fields, to get and set record fields, iterate and fold over
io-page 1.1.1 Allocate memory pages suitable for aligned I/O
ipaddr 2.5.0 IP (and MAC) address representation library
lwt 2.4.6 A cooperative threads library for OCaml
mirage 2.0.0 The Mirage library operating system
mirage-clock-unix 1.0.0 A Mirage-compatible Clock library for Unix
mirage-clock-xen 1.0.0 A Mirage-compatible Clock library for Xen
mirage-conduit 2.0.0 Virtual package for the Mirage Conduit transports
mirage-console 2.0.0 A Mirage-compatible Console library for Xen and Unix
mirage-dns 2.0.0 Virtual package for the Mirage DNS transports
mirage-http 2.0.0 Mirage HTTP client and server driver for Unix
mirage-net-unix 1.1.1 Ethernet network driver for Mirage, using tuntap
mirage-net-xen 1.1.3 Ethernet network device driver for Mirage/Xen
mirage-types 2.0.0 Module type definitions for Mirage-compatible applications
mirage-types-lwt 2.0.0 Lwt module type definitions for Mirage-compatible applications
mirage-unix 2.0.0 Mirage OS library for Unix compilation
mirage-xen 2.0.0 Mirage OS library for Xen compilation
mirage-xen-minios 0.4.1 Xen MiniOS guest operating system library
oasis 0.4.5 Architecture for building OCaml libraries and applications
ocaml-data-notation 0.0.11 Store data using OCaml notation
ocamlfind 1.5.5 A library manager for OCaml
ocamlify 0.0.1 Include files in OCaml code
ocamlmod 0.0.7 Generate OCaml modules from source files
ocplib-endian 0.7 Optimised functions to read and write int16/32/64 from strings and bigarrays, based on new primitives added in version 4.01.
optcomp 1.6 Optional compilation with cpp-like directives
ounit 2.0.0 Unit testing framework loosely based on HUnit. It is similar to JUnit, and other XUnit testing frameworks
re 1.2.2 RE is a regular expression library for OCaml
sexplib 111.13.00 Library for serializing OCaml values to and from S-expressions
shared-memory-ring 1.1.0 Shared memory rings for RPC and bytestream communications.
ssl 0.4.7 Bindings for OpenSSL
stringext 1.0.0 Extra string functions for OCaml
tcpip 2.0.1 Userlevel TCP/IP stack
tuntap 1.0.0 TUN/TAP bindings
type_conv 111.13.00 Library for building type-driven syntax extensions
uri 1.7.2 RFC3986 URI/URL parsing library
vchan 2.0.0 Xen Vchan implementation
xen-evtchn 1.0.5 Xen event channel bindings.
xen-gnt 2.0.0 Xen grant table bindings
xenstore 1.2.5 Xenstore protocol clients and server
xenstore_transport 0.9.4 Low-level libraries for connecting to a xenstore service on a xen host.
Could you run 'gdb ' and 'disas 0x151b17' to find out where it faulted (that's the RIP instruction pointer)?
disas says 0x151b17 is in memmove
time for some printf debugging to narrow down where the fault is occurring...probably in the dhcp code in mirage-tcpip
After doing some more testing it turns out that static IP doesn't work either. Interestingly, the static IP kernel only seems to crash after it has received (or tried to reply to) two IP packets. It crashes with TCP SYNs on closed and open ports and with ICMP packets. ARP seems to work fine.
gdb disas reports that the page faults are in caml_tcpip_ones_complement (ICMP) and caml_tcpip_ones_complement_list (TCP SYN).
If I edit lib/tcpip_checksums.ml to use caml_ones_complement and caml_ones_complement_list (not caml_tcpip_*) that fixes the problem.
If I replace mirage-tcpip/lib/checksums_stubs.c with mirage-platform/xen/runtime/xencaml/checksum_stubs.c and rename the C functions to caml_tcpip_* the kernel still crashes.
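For readers unfamiliar with what these stubs compute: the functions named above implement the Internet ones-complement checksum used by IP, ICMP and TCP. The following is only a minimal standalone sketch of that algorithm (as described in RFC 1071), not the code from mirage-tcpip or mirage-platform; the function name ones_complement_checksum is hypothetical, and the real stubs operate on OCaml values rather than a plain byte pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the mirage-tcpip stub: computes the Internet
   checksum (RFC 1071) over len bytes, treating the data as big-endian
   16-bit words. */
uint16_t ones_complement_checksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    /* Add up all complete 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    /* An odd trailing byte is padded with a zero low byte. */
    if (i < len)
        sum += (uint32_t)data[i] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    /* The checksum is the ones complement of the folded sum. */
    return (uint16_t)~sum;
}
```

The loop touches only the buffer and a few locals, which is why a fault inside such a function points at the calling environment (stack or compiler flags) rather than at the arithmetic itself.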
It would be good to take this binary image and run it on a real Xen box to determine if it's a vbox specific problem or not.
@MagnusS what's the difference in the disassembly of the two versions of ones_complement?
I don't have access to a real Xen server at the moment, but I installed the older Ubuntu 14.04 with Xen in VirtualBox and ran the same examples. The gcc in 14.04 is older (4.8, versus 4.9 in 14.10). Unikernels compiled on Ubuntu 14.04 run without page faults on both 14.04 and 14.10.
As caml_ones_complement_checksum (which works) is from libxencaml.a and caml_tcpip_ones_complement_checksum (which doesn't work) is from libtcpip_stubs.a, I checked whether the two libraries were compiled differently. The only flag used to compile checksum_stubs.c in libtcpip_stubs.a is -O2. The flags used for libxencaml.a (leaving out -D/-U/-I/-W etc.) are -O3 -mno-red-zone -fno-tree-loop-distribute-patterns -fno-stack-protector -fno-reorder-blocks -fstrict-aliasing -m64 -fno-asynchronous-unwind-tables -momit-leaf-frame-pointer -mfancy-math-387.
I compiled libtcpip_stubs.a with the flags above in Ubuntu 14.10 and the DHCP and static IP versions of static_website now seem to work without page fault.
I guess the -mno-red-zone is the most likely cause (I'm not sure how Mini-OS on x86 handles the stack).
@avsm what prevents normal OCaml code from assuming a red zone? Do we just hope that ocamlopt doesn't do that?
Yes, we absolutely must compile with no red zone on MiniOS/x86_64, since it doesn't work when the whole application is running in a privileged ring.
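As background on why the flag matters: the System V x86_64 ABI lets a leaf function use the 128 bytes below the stack pointer (the red zone) for locals without adjusting the stack pointer. That is safe only if nothing else writes to the current stack asynchronously. The sketch below is an illustration of the kind of function affected, not code from Mini-OS or mirage-tcpip; the name sum_bytes_leaf is hypothetical, and whether a given compiler actually places scratch in the red zone depends on its version and flags.

```c
#include <stdint.h>

/* Hypothetical leaf function: it calls nothing and needs a few bytes of
   stack scratch space. Built for x86_64 without -mno-red-zone, a compiler
   is allowed to keep scratch below the stack pointer without reserving it.
   With -mno-red-zone it must reserve the space first. */
uint32_t sum_bytes_leaf(uint32_t x)
{
    volatile uint8_t scratch[4]; /* volatile forces real stack storage */

    scratch[0] = (uint8_t)(x & 0xffu);
    scratch[1] = (uint8_t)((x >> 8) & 0xffu);
    scratch[2] = (uint8_t)((x >> 16) & 0xffu);
    scratch[3] = (uint8_t)((x >> 24) & 0xffu);

    /* If an event or exception frame were pushed onto this same stack
       between the stores above and the loads below, unreserved red-zone
       data could be overwritten. */
    return (uint32_t)scratch[0] + scratch[1] + scratch[2] + scratch[3];
}
```

In an ordinary user process the function returns the sum of the four bytes of x either way; the difference only shows up in environments where the red zone is not preserved.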
I can confirm that the page fault is fixed in 14.10 with -mno-red-zone and -fno-stack-protector. Ubuntu patches gcc to enable stack protector by default: https://wiki.ubuntu.com/Security/Features
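One way to check whether a given object file is being built with the stack protector, regardless of distribution defaults, is to look at the macros GCC predefines for each -fstack-protector variant. The following is a small sketch of that check; the function name stack_protector_mode is hypothetical, and it only reports what the compiler was told when compiling this translation unit.

```c
/* Hypothetical diagnostic helper. GCC predefines exactly one of these
   macros depending on the active -fstack-protector* option, and none of
   them when building with -fno-stack-protector. */
const char *stack_protector_mode(void)
{
#if defined(__SSP_ALL__)
    return "all";       /* -fstack-protector-all */
#elif defined(__SSP_STRONG__)
    return "strong";    /* -fstack-protector-strong */
#elif defined(__SSP_EXPLICIT__)
    return "explicit";  /* -fstack-protector-explicit */
#elif defined(__SSP__)
    return "basic";     /* -fstack-protector */
#else
    return "off";       /* no stack protector for this file */
#endif
}
```

On a toolchain that enables the stack protector by default, this should return something other than "off" unless -fno-stack-protector is passed explicitly, which makes it a quick sanity check for the build flags of C stubs.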