tweag/asterius

Q: how are .cmm files in the boot corpus to be treated?

ggreif opened this issue · 5 comments

When running stack exec -- ahc-boot on the Mac, the build chokes on the first cmm file in base, namely cbits/CastFloatWord.cmm, with an error like

/var/folders/39/lr50t0q96tx5qp0mzht34r240000gn/T/ghc92440_0/ghc_4.s:1:15: error:
     error: unexpected token in '.section' directive
  |
1 | .section .text
  |               ^
.section .text
              ^

This strikes me as odd, because I would expect that ahc won't produce native (x86_64 here) artefacts. Or will it?

Boot.hs also seems to do some massaging of cmm files before boot.sh runs. Commenting out cbits/CastFloatWord.cmm in base.cabal lets the boot process get past base, only to choke in ghc-heap on another .cmm file.

Have you recently booted the libraries on macOS? I would be interested in verbose build logs from a successful run. Something like the output from

-ahc-cabal act-as-setup --build-type=Configure -- build -j --builddir=$ASTERIUS_TMP_DIR/dist/base
+ahc-cabal act-as-setup --build-type=Configure -- build -j --builddir=$ASTERIUS_TMP_DIR/dist/base -v --ghc-options=-v

(and maybe the versions of the tools involved) for differential diagnosis.

Shamelessly pinging @TerrorJack...

This is how HeapPrim.cmm is assembled in a regular stage-1 build:

*** C Compiler:
/nix/store/q0bf1jzqd89ai17ycx9nalr35m4ircdc-clang-wrapper-7.1.0/bin/cc -DTABLES_NEXT_TO_CODE -E -Ilibraries/ghc-heap/dist-boot/build -Ilibraries/ghc-heap/dist-boot/build/./autogen -Ilibraries/ghc-heap/. -I/nix/store/6gq75lkpsr8gzdjsyg4vxiphim3474y5-libiconv-osx-10.12.6/include -I/nix/store/kad2r4i74bcrwi53sdbh3mf85n7qbcaf-ghc-8.8.3/lib/ghc-8.8.3/base-4.13.0.0/include -I/nix/store/7agp7xpvknyrmmsaw5yqdjjlv06pzc8m-gmp-6.2.0-dev/include -I/nix/store/kad2r4i74bcrwi53sdbh3mf85n7qbcaf-ghc-8.8.3/lib/ghc-8.8.3/integer-gmp-1.0.2.0/include -I/nix/store/kad2r4i74bcrwi53sdbh3mf85n7qbcaf-ghc-8.8.3/lib/ghc-8.8.3/include -I/nix/store/mhqpmbkjvvc93yrdb7qq9vrz1vjyh3b5-libffi-3.3-dev/include -include /nix/store/kad2r4i74bcrwi53sdbh3mf85n7qbcaf-ghc-8.8.3/lib/ghc-8.8.3/include/ghcversion.h -Ddarwin_BUILD_OS -Dx86_64_BUILD_ARCH -Ddarwin_HOST_OS -Dx86_64_HOST_ARCH -D__GLASGOW_HASKELL_TH__ -U__PIC__ -D__PIC__ -D__SSE__ -D__SSE2__ -include/var/folders/39/lr50t0q96tx5qp0mzht34r240000gn/T/ghc35469_0/ghc_2.h -x assembler-with-cpp libraries/ghc-heap/cbits/HeapPrim.cmm -o /var/folders/39/lr50t0q96tx5qp0mzht34r240000gn/T/ghc35469_0/ghc_1.cmm
*** ParseCmm [/var/folders/39/lr50t0q96tx5qp0mzht34r240000gn/T/ghc35469_0/ghc_1.cmm]:
!!! ParseCmm [/var/folders/39/lr50t0q96tx5qp0mzht34r240000gn/T/ghc35469_0/ghc_1.cmm]: finished in 1.44 milliseconds, allocated 1.103 megabytes

==================== Asm code ====================
.text
.align 3
.globl aToWordzh
aToWordzh:
_c2:
	jmp *(%rbp)



==================== Asm code ====================
.text
.align 3
.globl reallyUnsafePtrEqualityUpToTag
reallyUnsafePtrEqualityUpToTag:
_c5:
	andq $-8,%rbx
	andq $-8,%r14
	cmpq %r14,%rbx
	sete %al
	movzbl %al,%ebx
	jmp *(%rbp)


*** Assembler:
/nix/store/q0bf1jzqd89ai17ycx9nalr35m4ircdc-clang-wrapper-7.1.0/bin/cc -DTABLES_NEXT_TO_CODE -Ilibraries/ghc-heap/dist-boot/build -Ilibraries/ghc-heap/dist-boot/build/./autogen -Ilibraries/ghc-heap/. -fno-common -U__PIC__ -D__PIC__ -Qunused-arguments -x assembler -c /var/folders/39/lr50t0q96tx5qp0mzht34r240000gn/T/ghc35469_0/ghc_4.s -o libraries/ghc-heap/dist-boot/build/cbits/HeapPrim.o

Note that the first directive is .text and not .section .text. I guess there is some Darwin-specific tweak to adapt to the assembler in use, which is activated in ghc but missing from ahc.

Here is a definite difference:

[nix-shell:~/asterius]$ stack exec -- ahc --info | grep GNU
 ,("ld is GNU ld","YES")
 ,("target has GNU nonexec stack","True")

[nix-shell:~/asterius]$ stack exec -- ghc --info | grep GNU
 ,("ld is GNU ld","NO")
 ,("target has GNU nonexec stack","False")
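
To look for further mismatches beyond the GNU-related entries, something along these lines might help. It is only a sketch: info_diff is a hypothetical helper name, and it assumes both --info dumps have been saved to files first.

```shell
# Hypothetical helper: info_diff FILE_A FILE_B
# Prints the entries that differ between two saved `--info` dumps.
# Lines only in FILE_A are prefixed with "<", lines only in FILE_B with ">".
info_diff() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    # Sort both dumps so that entry order does not produce spurious differences.
    sort "$1" > "$a"
    sort "$2" > "$b"
    # Keep only the differing lines; grep finds nothing when the dumps agree,
    # so its non-zero exit status is ignored.
    diff "$a" "$b" | grep '^[<>]' || true
    rm -f "$a" "$b"
}

# Usage (needs the asterius checkout, so left commented out):
#   stack exec -- ahc --info > ahc.info
#   stack exec -- ghc --info > ghc.info
#   info_diff ahc.info ghc.info
```

Entries such as version strings or library paths will naturally differ, so the output still needs to be read with some care.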

The ahc overrides reside in ghc-toolkit/ghc-libdir/settings. Trying to tweak them...

Editing ghc-toolkit/ghc-libdir/settings and re-running ahc-boot seems to do the trick.

I guess these should be populated with the values inherited from the underlying ghc.
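
As a stopgap until that happens, the manual edit could be scripted roughly as follows. This is a sketch under assumptions: set_setting is a hypothetical helper, it assumes the settings file stores entries as ("key", "value") pairs like the --info output does, and it assumes keys and values contain no characters that are special to sed.

```shell
# Hypothetical helper: set_setting FILE KEY VALUE
# Rewrites the value of one ("KEY", "old") entry in a settings-style file.
set_setting() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    # Match ("KEY", "anything") with optional spaces after the comma and
    # substitute the new value; all other lines pass through unchanged.
    sed "s|(\"$key\", *\"[^\"]*\")|(\"$key\", \"$value\")|" "$file" > "$tmp" || {
        rm -f "$tmp"
        return 1
    }
    # Write back through the original file so its permissions are preserved.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Usage (values taken from what ghc --info reports on this Mac; that these
# two keys are the ones that matter is an assumption):
#   set_setting ghc-toolkit/ghc-libdir/settings "ld is GNU ld" "NO"
#   set_setting ghc-toolkit/ghc-libdir/settings "target has GNU nonexec stack" "False"
```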

Closing, as I'll try to come up with a PR.