snowleopard/hadrian

Demystifying Autoconf

snowleopard opened this issue · 49 comments

I've been told a number of times that Autoconf is a sacred gift, which we should all love and live with until the end of time. I have also heard people whispering heretical thoughts about getting rid of it. Not now, but maybe some day. I admit that Autoconf is a mystery to me: I don't know what exactly it does, which makes it difficult for me to take sides in this matter.

Can we try to demystify Autoconf here? What exactly does it do? Please try to be specific, i.e., it computes setting foo stored in file bar, which is difficult to get right on all platforms, because baz.

When I look at system.config.in, I don't see anything magic. Most of the settings could be obtained in a relatively straightforward way. But this is, of course, only part of what Autoconf and friends do. Several GHC packages have non-trivial configure scripts.

In any case, it seems logical to at least move the parts which are easy to compute into the new build system. For example:

ghc-version           = @GhcVersion@
ghc-major-version     = @GhcMajVersion@
ghc-minor-version     = @GhcMinVersion@
ghc-patch-level       = @GhcPatchLevel@

We can easily add a build rule to query GHC version, so why keep this in the incomprehensible and unmaintainable (to mere mortals like me) Autoconf codebase?
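
To make "query GHC version" concrete, here is a minimal sketch of the parsing step, written in Python purely for illustration (a Shake rule would do the same in Haskell). The function name `parse_ghc_version` is hypothetical, and the mapping of the three components onto `@GhcMajVersion@`, `@GhcMinVersion@` and `@GhcPatchLevel@` is my assumption about what those settings mean.

```python
from typing import NamedTuple


class GhcVersion(NamedTuple):
    full: str    # would feed ghc-version
    major: int   # assumed to correspond to ghc-major-version
    minor: int   # assumed to correspond to ghc-minor-version
    patch: int   # assumed to correspond to ghc-patch-level


def parse_ghc_version(numeric_version: str) -> GhcVersion:
    """Split the output of `ghc --numeric-version`, e.g. "7.10.3"."""
    text = numeric_version.strip()
    parts = [int(p) for p in text.split(".")]
    # Pad with zeros so that a two-component version such as "8.0" still works.
    major, minor, patch = (parts + [0, 0, 0])[:3]
    return GhcVersion(text, major, minor, patch)
```

The input string would come from running `ghc --numeric-version`; everything after that is plain string handling.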

My personal feeling is that Autoconf is big, single-threaded and complex. It does have a lot of domain-specific knowledge, but much of that knowledge was useful 20 years ago when you had lots of compilers and OS's. Now you have 2 compilers and 3 OS's (to a crude first approximation), and most of them are pretty similar. I'd be a fan of pushing some/most of that knowledge into Shake.

hvr commented

...just a quick reply to this one:

We can easily add a build rule to query GHC version, so why keep this in the incomprehensible and unmaintainable (to mere mortals like me) Autoconf codebase?

You can of course replicate parsing of ghc --numeric-version, but we need this information inside aclocal.m4+configure.ac anyway. ./configure is the first thing the user is supposed to invoke in a source-distribution, so it should fail early if it detects that the environment doesn't provide the necessary tools. That gives the user a chance either to tell ./configure on the next invocation where the required tool can be found, or to install the tool if it's missing.

One of these environment checks validates that the bootstrapping GHC is within a certain version range (and yes, this is one of the exceptions where we test for versions, not for features). So it doesn't necessarily help us if shakebuild does this check, as shakebuild may not even be compiled yet at that point!
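
For reference, the check itself is small; the hard part is where it runs. A rough sketch in Python (illustration only; `check_bootstrap_ghc` and both bounds are hypothetical, the real range lives in configure.ac):

```python
from typing import Optional, Tuple


def version_tuple(text: str) -> Tuple[int, ...]:
    """Turn "7.10.3" into (7, 10, 3) so versions compare component-wise."""
    return tuple(int(p) for p in text.strip().split("."))


def check_bootstrap_ghc(found: str, minimum: str, below: str) -> Optional[str]:
    """Return an error message if `found` is outside [minimum, below), else None.

    Both bounds are supplied by the caller; no real GHC range is assumed here.
    """
    if version_tuple(minimum) <= version_tuple(found) < version_tuple(below):
        return None
    return ("bootstrapping GHC %s is not supported: need >= %s and < %s"
            % (found.strip(), minimum, below))
```

Whatever performs this check has to run before anything that needs the bootstrapping compiler, which is exactly the ordering problem described above.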

The obvious way to understand Autoconf is to RTFM: http://www.gnu.org/software/autoconf/manual/autoconf.html

I will summarize, since it's a long document:

  • "Autoconf is an extensible package of M4 macros that produce shell scripts to automatically configure software source code packages"
  • Typical configuration scripts have the following structure; ghc is no exception
     Autoconf requirements
     AC_INIT (package, version, [bug-report], [tarname], [url])
     information on the package
     checks for {programs,libraries,header files,types,structures,compiler characteristics,library functions,system services}
     AC_CONFIG_FILES([file...], [commands], [init-cmds])
     AC_OUTPUT
  • The created shell script reads a lot of environment variables and substitutes a lot of output variables, typically in the files listed in AC_CONFIG_FILES. It also supports a lot of command-line options, which vary depending on the autoconf script but can be listed with ./configure --help.
  • There are a lot of checks available, listed here, which set various output variables. They run specific commands on specific files and observe their output. Most of them are deprecated or specializations of a different check. GHC mostly uses its own fptools macros, you can see what's used.
  • The full list of output variables can be extracted by running autoconf --trace=AC_SUBST. GHC has 234 variables, but they all look pretty simple.
  • The configuration is cached between invocations in autom4te.cache. There is also support for using a system-wide cache, which speeds up configuration significantly. The cache looks pretty simple, compared to Shake's: command, output, traced macro invocations.
  • There are no restrictions on usage of the output ./configure script. So you could run autoconf once for GHC and its modules, port the scripts line-by-line to Haskell/Shake, and forget about using autoconf.
  • Normally, when not run in the special tracing mode mentioned above, ./configure generates a config.status script, which in turn generates header files, makefiles, and symlinks, based on the output variables, and also runs other configure scripts.
  • M4 is a persnickety whitespace-sensitive macro language which has used up hundreds of man-years of debugging time. Autoconf helpfully includes a library of M4 functions / macros.
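
The output-variable substitution described in the list above is the easiest part to picture. A minimal sketch in Python, assuming the `@Name@` placeholder syntax seen in system.config.in; the function name `substitute` is hypothetical, and leaving unknown placeholders untouched is a choice made for this sketch:

```python
import re
from typing import Dict

PLACEHOLDER = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)@")


def substitute(template: str, settings: Dict[str, str]) -> str:
    """Replace each @Name@ with settings["Name"]; keep unknown placeholders as-is."""
    def replace(match):
        name = match.group(1)
        return settings.get(name, match.group(0))
    return PLACEHOLDER.sub(replace, template)
```

For example, `substitute("ghc-version = @GhcVersion@\n", {"GhcVersion": "7.10.3"})` yields the corresponding line of system.config. The checks that compute the values are where the real work is.
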
hvr commented

@Mathnerd314 That seems to be a good overview :-)

Btw, there's also http://www.gnu.org/software/autoconf-archive/ which is a repository of contributed Autoconf scripts, as well as plenty of Autoconf-based projects all over the internet into which man-years of work went and from which one can steal scripts...

Yeah, the manual's introduction is also mandatory reading (it mentions the autoconf archive among other things):
https://www.gnu.org/software/autoconf/manual/autoconf-2.69/html_node/Introduction.html

And the history section is enlightening; it started as a simple shell script ('configure'), then as more projects needed configure scripts, turned into a configure-script-generator using M4. Then, as it expanded even more, and M4 started crashing due to implementation bugs, they fixed the bugs in GNU M4 (instead of rewriting it in a sane language such as Haskell).

@hvr @Mathnerd314 I appreciate your valuable input -- the above links will definitely be my go-to resource when I need to make sense of Autoconf scripts.

However, let me reiterate: can we make this thread a bit more specific? Do you know of a particular setting computed by Autoconf which would be difficult to compute in Shake? If yes, please share it, so we can discuss specifics.

You can of course replicate parsing of ghc --numeric-version, but we need this information inside aclocal.m4+configure.ac anyway.

I don't think this is a good argument in favour of Autoconf. Computing GHC version directly by Shake brings good benefits: type safety, reliable rebuilds, etc. So, we don't need Autoconf here?

One of these environment checks validates that the bootstrapping GHC is within a certain version range

Aha, this is an interesting point. Indeed, we might want to do this check outside the Shake script, as the latter may fail to compile with an old GHC. Maybe we can request a particular GHC version from within Shake sources using CPP?

@hvr

where man-years went into one...

@Mathnerd314

And the history section is enlightening...

I don't think that man-years or history of Autoconf can be used as arguments in this discussion. This whole project is about getting rid of the glorious make-based build system, which also has an enlightening history and also is a product of many man-years of the GHC community. Yet it has been decided to rewrite it using Shake for very good reasons. So far I haven't seen a really convincing argument why the same shouldn't happen with Autoconf.

Indeed, we might want to do this check outside the Shake script, as the latter may fail to compile with an old GHC.

I'd hope the Cabal file for this could handle these things. More generally I'd hope that while GHC may still be complex to build, the GHC shake infra is the most mundane Haskell executable to build and doesn't therefore need any special support.

man-years

I read the statements as the amount of blood sweat and tears that went into such a convoluted mess being eye-opening and head-scratching. Not an argument for keeping it around.

Definitely +1 no more autotools from me. I guess its main use both with GHC itself and other things (like base) is back-end concerns such as integrating with binutils. But with https://ghc.haskell.org/trac/ghc/ticket/11470 I'd hope (somewhat soon) that wouldn't be such a build-time concern for GHC anyways.

Maybe my summary was too long. The important line is this:

There are no restrictions on usage of the output ./configure script. So you could run autoconf once for GHC and its modules, port the scripts line-by-line to Haskell/Shake, and forget about using autoconf.

Once you have generated the configure script, autoconf is not relevant. Only the 15k-line configure script is relevant, and familiarity with autoconf does not really help you with understanding said shell script, as most of that code is actually from the aclocal.m4 file in GHC, and the rest is undocumented boilerplate. But, once that script is ported to Haskell, you can forget about the bash script too. And most of the code could end up in the Shake library or a supporting shake-autoconf-compat.
Improvements to autoconf would change the configure script, of course, and maybe you would have to look at those, but the last autoconf release was 4 years ago so it's a moot point.

hvr commented

Do you know of a particular setting computed by Autoconf which would be difficult to compute in Shake?

If you frame it that way: as was mentioned already, you can obviously translate everything that's done in ./configure line-by-line from Bash into Haskell. So by definition you can do everything in Shake that you can in Autoconf. But then we'd have to leave Autoconf behind and maintain our own home-brewed Autoconf-clone, and whenever something breaks or we need to test a new property we're gonna either have to invent it on our own, or start translating from existing Autoconf scripts, or look at Autoconf changelogs to see how it was fixed upstream and adapt that to our Haskell adaption.

Then there's the problem that shakebuild is much less convenient to transport. You can't simply scp the binary you just built locally over to a remote host (unless it's effectively the same ABI environment, but then you probably don't need to debug it anyway), and the remote host may even lack some of the tooling you need to build shakebuild from source. You'd first have to set up a cross-compile environment, and compile the binary you need for the target host. But if you're just interested in quickly comparing what ./configure detects on a remote host, right now you can just scp over this portable ./configure bash script that runs everywhere bash exists. And you can even edit ./configure in-place if you need to debug something. All stuff I can't imagine being able to do with a compiled Haskell binary. So here I'm worried this would add a barrier to porting and debugging issues.

There's more than just linux/windows/osx: we have active ports for e.g. Solaris, FreeBSD, AIX, and a few other exotic ones. And on Linux you have various distributions (for which you can't assume ABI compatibility) which may even differ radically, e.g. Android is Linux but it's nothing like the usual GNU/Linux distributions. And then there are Linux variants with an alternative libc (musl or uclibc). And of course, OSes evolve over time. And as for compilers, GCC 3 and GCC 4 and GCC 5 have significant enough differences that we may want to probe their features. Apple clang and llvm.org clang are subtly different as well. And then there are other compilers we want to support, like IBM's xlC, maybe Intel's ICC compiler, or at some point maybe even MSVC when they finally catch up to ISO C99 after over 15 years... so there's actually a bit more than "3 OSes and 2 compilers", and we do spend quite a bit of time making sure all those work. And updating the Autoconf tests is not where most of the time is spent, but rather testing on the target platforms (and retesting the tier1 platforms afterwards to make sure those still work...). So the trial&error part needs to be convenient.

Of course, I'm biased because I've been using Autoconf for over 20 years, and some may say the Stockholm syndrome may apply here. But you also need to consider who's going to keep maintaining all that platform-specific probing code, and in the remaining OSS world (including Hackage packages such as unix, network, and so on!), Autoconf is going to stay around, so if GHC moves away from Autoconf it'll just mean I have more mental context-switching to do, and less domain-specific knowledge I can directly transfer w/o impedance mismatch.

hvr commented

Improvements to autoconf would change the configure script, of course, and maybe you would have to look at those, but the last autoconf release was 4 years ago so it's a moot point.

Actually, just a few weeks ago I made use of an Autoconf macro that wasn't used before in GHC's configure.ac. If this was Shake, this would definitely have cost me more than the half hour it took me for the initial proof of concept (which included trying it out on a few different architectures by copying around just the ./configure script).

But then we'd have to leave Autoconf behind and maintain our own home-brewed Autoconf-clone

Right, but this is not a burden. Most of autoconf is m4 code working around the limitations of m4. Projects seem to use autoconf out of inertia, not because it has any technical advantages; for example KDE migrated from autoconf to CMake, and Firefox still uses an autoconf / python / make hybrid but has plans to transition away from autoconf. They refer to autoconf as "15+ years of accumulated linear m4 and shell gunk" and "auto-hell". Leaving autoconf behind is something to celebrate.

Autoconf is going to stay around

There have been calls to murder autoconf since at least 2003; you are probably right that it will exist for a lot longer, probably for no other reason than that GNU is larger than autoconf and they never delete old projects. (GnuTLS split away from GNU, and recently released 3.4.11, but GNU still maintains a misleading archive of old releases). Just look at this commit graph; autoconf is going nowhere.

whenever something breaks or we need to test a new property we're gonna either have to invent it on our own, or start translating from existing Autoconf scripts

GHC does that anyways, the macro set of autoconf is pretty limited. The macro archive extends that a bit but it's still focused on C/C++ programs, Haskell has a different set of requirements.

or look at Autoconf changelogs to see how it was fixed upstream

Autoconf has no upstream, there is nobody to volunteer to make a release.

I've been using Autoconf over 20 years. Who's going to keep maintaining all those platform-specific probing codes?

One word: you 😄

I don't think there are any actual maintainers left (sure, Paul Eggert makes a 2-line change every few weeks, but he is the one who said he didn't want to release a new version).

Haskell has the advantage of actually being maintainable... there is a growing set of well-maintained cross-platform libraries (e.g. directory), unlike autoconf which has to reimplement functions such as mkdir -p ("for portability!").

Then there's the problem that shakebuild is much less convenient to transport. you can't simply scp the binary you just built locally over to a remote host ... you need to build shakebuild from source... we have active ports, lots of compilers...

GHC already requires itself to build, it is not an extra dependency. And autoconf doesn't really support clang, there is some stuff in autoconf git master but it only works in GHC because it was hacked in.

With runhaskell, you do not need to compile the build system, you can just run it like any other scripting language. So Shake does not add any portability constraints or really any significant time to the edit/run/compile cycle. Haskell debugging is "tricky" but it can be done, and Shake is a build system where tracing commands is not particularly hard. For contrast, see how annoyed this guy and this guy are.

As for scp, well... I think it's even older than autoconf. Why would you use that by choice when you could use git or rsync?

in the remaining OSS world (including Hackage packages such as unix, network, and so on!),

I suspect most Hackage packages will switch to Shake, if GHC does. But world domination for Shake would have to wait until Haskell in general is more stable (Python hasn't switched to Cabal for package management, when there are obvious benefits...).

I also found this image, autotools is quite amusing if you research its userbase (18 years of hatred!):

hvr commented

Btw, I was referring to maintaining the GHC autoconf scripting/tests, not the upstream Autoconf. You clearly don't like Autoconf, that's perfectly fine. But this dislike doesn't add much to the discussion I'm afraid.

I know my way around Autoconf, as do our GHC port-maintainers. It doesn't matter if Haskell is easier to maintain if you have to leave behind all your existing knowledge you can leverage everywhere else for some new knowledge only applicable to GHC. Be it inertia or not, there are a lot of people who already know Autoconf. You'd have a better chance arguing to migrate to an Autoconf-replacement with an existing community. And btw, CMake is a terrible choice to migrate to, unless you literally have to support only a small set of target platforms. Also, can you truly claim that there's a superior "test for features not versions" replacement for Autoconf out there? I'm not aware of any Autoconf-replacement project that can be used in place of Autoconf. And if we are honest, the replacement surely won't be Haskell based, as GHC is way too heavy to be a common prerequisite.

For what it's worth, we haven't even reached feature-parity of GHC's shake-build with the old Makefile-system, and there are a lot of large and small features I need ported over from the old Makefile system that I'm worried may prove tedious to implement, and which will likely need quite a bit of time to get right. We're now at about 4500 SLOCs of code in shake-build (according to sloccount). And it's a bit strange we're already discussing taking on an even bigger workload which is very likely to add at least 2-3 kSLOCs of Haskell code... ;-)

With runhaskell, you do not need to compile the build system, you can just run it like any other scripting language.

The fallacy here is assuming that runhaskell is even available. runhaskell requires interpreter-support, and that's not guaranteed.

As for scp, well... I think it's even older than autoconf. Why would you use that by choice when you could use git or rsync?

Oh come on... rsync is roughly as old as ssh. I was there... I certainly won't use git to move around binaries or generated shell-scripts, that seems awkward. But I see you're trying to label me old-school and obsolete for using ancient tooling... ;-)

I suspect most Hackage packages will switch to Shake

I don't think this proposition makes any sense at all. In fact, I rather try to move packages away from custom Setup.hs build-types to Autoconf buildtypes because Autoconf proves to cause way less package-dependency problems. But more importantly, Cabal already has its own dependency-tracking code, and it's been recently rewritten/augmented for the still evolving nix-local-build facilities. It would have made sense to base this work early on on Shake, but at this point it makes little sense to integrate Shake into Cabal IMO. Afaik, Cabal packages are designed to have mostly static dependency graphs which don't benefit much from nor require Shake's powerful tracking algorithm.


However, let's try to get back to the core discussion. One thing I'm interested in is how source & binary distributions of GHC are supposed to work in the future:

You wrote

GHC already requires itself to build, it is not an extra dependency.

so that applies to source-distributions of GHC. So let's start with those. How do you envision a source-tarball (which does not contain compiled executables) to work? Which additional pre-requisites will we need? Also note that you can't generally expect network access to e.g. Hackage. So you have to be quite specific what libraries/tools the build-host is supposed to provide.

This turns out to be quite a passionate discussion :-) Thanks everyone!

In another attempt to bring the discussion down to specifics, let me ask a couple of questions:

And as for compilers, GCC 3 and GCC 4 and GCC 5 have significant enough differences that we may want to probe their features. Apple clang and llvm.org clang are subtly different as well. And then there are other compilers we want to support, like IBM's xlC, maybe Intel's ICC compiler, or at some point maybe even MSVC when they finally catch up to ISO C99 after over 15 years...

Shall we use CFLAGS as a good point to start looking into specifics? When I look into system.config on my Windows machine I see:

conf-cc-args-stage0 = -std=gnu99 -fno-stack-protector

How does this change across the feared multitude of compiler/platform combinations? Is the end result really that different to be afraid to recompute it on our own?

Second question: can we look at an example where feature-probing is essential for the GHC build system?

How do you envision a source-tarball (which does not contain compiled executables) to work? Which additional pre-requisites will we need?

Right, this looks like a nice specific issue to discuss. I admit I don't have a good understanding of what is required from the build system for source distributions. I presume we will need to make sure that all builders listed in system.config are present before we can build GHC. For example, we can't build GHC without a bootstrapping compiler. This is not difficult to check and in fact the build system already has some light-weight functionality to look up builders in PATH. What else do we need?
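
To illustrate what I mean by looking up builders in PATH, here is a minimal sketch in Python (illustration only; the actual build system does this in Haskell, and `missing_builders` is a hypothetical name):

```python
import shutil
from typing import Iterable, List, Optional


def missing_builders(names: Iterable[str], path: Optional[str] = None) -> List[str]:
    """Return the builders that cannot be found as executables on PATH.

    `path` overrides the PATH environment variable when given; this only
    checks presence, not that the tool works or is a suitable version.
    """
    return [name for name in names if shutil.which(name, path=path) is None]
```

A presence check like this says nothing about whether the tool behaves as expected, which is presumably where the rest of the requirements come in.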

@hvr You know much more about this, so can you list specific features we will need to implement to support source distributions without Autoconf? What are the requirements here?

I can also provide specific examples where Autoconf fails to deliver what it promises: it doesn't provide a good separation of concerns and the GHC build system is full of version/platform-specific special cases. Below I show several examples from the make-based build system, which I had to migrate to Shake, each time wondering why this is not in Autoconf. Maybe because people don't want to touch it?

  1. C compiler:
ifneq "$(GccIsClang)" "YES"

# Debian doesn't turn -Werror=unused-but-set-variable on by default, so
# we turn it on explicitly for consistency with other users
ifeq "$(GccLT46)" "NO"
# Never set the flag on Windows as the host gcc may be too old.
ifneq "$(HostOS_CPP)" "mingw32"
SRC_CC_WARNING_OPTS += -Werror=unused-but-set-variable
endif
# gcc 4.6 gives 3 warning for giveCapabilityToTask not being inlined
SRC_CC_WARNING_OPTS += -Wno-error=inline
endif

else

# Don't warn about unknown GCC pragmas when using clang
SRC_CC_WARNING_OPTS += -Wno-unknown-pragmas

endif
  2. Bootstrapping GHC:
ifeq "$(SUPPORTS_THIS_UNIT_ID)" "NO"
ifeq "$4" "0"
$4_USE_THIS_UNIT_ID=NO
endif
endif

[...]

ifeq "$($4_USE_THIS_UNIT_ID)" "NO"
$4_THIS_UNIT_ID = -this-package-key
else
$4_THIS_UNIT_ID = -this-unit-id
endif
  3. Ar, where Autoconf clearly can't help at all:
ifeq "$$($1_$2_ArSupportsAtFile)" "YES"
    $$(call cmd,$1_$2_AR) $$($1_$2_AR_OPTS) $$($1_$2_EXTRA_AR_ARGS) $$@ @$$@.contents
else
    "$$(XARGS)" $$(XARGS_OPTS) "$$($1_$2_AR)" $$($1_$2_AR_OPTS) $$($1_$2_EXTRA_AR_ARGS) $$@ < $$@.contents
endif
  4. The build system is littered with ifeq "$(Windows_Host)" "YES", for example:
ifeq "$(Windows_Host)" "YES"
# Apparently building on Windows fails when there is a system gmp
# available, so we never try to use the system gmp on Windows
libraries/integer-gmp_CONFIGURE_OPTS += --configure-option=--with-intree-gmp
endif
  5. And many, many more things like:
# Some platforms don't support shared libraries
NoSharedLibsPlatformList = \
    powerpc-ibm-aix \
    x86_64-unknown-mingw32 \
    i386-unknown-mingw32

ifeq "$(SOLARIS_BROKEN_SHLD)" "YES"
NoSharedLibsPlatformList += i386-unknown-solaris2
endif

PlatformSupportsSharedLibs = $(if $(filter $(TARGETPLATFORM),\
    $(NoSharedLibsPlatformList)),NO,YES)

So, Autoconf does a poor job at what it is supposed to do, already forcing us to partially do its job, which is my motivation to reconsider its importance for the GHC build system.
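
For comparison, the logic of the first example (C compiler warning flags) fits in an ordinary function once the three inputs are known. This is a sketch in Python for illustration only; the function and parameter names are hypothetical stand-ins for GccIsClang, GccLT46 and HostOS_CPP:

```python
from typing import List


def cc_warning_opts(is_clang: bool, gcc_lt_46: bool, host_os: str) -> List[str]:
    """Mirror the nested ifeq/ifneq conditionals from the Makefile snippet."""
    opts = []
    if is_clang:
        # Don't warn about unknown GCC pragmas when using clang.
        opts.append("-Wno-unknown-pragmas")
    elif not gcc_lt_46:
        # Never set the flag on Windows as the host gcc may be too old.
        if host_os != "mingw32":
            opts.append("-Werror=unused-but-set-variable")
        opts.append("-Wno-error=inline")
    return opts
```

The three inputs still have to be computed somewhere; the sketch only shows the decision logic.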

hvr commented

Thanks for listing those concrete issues. I'll try to turn them moot by coming up with proper solutions/fixes in our existing Makefile system... :-)

So, Autoconf does a poor job at what it is supposed to do, already forcing us to partially do its job, which is my motivation to reconsider its importance for the GHC build system.

This seems unfair, as you basically say that it's Autoconf's fault if people choose to take shortcuts and thus violate the recommended practice you're supposed to follow when using Autoconf. It's like claiming that seatbelts do a poor job of protecting people who are not using them according to the instructions... OTOH, you can argue that nobody likes reading instructions (or following them if they're annoying/tedious), and that developers follow the path of least resistance to get the job done and build up technical debt. However, porting this over to Haskell won't be a silver bullet to address this human issue either IMO.

PS: Most occurrences of ..._host or ..._os conditionals are such shortcuts. I've been trying to reduce such cases as they represent code-smell IMHO.

This seems unfair, as you basically say that it's Autoconf's fault if people choose to take shortcuts and thus violate the recommended practice you're supposed to follow when using Autoconf.

But wait, if Autoconf is painful to use then it is indeed its fault. Note the if: I am not qualified to make such statements myself, but this seems to be the opinion of many. And we want many people to be able to change and maintain GHC build system, not only Autoconf gurus.

Thanks for listing those concrete issues. I'll try to turn them moot by coming up with proper solutions/fixes in our existing Makefile system... :-)

Beware, that may be a big undertaking.

You are basically saying that in order to be able to hack on GHC (e.g., introduce -this-unit-id flag) one should also be familiar with Autoconf. Isn't that too high a bar for newcomers? The suggested workflow is then something like:

  • Implement -this-unit-id flag in GHC.
  • Find -this-package-key in the build system.
  • Replace it with getSetting $ ThisUnitId stage.
  • Add ThisUnitId Stage to Oracles.Config.Setting.
  • Implement support for @ThisUnitIdStageK@ in Autoconf.

With Shake this simplifies to:

  • Implement -this-unit-id flag in GHC.
  • Find -this-package-key in the build system.
  • Replace it with if version (Ghc stage) < X then "-this-package-key" else "-this-unit-id".

Isn't this a better world to live in?
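
Spelled out, the last step of the shorter workflow is a single comparison. A sketch in Python for illustration (the real code would be Haskell in the Shake build system); `this_unit_id_flag` is a hypothetical name and the threshold X is deliberately a parameter, since I have not stated its value above:

```python
from typing import Tuple

Version = Tuple[int, ...]


def this_unit_id_flag(ghc_version: Version, first_with_unit_id: Version) -> str:
    """Pick the flag by comparing the stage's GHC version against X.

    `first_with_unit_id` plays the role of X: the first version that
    accepts -this-unit-id. Its value is supplied by the caller.
    """
    if ghc_version < first_with_unit_id:
        return "-this-package-key"
    return "-this-unit-id"
```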

hvr commented

On 2016-04-18 at 11:36:57 +0200, Andrey Mokhov wrote:

[...]

| Shall we use CFLAGS as a good point to start looking into specifics? When I look into system.config on my Windows machine I see:
|
|     conf-cc-args-stage0 = -std=gnu99 -fno-stack-protector
|
| How does this change across the feared multitude of compiler/platform combinations? Is the end result really that different to be afraid to recompute it on our own?

We need this early on at ./configure time, because all subsequent tests
need to be done with those CFLAGS enabled, as enabling C99 mode may
expose/hide definitions in C header files and/or change subtle.

I know at least that the AIX compiler needs -qlanglvl=c99 instead, as
otherwise the compiler will just emit a warning(!) and the exit-code
will be zero:

$ xlc -std=gnu99 -c foo.c
/opt/IBM/xlC/13.1.2/bin/.orig/xlc: 1501-210 (W) command option t contains an incorrect subargument

so you don't even get an error here, but you really need to check that
ISO C99 was really enabled... before concluding that -std=gnu99 does
have any effect. There's lots of such subtle issues you'll have to
consider when probing for features.
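
To make the pitfall concrete, here is a rough sketch of a flag probe that does not trust the exit code alone. It is written in Python for illustration; `flag_is_accepted`, `run_compiler` and `C99_PROBE` are hypothetical names, and treating any stderr output as a rejection is a simplification made for this sketch.

```python
import subprocess
from typing import Callable, List, NamedTuple


class Result(NamedTuple):
    returncode: int
    stderr: str


# A stricter probe would compile a source like this one, so that the build
# fails outright unless C99 mode is really on.
C99_PROBE = """\
#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L
#error "C99 mode not enabled"
#endif
int main(void) { return 0; }
"""


def run_compiler(argv: List[str]) -> Result:
    """Run the compiler and capture its diagnostics."""
    proc = subprocess.run(argv, stdout=subprocess.DEVNULL,
                          stderr=subprocess.PIPE, text=True)
    return Result(proc.returncode, proc.stderr)


def flag_is_accepted(cc: str, flag: str, source: str,
                     runner: Callable[[List[str]], Result] = run_compiler) -> bool:
    """Accept the flag only if compilation succeeds AND nothing is printed
    to stderr, so a warning with exit code 0 (as with xlc above) is a rejection."""
    result = runner([cc, flag, "-c", source])
    return result.returncode == 0 and result.stderr.strip() == ""
```

The `runner` parameter exists only so the logic can be exercised without a real compiler. With the xlc output shown above, the probe reports the flag as not accepted even though the exit code is zero.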

| Second question: can we look at an example where feature-probing is
| essential for the GHC build system?

Let me think a bit about a good example... I'll get back to you

|| How do you envision a source-tarball (which does not contain compiled
|| executables) to work? Which additional pre-requisites will we need?
|
| Right, this looks like a nice specific issue to discuss. I admit I
| don't have a good understanding of what is required from the build
| system for source distributions. I presume we will need to make sure
| that all builders listed in system.config are present before we can
| build GHC.

I assume by "builders" you mean program/tools that generate output
possibly based on input files?

| For example, we can't build GHC without a bootstrapping
| compiler. This is not difficult to check and in fact the build system
| already has some light-weight functionality to lookup builders in
| PATH. What else do we need?
|
| @hvr You know much more about this, so can you list specific features
| we will need to implement to support source distributions without
| Autoconf? What are the requirements here?

Basically all sorts of checks you see already in configure.ac, making
sure the compiler works at all, what extensions are used (objectfiles,
lib-archives, DSOs, executables -- all things you need to interactively
probe) like testing for availability of lib/systemcalls, properties of C
header structs, sizes of pointer types, endianness, what flags your
tools support or need, whether response files are supported, and so
on...

it's maybe easier if you look through configure.ac and just ask me why a
specific line/test is there, and why it can't be done simpler... this
may actually help identify parts which can be simplified!

hvr commented

@snowleopard btw, I tend to agree that for a few things it may make sense to have version-based conditionals inside the shake-domain, e.g. things depending on GHC's version. But that's only because GHC is one of the few tools we control, and we can therefore trust its reported version (and what ghc --info tells us). So your example -this-unit-id is one I may agree with. But this doesn't preclude configure.ac from needing to know GHC's version as well, in order to fail even before we attempt to build shakebuild proper! But you're arguing to move everything into the shake-domain (including all that tedious C-land stuff -- and I haven't even mentioned libtool!). That's the part I disagree with.

You clearly don't like Autoconf

Actually, I was trying to create a neutral summary; the problem is that, from what I can tell, nobody on the internet actually likes autoconf. Honestly, I was surprised. The arguments for autoconf were:

  • It was quick to set up (but we are already rewriting the build system)
  • It's used by lots of people (but autoconf usage is declining).
  • Users are familiar with configure options (but:
    • GHC's build procedure is already complex enough (installing existing GHC, editing build.mk) that people will have to read the docs regardless
    • configure's error messages are often terse and confusing.
    • A Haskell rewrite could still preserve the configure command and its options)

porting this over to Haskell won't be a silver bullet

True, but are there any silver bullets left? Maybe it's a silver splinter.

It doesn't matter if Haskell is easier to maintain

The argument for Haskell:

  • Upstream autoconf is, for all intents and purposes, a dead project
    • Therefore, all maintenance burden falls upon GHC developers
  • Nobody likes maintenance burden
    • Therefore, it should be as small as possible
  • GHC developers are familiar with Haskell, but not so familiar with m4
    • Therefore, using Haskell (instead of m4) would reduce maintenance burden
  • Therefore, ditching autoconf (in favor of a Haskell equivalent) is the right thing to do

So Haskell's ease of maintenance is pretty much the key point.

the recommended practice you're supposed to follow when using Autoconf

The recommended practice, from the autoconf manual, is to use hand-crafted shell commands for especially tricky or specialized features. So, I don't think you can argue that GHC isn't using autoconf correctly; at most you can say that make isn't supposed to be used as a shell.

You could argue for adopting automake and libtool, which do have more specialized recommendations, but what's the point?

Realistically, speaking purely from experience, autoconf is not a particularly massive maintenance burden on GHC, like Make has been. It sucks, but that's different from being an active burden. autoconf is crufty and terrible, sure, but in practice the development burden has not been so absurd as to actively frighten people. Out of all the components in GHC, I actually don't classify it as any kind of real long-term maintenance threat, at least not right now. Completely different ballpark than 'Make', which actively does pose a more real sustainability problem. And now that Shake will be in place, it will certainly be easier to separate and move some of the logic into Shake itself, I'm sure. So hopefully we can reduce dependence in the future, if not outright eliminate it.

So. If someone actually did this, I wouldn't necessarily object to it. It would have to be quite clear it is an obvious improvement (since it will rewrite thousands of lines of code, essentially), and that we can support it faithfully. I think that's quite possible to prove, in all honesty. In fact, I can even say I might like to see someone do it. (And yes, I'm sure everyone here thinks that already, to some degree, but frankly as open source maintainers we have to be convinced of it, because, practically, we get hand-wringing about all sorts of things from people who inevitably never end up contributing anyway, or from people who dump significant code and then leave, never to be heard from again. Someone randomly saying "rewrite autoconf for profit" sounds great on paper, but might not be once you're the person maintaining it for the next 5 years, and you've just replaced one big ball of code with another big ball of code that nobody else uses except you.)

But I'm not going to do it, and frankly, I wouldn't suggest any of the core developers do it right now, either. We've already probably got several months of work on our plate just to finish off the current build system, to make it completely subsume the old one and work out everything, and then get rid of the old one.[1] On top of the other commitments we all have.

Honestly, even spending time discussing replacing autoconf is just adding MORE scope creep to a project we already need to tackle. And it adds scope in a dimension that's a bit hard to immediately quantify, vs the actual work we're doing, which has extensive evidence to support its superiority. We need to finish the current build system rewrite before we think about tacking on a whole new set of functionality with a greatly expanded scope.

Therefore, IMO, I'd suggest just tabling this discussion for like, a long time away from now, when we could actually feasibly do something about it, after the task queue has about 100 other things knocked off it.

EDIT: unless, of course, someone else actually wants to, like, do this work. Which I totally support and advise you to get in contact about, as we move forward on Shake. But yeah, I wouldn't suggest any major contributors spend significant time on this, TBQH.

[1] As a side note, I'll mention this Shake rewrite is guaranteed to piss off upstream maintainers who will now have to redo any support/fixes they had for GHC's prior code, and they tend to have a particular knack for finding deficiencies of build systems and tools that claim to be better. The exact same is true of whatever autoconf thing we replace. That's unavoidable, even if I think it's absolutely the right choice. So, remember this stuff isn't always trivially free.

@thoughtpolice Your comment summarises my thoughts very well! I didn't mean to suggest we should do anything about Autoconf right now, but I wanted more experienced people to express their views on the prospect of eventually getting rid of it. Surely I must finish this project first before putting my head into another black hole :-) But if anyone would like to give this crazy idea a try I'd be happy to help.

hvr commented

Just to give some perspective, here's the history of things currently touching the autoconf-code:

You may notice I'm looking for opportunities to refactor/cleanup/simplify the current Autoconf code to get rid of some technical-debt. Ironically, the work I'm doing to improve our Autoconf script may end up making it easier to get rid of Autoconf ;-)

@hvr Just clicked on a random commit and I see this: http://git.haskell.org/ghc.git/commitdiff/75036aacb492886a7c65035127ee11fec11ee7ce?hp=c5d8162d230c373b2b49ec94d3f9a027ff6e2dd6.

--- a/compiler/ghc.mk
+++ b/compiler/ghc.mk
@@ -380,6 +380,19 @@ endif
 compiler/stage2/build/Parser_HC_OPTS += -O0 -fno-ignore-interface-pragmas -fcmm-sink
 compiler/stage3/build/Parser_HC_OPTS += -O0 -fno-ignore-interface-pragmas -fcmm-sink

+# On IBM AIX we need to wrokaround XCOFF's TOC limitations (see also
+# comment in `aclocal.m4` about `-mminimal-toc` for more details)
+# However, Parser.hc defines so many symbols that `-mminimal-toc`
+# generates instructions with offsets exceeding the PPC offset
+# addressing limits.  So we need to counter-act this via `-mfull-toc`
+# which disables a preceding `-mminimal-toc` again.
+ifeq "$(HostOS_CPP)" "aix"
+compiler/stage1/build/Parser_HC_OPTS += -optc-mfull-toc
+endif
+ifeq "$(TargetOS_CPP)" "aix"
+compiler/stage2/build/Parser_HC_OPTS += -optc-mfull-toc
+compiler/stage3/build/Parser_HC_OPTS += -optc-mfull-toc
+endif

 ifeq "$(GhcProfiled)" "YES"
 # If we're profiling GHC then we want SCCs.  However, adding -auto-all

Poor broken abstractions! And since the commit is yours I presume there was no way around this without breaking the recommended practice ;-)

P.S.: Apologies for stabbing in the back like that. Your hard work on keeping the Autoconf codebase sane is very valuable, of course. And I thank you for that, as I had to make sense of Autoconf a couple of times when understanding the build system. But the abstraction is broken and we shouldn't pretend that it can be saved by following recommended practice.

hvr commented

@snowleopard if you check again you may notice that very hack was removed again, I wasn't proud of it myself either ;)

I've started work on porting the configure script to Haskell; there are interesting design decisions available. For example, where should the 68 small C / C++ files used for testing the compiler be stored? Presumably, there will be more later, since a lot of the configure script is platform-based rather than feature-based.

@Mathnerd314 My first reaction is, why not use withTempFile?

The problem is that some of them are in fact reasonably long, such as the C99 conformance test. Autoconf just stores them all in a big c.m4 file, but storing them in Haskell would require a lot of escaping or raw string literals. Or they could be stored as separate files, and either included in the binary at compile time (again using Template Haskell) or read at runtime from some directory.

The actual runtime behavior is not so interesting; withTempFile would work but I've just been using the existing behavior of overwriting conftest.c for every test.

Ah, I see. We could probably store test files separately in some folder, e.g. cfg/tests, and compile them when need be. I'd rather not rely on TH for this...
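As a rough illustration of the "separate files, compiled when need be" idea, the sketch below probes a compiler with a test program kept on disk. The function names, the use of `mktemp -d` (widely available but not strictly POSIX), and the `cfg/tests/c99.c` path in the usage note are all assumptions made for this example, not existing Hadrian or Autoconf code.

```shell
#!/bin/sh
# Hypothetical probe runner: try to compile one stored test program.
# $1 = C compiler command (may include flags), $2 = path to the test source.
try_compile() {
    cc=$1 src=$2
    # Use a scratch directory so probes never overwrite each other's output.
    tmpdir=$(mktemp -d) || return 2
    # $cc is deliberately unquoted so that "gcc -std=c99" splits into words.
    if $cc -c "$src" -o "$tmpdir/conftest.o" >"$tmpdir/conftest.log" 2>&1; then
        result=0
    else
        result=1
    fi
    rm -rf "$tmpdir"
    return $result
}

# Report a probe in the familiar "checking ... yes/no" style.
check_feature() {
    desc=$1 cc=$2 src=$3
    if try_compile "$cc" "$src"; then
        echo "checking $desc... yes"
    else
        echo "checking $desc... no"
    fi
}
```

With the test programs stored under a directory such as cfg/tests, a call might look like `check_feature "whether the compiler accepts C99" "gcc -std=c99" cfg/tests/c99.c`. Nothing needs escaping or embedding, at the cost of shipping the directory alongside the tool.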

@Mathnerd314 I don't want to be a barrier to exploration, but the reference to Template Haskell in your comment rather gave me chills. This is a place where we really cannot afford to use Template Haskell (or really any even slightly "controversial" GHC extension).

Keep in mind that configure needs to be buildable on the target with nothing more than a half-way functional unregisterised stage1 compiler. Template Haskell is completely unavailable in this scenario. hadrian itself may have a bit more wiggle room as it is possible to cross-compile from the host, but configure is probing the target and therefore must produce an artifact that may run on the target. Bringing up GHC on a new target is already a bear of a task; please let's not make it harder by putting up barriers at the very first step of the process.

There are middle roads I can see here; among them are writing the configure replacement in Haskell but have it produce an interpreted bash or Python script. However, at this point you have now IMHO lost many of the "simplicity" arguments that were made in favor of replacing autoconf to begin with. Another option would be to simply write the configure replacement in another widely-available language directly; this I wouldn't be opposed to but would again ask whether the benefits outweigh the costs.

Keep in mind that configure needs to be buildable on the target with nothing more than a half-way functional unregisterised stage1 compiler. Template Haskell is completely unavailable in this scenario.

@bgamari Don't get me wrong, I'm not arguing for using TH here. But we are building TH in Stage0, so Stage1 GHC does already depend on TH. Could you clarify what you mean by a "half-way functional unregisterised stage1 compiler"?

In general, it would be great if we had a list of what we can and can't use. Can we use Shake, for example? It already depends on a bunch of libraries and GHC extensions.

Andrey Mokhov writes:

Keep in mind that configure needs to be buildable on the target with nothing more than a half-way functional unregisterised stage1 compiler. Template Haskell is completely unavailable in this scenario.

@bgamari Don't get me wrong, I'm not arguing for using TH here. But we
are building TH in Stage0, so Stage1 GHC does already depend on TH.
Could you clarify what you mean by a "half-way functional
unregisterised stage1 compiler"?

By this I mean you might have an unregisterised compiler build which
doesn't have interpreter support and therefore does not support TH.
Moreover, the compiler may not be entirely reliable, so the less we need
to depend upon it the better.

ARM was in this exact state until about three months ago and had been
that way for years: the RTS was broken enough that one couldn't rely on
interpreted code not to fall over randomly; moreover various code
generation issues meant that sometimes you ended up with various
unexpected build failures. My experience in working on ARM issues is one
of the reasons why I'm rather concerned about adding a Haskell
dependency in configure: bringing up new platforms is hard and yet we
are about to put up yet another barrier.

In general, it would be great if we had a list of what we can and
can't use. Can we use Shake, for example? It already depends on a
bunch of libraries and GHC extensions.

hadrian itself is fine IMHO as you can always cross-compile. The issue
is when you touch configure as it is literally the first thing you
need to run when you attempt to build GHC and it must be run on the target.

To expand on what @bgamari said, with Hadrian there are really 3 compilers involved. There's the StageH compiler (used to build Hadrian), the Stage0 compiler (used by Hadrian to compile Stage1) and the Stage1 compiler which Hadrian builds. We expect in most cases StageH and Stage0 will be the same, but there's certainly no requirement for that, and in particular for bootstrapping up a new architecture you can cross-compile Hadrian.

However, I think the architecture @Mathnerd314 is suggesting is that Hadrian would have configure built-in, and then it gets compiled with StageH, and thus use of things like Template Haskell is absolutely fine. The configure must be run on the target, so the things @Mathnerd314's code runs can't involve running local Template Haskell, but embedding the C files through Template Haskell into Hadrian doesn't seem problematic. (I'm not necessarily endorsing Template Haskell here - as a general principle I dislike Template Haskell - I just don't think it would be problematic if it was used.)

hvr commented

@bgamari

hadrian itself is fine IMHO as you can always cross-compile.

You can't assume that either. E.g. there's no sane way to cross-compile for IBM AIX (there are some essential toolchain tools you can only run natively on AIX... it was a very frustrating process getting this to the point where I could finally build GHC on AIX natively). Moreover, the GHC port for AIX uses NCG now, but doesn't support GHCi nor -XTemplateHaskell. (@ndmitchell, so that unfortunately means that you can't use TH for the build-system at all)

I don't even remember if shake currently works properly for unregisterised GHCs (I tried briefly when AIX was still an unregisterised port, but I don't remember whether I ran into issues; one thing that definitely doesn't work for unregisterised GHCs is full -threaded support)

the GHC port for AIX doesn't support GHCi nor -XTemplateHaskell.

I guess you would know, having done a lot of recent AIX work, but are you sure? There's this old comment from Simon Marlow:

GHCi should work on all platforms (even unregisterised) these days, including the FFI if there's support in libffi for that platform.

hvr commented

@Mathnerd314 Not sure what @simonmar's comment refers to, but here's the relevant logic from the existing build-system:

# Whether to include GHCi in the compiler.  Depends on whether the RTS linker
# has support for this OS/ARCH combination.

OsSupportsGHCi=$(strip $(patsubst $(TargetOS_CPP), YES, $(findstring $(TargetOS_CPP), mingw32 linux solaris2 freebsd dragonfly netbsd openbsd darwin kfreebsdgnu)))
ArchSupportsGHCi=$(strip $(patsubst $(TargetArch_CPP), YES, $(findstring $(TargetArch_CPP), i386 x86_64 powerpc powerpc64 powerpc64le sparc sparc64 arm aarch64)))

ifeq "$(OsSupportsGHCi)$(ArchSupportsGHCi)" "YESYES"
GhcWithInterpreter=YES
else
GhcWithInterpreter=$(if $(findstring YES,$(DYNAMIC_GHC_PROGRAMS)),YES,NO)
endif

And right now, the AIX port has neither support for the RTS linker (we'd need to understand AIX' XCOFF object format) nor for using the (dynamic) system linker (different issues with XCOFF's limitations).

@hvr I found similar functionality in Hadrian:

ghcWithInterpreter :: Action Bool
ghcWithInterpreter = do
    goodOs <- anyTargetOs [ "mingw32", "cygwin32", "linux", "solaris2"
                          , "freebsd", "dragonfly", "netbsd", "openbsd"
                          , "darwin", "kfreebsdgnu" ]
    goodArch <- anyTargetArch [ "i386", "x86_64", "powerpc", "sparc"
                              , "sparc64", "arm" ]
    return $ goodOs && goodArch

Not a complete match as I guess I copied this too long ago.

hvr commented

@snowleopard yeah, that's the kind of logic which is error-prone & awkward when implemented in a Makefile... but then again, I'd argue that this doesn't belong in the build-system phase but rather in the configuration phase, i.e. configure-time. The current somewhat arbitrary division of what goes into mk/build.mk and what's controllable via configure has always struck me as a bit inconsistent. Some of the mk/build.mk settings clearly belong in ./configure traditionally, like selecting the integer backend. But that's another low-priority thing on my todo list I'm going to tackle when I'm really bored (i.e. not soon) :-)
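For comparison, here is a minimal sketch of how the same decision might look if it were made at configure-time in plain shell. The function name is hypothetical; the OS and architecture lists are copied from the Makefile snippet quoted above. Note that this version matches names exactly, whereas the Makefile's findstring performs a substring match.

```shell
#!/bin/sh
# Hypothetical configure-time check: print YES or NO for GhcWithInterpreter.
# $1 = target OS, $2 = target arch, $3 = DYNAMIC_GHC_PROGRAMS (defaults to NO).
ghc_with_interpreter() {
    target_os=$1 target_arch=$2 dynamic_ghc=${3:-NO}
    os_ok=NO arch_ok=NO
    case $target_os in
        mingw32|linux|solaris2|freebsd|dragonfly|netbsd|openbsd|darwin|kfreebsdgnu)
            os_ok=YES ;;
    esac
    case $target_arch in
        i386|x86_64|powerpc|powerpc64|powerpc64le|sparc|sparc64|arm|aarch64)
            arch_ok=YES ;;
    esac
    if [ "$os_ok$arch_ok" = YESYES ]; then
        # The RTS linker supports this OS/arch combination.
        echo YES
    elif [ "$dynamic_ghc" = YES ]; then
        # Otherwise the interpreter is only enabled for dynamic GHC programs.
        echo YES
    else
        echo NO
    fi
}
```

For example, `ghc_with_interpreter linux x86_64` prints YES, while `ghc_with_interpreter aix powerpc` prints NO, which matches the AIX situation described earlier in the thread.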

Not having configure will make some extra work for me. This is not necessarily a blocker, but just to make you aware: our build system framework provides a bunch of standard configure options that are passed straight through to GHC's configure script. These include:

  • installation locations: --prefix, --bindir, etc.
  • CC, CFLAGS, LDFLAGS (actually I currently have to collect these and put them in build.mk, but @hvr is planning to make configure handle them I believe). Some of these flags are not optional, i.e. if we forget to plumb them somewhere then the C compiler just doesn't work at all or produces binaries that won't run.

I'm probably not alone in having to fit GHC into some generic build framework (OS packagers are similar, for example). If we got rid of configure it would be great to have some kind of compatibility layer that understood the standard configure options and put them in the right place for Hadrian.
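A compatibility layer of that kind could start out very small. The sketch below translates a handful of standard configure-style arguments into key = value lines in the style of system.config; the function name and the key names on the left-hand side are invented for illustration and are not settings Hadrian actually reads.

```shell
#!/bin/sh
# Hypothetical shim: translate configure-style arguments into config lines.
translate_configure_args() {
    for arg in "$@"; do
        case $arg in
            --prefix=*) printf 'install-prefix = %s\n' "${arg#--prefix=}" ;;
            --bindir=*) printf 'install-bindir = %s\n' "${arg#--bindir=}" ;;
            CC=*)       printf 'cc = %s\n' "${arg#CC=}" ;;
            CFLAGS=*)   printf 'cflags = %s\n' "${arg#CFLAGS=}" ;;
            LDFLAGS=*)  printf 'ldflags = %s\n' "${arg#LDFLAGS=}" ;;
            # Unknown options are reported rather than silently dropped.
            *)          printf 'warning: unrecognised option: %s\n' "$arg" >&2 ;;
        esac
    done
}
```

A generic build framework could then keep calling something like `translate_configure_args --prefix=/opt/ghc CC=gcc "CFLAGS=-O2 -g"` and redirect the output to wherever the build system expects its settings.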

On a general note I'm slightly scared of all the work that would be involved in replicating everything that autoconf does. It's truly horrible, but I'm not sure that getting rid of it is worth the effort. But then, I'm not the one volunteering the effort :)

hvr commented

There's also another concern I haven't mentioned regarding the typical workflow of running ./configure, inspecting its output (and generated files such as config.h), tweaking ./configure flags and repeating this step until everything looks as expected, maybe even directly editing settings in config.h, and only then proceeding to the build-phase. This workflow ought to remain convenient in the future unless you want me unhappy ;-)

PS: and yes, I'm planning to finally clean up the configure handling of the "precious" variables CC/CFLAGS/LDFLAGS to better adhere to Autoconf-idioms. I already started with CC, and CFLAGS/LDFLAGS are next.

There's also another concern I haven't mentioned regarding the typical workflow of running ./configure, inspecting its output (and generated files such as config.h), tweaking ./configure flags and repeating this step until everything looks as expected.

@hvr You can do it the following way:

$ hadrian/build hadrian/cfg/system.config
# Inspect the output of configure and tweak its flags in Settings/User.hs.
$ hadrian/build hadrian/cfg/system.config
# Inspect the new output.
# The build system is clever enough to spot your changes and rerun configure with new flags.
# Once you are happy, continue by building the default target:
$ hadrian/build

Any issues with the above?

A phony for configure would make it even easier.

@ndmitchell Indeed! With a phony rule we get hadrian/build configure. Very nice.

@hvr What do you think? If you like this idea I can add the configure rule.

@ndmitchell Oh, wait! configure is a valid target already -- the build system builds file ./configure by running boot. How will the phony rule interact with this? Can we do build ./configure to build the file and build configure to run the phony rule?

Overloading by subtle path differences is a bad idea, and almost certain to go wrong. Just rename the phony to conf :)

Let me close this issue. We've had a good discussion above, clarifying the role of Autoconf, but it's unlikely that anyone will be attempting to rewrite it soon.

I'm copying a relevant comment by @Mistuke from this reddit thread:

That seems quite literally impossible to me. I'm disappointed that in that whole discussion thread no one ever mentioned the fact that all the work that configure does is to make sure you have a compatible environment.

Hadrian is a Haskell program, which means it needs a working GHC, which means it needs a compatible platform linker and C compiler. It also needs compatible versions of shared libs such as pthreads and other things configure checks for.

It checks for things that we absolutely need to know, such as whether the OS is LLP64, I32LLP64 or ILL32P64 etc.; these differences mean you can't even build binaries for Hadrian to cover them all unless you're in the package manager.

It checks to see if your system headers contain the functionality you expect, because distros sometimes have incompatible definitions inside a system header or it's too old. It allows us to blacklist broken dependencies before you even start building. Without these checks you have no way of knowing if what you produce from GHC will even do what you think it does.

Without configure you have no way of knowing if Hadrian will even run, let alone if GHC will even run, given that the various architectures and distros have different default flags for the C compilers and different versions of shared libs. Sometimes the shared libs aren't even compatible but carry the same name (e.g. common practice on macOS, where Apple tends to reinvent the wheel).

No, I quite literally do not see how you can replace a script with no external dependencies, or rather a highly portable script, with something with a dependency from here to the moon.

And personally I find Shake/Hadrian to be more complex than any other build system I've ever used because of this type safety thing. It forced me to jump through hoops to do even the most simple stuff, and I have yet to see a return on this complexity other than academic.

Want to know why configure is still around even though everyone (including me) hates it? Because it works and is dependable and very portable.
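As a small illustration of the data-model check mentioned in the comment above, the sketch below maps the bit widths of int, long and pointers to the conventional model names. The function name is hypothetical, and the widths are assumed to come from an earlier probe (typically a tiny C program using sizeof, compiled with the target compiler).

```shell
#!/bin/sh
# Hypothetical classifier: $1, $2, $3 are the widths in bits of
# int, long and void* as reported by a compile-time probe.
data_model() {
    case "$1/$2/$3" in
        32/32/32) echo ILP32 ;;   # typical 32-bit platforms
        32/64/64) echo LP64 ;;    # most 64-bit Unix-like systems
        32/32/64) echo LLP64 ;;   # 64-bit Windows
        64/64/64) echo ILP64 ;;   # rare
        *)        echo unknown ;;
    esac
}
```

The classification itself is trivial; the portable part that configure provides is obtaining the three numbers reliably from whatever compiler and flags the platform uses.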