commercialhaskell/stack

Add `stack build --prefix` option

Opened this issue · 38 comments

This new stack build option will add a post-build step that rebuilds the targets with a cabal configure --prefix= option in a separate dist directory. This will allow installing packages that use data-files in a fixed location on the file system outside of any stack-controlled sandbox.

In order to support installation into directories where the user doesn't have write permission (e.g., /usr/local), Stack should support a flag that specifies a command to prefix the Setup.hs copy step with (e.g., sudo). Before starting, check whether writing to the destination works, and if it does not, display a friendly error with a suggestion to use the prefixing option.
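The pre-flight check described above could look roughly like the following. This is only a sketch, not Stack's implementation: the function names `nearestExisting` and `checkPrefixWritable` are made up for illustration, and it assumes that "writable" can be approximated by the permissions of the nearest existing ancestor of the prefix.

```haskell
import Control.Exception (IOException, try)
import System.Directory (Permissions, doesDirectoryExist, getPermissions, writable)
import System.FilePath (takeDirectory)

-- | Walk up from the requested prefix to the nearest directory that exists.
-- (Hypothetical helper: the prefix itself may not have been created yet.)
nearestExisting :: FilePath -> IO FilePath
nearestExisting dir = do
  exists <- doesDirectoryExist dir
  let parent = takeDirectory dir
  if exists || parent == dir
    then pure dir
    else nearestExisting parent

-- | Left carries a user-facing hint; Right means the copy step can proceed.
checkPrefixWritable :: FilePath -> IO (Either String ())
checkPrefixWritable prefix = do
  target <- nearestExisting prefix
  perms <- try (getPermissions target) :: IO (Either IOException Permissions)
  pure $ case perms of
    Right p | writable p -> Right ()
    _ ->
      Left
        ( "Cannot write to " ++ target
            ++ "; consider running the copy step through a command such as sudo."
        )

main :: IO ()
main = do
  verdict <- checkPrefixWritable "/usr/local"
  putStrLn (either id (const "destination is writable") verdict)
```

The error text deliberately does not name a flag, since the option for prefixing the copy step has not been named in this issue.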

Make sure this covers the case outlined in #1262 (comment)

Seems like this issue is cropping up for multiple people. Bumping to P2

Important use case for supporting a --prefix option: with it, it would be feasible for Homebrew to build their packages using Stack (for those packages that have a stack.yaml), which would aid reproducibility. See Homebrew/homebrew-core#1480 and Homebrew/legacy-homebrew#49158 for related discussion.

Here's a more concrete example for Homebrew: Homebrew/homebrew-core#1630

Further rationale—sorry if already mentioned in some links: clearing snapshots (to recover space) leaves compiled binaries around, but not their data-files.

Many packages have data-files that aren't actually required for the package to work (like READMEs and Changelogs), and since this feature would require rebuilding those packages and all packages that depend on them (which could be a lot of packages), it should have a way to skip some or all dependencies.

More demand for this feature, bumping priority.

jgm commented

I don't understand why this hasn't been fixed by now. It is a serious problem if stack can't be used to install data files to a location outside the stack work directory. Cabal has been able to do this from the beginning. Is there some serious conceptual problem with supporting it for stack?

@jgm I think the reason this hasn't been resolved is that the data-files mechanism seems pretty clunky. I've never used it, and I don't think most of the stack developers use it.

I started implementing this at one point because folks seem to care about it, but I really didn't like how kludgey it'd be to need to build your package twice just to deal with a change of dist dir:

This new stack build option will add a post-build step that rebuilds the targets with a cabal configure --prefix= option in a separate dist directory.

Perhaps it is possible to instead just do one build with a different dist dir? Any downsides? IIRC there was some issue with using a dist dir that's not under the package dir on Windows.

I don't think there is any fundamental difficulty here. It'd be great if someone that wants this feature implemented it.

jgm commented

This would be very useful for me as well. My use case is installing a standard library alongside a compiler. stack install puts the compiler executable in ~/.local/bin, but data-files references an absolute path in .stack-work, which may be deleted or changed. It would be very surprising if your compiler installation stopped working because you did a git pull; stack build, causing the old compiler version to reference a newer, possibly incompatible standard library version. I can work around this with Cabal for now, but it’s not ideal for me to have to provide different build instructions for development and installation.

Ran into this limitation jgm/gitit#599 while attempting to package gitit in the Arch User Repository. I wasn't able to find another workaround besides switching back to cabal-install.

@borsboom @mgsloan any updates here? This is still something that would be useful for Homebrew. Some dependency rot made me sad today so I thought I'd ask.

No updates to report. Indeed this issue seems to be a popular request, hopefully it will get addressed. PRs appreciated!

Here's a question (prompted by #4857): is this useful for anything except where the datafiles are placed?

While --data-dir seems indeed the most common and critical problem (and the only one which I've had in practice), the same issue seems to apply for basically every --*dir option of cabal configure, except maybe --bin* and --lib* which are special for stack.
E.g. --docdir and --htmldir, and probably --haddockdir, all come to mind.

I notice that the ones you list that you care about don't include the actual compiled libraries themselves, only "meta" information (docs, data, etc.). In that case, I don't think it really matters if all dependencies are also installed in the prefix directory. Would something similar to what I proposed in #4857 (comment), except for most/all of the --*dir options (instead of only --data-dir), work? So --prefix would just build the targets with the different prefix, but not worry about any of their dependencies (which would remain in the snapshot or .stack-work directory).

So --prefix would just build the targets with the different prefix, but not worry about any of their dependencies (which would remain in the snapshot or .stack-work directory).

Practically you want to build all targets and dependencies, destroot the stuff you need (binaries, config files, data, etc.), then discard the .stack and .stack-work directories.

Compiled packages shouldn’t need to carry around the directories in which they were built.

In MacPorts the destroot stage takes care of managing all the stuff that will end up in a --prefix directory, and for the install to work none of that stuff can point to the discarded build/destroot directories.

This approach works for a lot of packages built with stack, but not ones that use a data directory, e.g. hlint.

I got here via a link from another discussion, so pardon me if this is a non-sequitur, but I just wanted to mention that it is possible to relocate any file without rebuilding it if you build it properly in the first place. Under UNIX repeated / characters have no effect, so you can pad install dir names with / and then install-time edit the files to change the path to whatever is needed, as long as the total length including padding does not change.

The only undesirable might be that error messages involving path names might have some extra / in them.

Windows should be able to use the same approach, though perhaps with a different pad char -- I'm not so familiar with how Haskell installs on such machines.
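A minimal sketch of the padding idea, under stated assumptions: the build is configured with a data directory that has been padded with trailing / characters to a reserved width, and at install time the padded string is overwritten in place with the real path padded to the same width. The names `padTo`, `replaceAll` and `relocate` are hypothetical, and the "binary" here is just a byte string. Trailing padding is used rather than leading padding because POSIX leaves a path that begins with exactly two slashes implementation-defined.

```haskell
import qualified Data.ByteString.Char8 as B

-- | Pad a directory path with trailing '/' up to a fixed width.
-- Returns Nothing when the path does not fit in the reserved width.
padTo :: Int -> B.ByteString -> Maybe B.ByteString
padTo width path
  | B.length path > width = Nothing
  | otherwise = Just (path <> B.replicate (width - B.length path) '/')

-- | Replace every occurrence of needle with a replacement of any length.
replaceAll :: B.ByteString -> B.ByteString -> B.ByteString -> B.ByteString
replaceAll needle replacement haystack
  | B.null needle = haystack
  | otherwise = go haystack
  where
    go s = case B.breakSubstring needle s of
      (before, rest)
        | B.null rest -> before
        | otherwise -> before <> replacement <> go (B.drop (B.length needle) rest)

-- | Overwrite the padded build-time path with the new path, padded to the
-- same width, so the total size of the image does not change.
relocate :: B.ByteString -> B.ByteString -> B.ByteString -> Maybe B.ByteString
relocate placeholder newDir image = do
  padded <- padTo (B.length placeholder) newDir
  pure (replaceAll placeholder padded image)

main :: IO ()
main =
  case padTo 40 (B.pack "/build/stage/share/pkg") of
    Nothing -> putStrLn "placeholder does not fit"
    Just placeholder -> do
      let image = B.pack "datadir=" <> placeholder <> B.pack ";rest-of-binary"
      case relocate placeholder (B.pack "/opt/local/share/pkg") image of
        Nothing -> putStrLn "new path is longer than the reserved width"
        Just patched -> do
          B.putStrLn patched
          print (B.length patched == B.length image)
```

The constraint mentioned above shows up as the `Nothing` case: a destination longer than the reserved width cannot be patched in without changing the file's size.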

@G8EjlKeK7CwVQP2acz2B Makes some sense and sounds easy to prototype, but possibly fragile given how many abstraction layers it breaks... How robust is this trick in practice? Does, e.g., any Linux distro rely on it, and how many packages need special handling because of corner cases? I also wouldn't trust all the layers of code involved to not strip repeated /s.

@essandess It sounds to me like #4857, which is just about installing the data files elsewhere, may cover the data files issue for you, if that's the only thing you'd need --prefix for.

@G8EjlKeK7CwVQP2acz2B While that's a cool hack, it does sound like making this work cross-platform could be a significant amount of trouble. I don't think it's something that Stack should consider doing itself, since there are non-hack ways to make executables with different paths.

@borsboom It’s fundamentally about the necessity of a GNU-like/compliant DESTDIR capability for staged installs.

  1. PREFIX (or --prefix) tells the compiler to build everything for the ultimate destination.
  2. DESTDIR tells the compiler to install its builds in this (typically temporary, staged) location.

A package manager like MacPorts or Brew can then copy everything from DESTDIR to PREFIX, or archive everything in DESTDIR, to track and control what’s been installed, and how to uninstall or upgrade.

Without both DESTDIR and PREFIX options, there’s no reliable way for the package manager to manage its packages. E.g. see MacPorts’s destroot stage in https://guide.macports.org/chunked/reference.phases.html.
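To make the distinction concrete, here is a small sketch of how the two settings combine for a single file. It assumes POSIX-style paths, and the function names and the example file are illustrative only: the path compiled into the binary is derived from PREFIX alone, while the path the file is written to during a staged install is DESTDIR with the PREFIX path appended.

```haskell
import System.FilePath ((</>), dropDrive)

-- | The location the installed program looks in at run time.
-- Only PREFIX contributes to it.
finalPath :: FilePath -> FilePath -> FilePath
finalPath prefix rel = prefix </> rel

-- | The location the file is written to during a staged install.
-- dropDrive strips the leading "/" so that (</>) appends instead of
-- discarding the staging directory.
stagedPath :: FilePath -> FilePath -> FilePath -> FilePath
stagedPath destdir prefix rel = destdir </> dropDrive (finalPath prefix rel)

main :: IO ()
main = do
  let prefix = "/usr/local"
      destdir = "/tmp/stage"
      file = "share/hlint/hlint.yaml"
  putStrLn ("hardcoded in binary: " ++ finalPath prefix file)
  putStrLn ("written during copy: " ++ stagedPath destdir prefix file)
```

The package manager then moves or archives everything under the staging directory, and nothing in the result should refer back to it.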

Sometimes it’s possible to hack around this limitation (ihaskell example with cabal build), but it’s pretty awful.

@essandess Is cabal-install (the cabal command) capable of doing this the way you want? Because cabal is (among other things) a fairly direct interface to the Cabal library, which is the same underlying build library that Stack uses. So if cabal can't do it, then Stack wouldn't be able to either without changes made to the underlying library.

@essandess From looking at the ihaskell example, it looks like you're mostly concerned with where the executable and data files go (and that cabal isn't able to arrange things the way you want either). That still sounds more like #4857 to me, which is about putting data files somewhere outside the build directory. Solving that will also involve deciding where that is, and an option will almost certainly be provided to override it (where the binaries go is set by --local-bin-path, but there's potential to add something like --destroot to set the base of both).

If there's more to it than just setting where the executables and data files go, please be specific about exactly which other files should go in the destroot (see all the --*dir options in cabal configure --help's output for the possibilities).

@essandess Oh one other thing: since you're talking about an intermediate staging area, note that executables will have the staged location of the data files hardcoded, which may not be what you want (it looks like you have some symlink workarounds in the ihaskell example to compensate for this). Unfortunately this is a limitation of how data files work, but is out of scope for Stack to fix. In fact, the Stack team recommends against using Cabal data files at all, since it makes location portability of executables so difficult (instead, read-only data can be embedded directly into the executable using file-embed).

@borsboom

executables will have the staged location of the data files hardcoded

This is the issue. This is what GNU DESTDIR is for. Package managers need to copy from DESTDIR builds into PREFIX with all dependencies hardcoded for PREFIX, not the temporary build location.

You’re correct that the entire Haskell stack (ghc, cabal, etc.) is missing this basic GNU DESTDIR capability for package managers, which is a shame because there are some great Haskell packages.

Another example for the backflips required to destroot ghc: macports/macports-ports#4699

@borsboom @snoyberg @mgsloan @Blaisorblade
At MacPorts we’ve automated stack builds and this has been successful for packages that don’t use datadir.

However, stack’s and cabal’s lack of a GNU-standard capability for DESTDIR and PREFIX to produce relocatable binaries and installs is causing major problems for package managers that need to build with Haskell development tools. E.g. see:

There’s obviously a lot of interest in the capability and the lack of it is holding back deployment of Haskell-based tools.

Are there any plans to tackle this soon?

@borsboom

@essandess Oh one other thing: since you're talking about an intermediate staging area, note that executables will have the staged location of the data files hardcoded, which may not be what you want (it looks like you have some symlink workarounds in the ihaskell example to compensate for this). Unfortunately this is a limitation of how data files work, but is out of scope for Stack to fix. In fact, the Stack team recommends against using Cabal data files at all, since it makes location portability of executables so difficult (instead, read-only data can be embedded directly into the executable using file-embed).

What about packages that explicitly specify cabal’s data-files option? Take hlint as an example. Here’s hlint.cabal:
https://github.com/ndmitchell/hlint/blob/ef26ffcbd0425b98bcc5b330b310df9264a31add/hlint.cabal#L17-L26

When stack is used to build this package, what step causes the breakage of the temporary build directory being hardcoded into the binary?

How can this be fixed when stack is used to build such a package?

@jgm
I see that pandoc.cabal has a ton of data-files, https://github.com/jgm/pandoc/blob/0e31483d4358a6d2b4ba96c71237e3f7b32979a1/pandoc.cabal#L42-L179, yet the pandoc binary that stack builds is free of this issue.

Would you please provide any tips or recipes for converting a package to use embedded files and avoid this issue? Is it relatively easy? Would it be reasonable to ask any packages like hlint that have this problem to refactor their code? What’s the recipe to do that? I see https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Data.hs, but I don’t understand how you’ve turned off cabal’s data-files in pandoc.

Hearing your experience about how to fix this issue with code would be much appreciated.

This older post from @ndmitchell appears to provide at least a couple of straightforward solutions to cabal’s hardcoded path problem:

  1. Create a Paths_mypackagename.hs file that monkeypatches the default hardcoded cabal path. If this path is relative to the binary, it would be easy to use ../share/mypackage_datadir or such like.
  2. One comment mentions that cabal will use the environment variable mypackage_datadir to set the datadir paths encoded in the executable.

Does anyone have any experience or pointers with this cabal behavior? I’d like to start barking up the right tree.

cc: @borsboom @snoyberg @mgsloan @Blaisorblade @jgm @acfoltzer @bubba @phadej @typedrat @23Skidoo @bos @simonmar @christiaanb

I did once implement partial support of "relocatable" packages in Cabal: haskell/cabal#2255; it was sufficient to get relocatable Cabal sandboxes: http://qbaylogic.com/blog/2016/05/08/relocatable-sandboxes.html

@christiaanb

I did once implement partial support of "relocatable" packages in Cabal: haskell/cabal#2255; it was sufficient to get relocatable Cabal sandboxes: http://qbaylogic.com/blog/2016/05/08/relocatable-sandboxes.html

Thank you. Does --enable-relocatable address the problem with hardcoded paths to cabal data-files? If it does, it’s not clear to me how.

@christiaanb’s post brings us to three possible solutions:

  1. Create a Paths_mypackagename.hs file that monkeypatches the default hardcoded cabal path. If this path is relative to the binary, it would be easy to use ../share/mypackage_datadir or such like.
  2. One comment mentions that cabal will use the environment variable mypackage_datadir to set the datadir paths encoded in the executable.
  3. cabal --enable-relocatable

@borsboom @snoyberg Would any of these solutions work within stack?

If any of these do work, I could personally modify specific ports of stack-based packages, or even MacPorts's automated process for stack builds, but this issue goes beyond specific packages built on macOS.

If cabal provides this capability, then stack should support it to provide relocatable binaries.

jgm commented

@essandess - pandoc has an embed_data_files cabal flag, which is enabled by default in the stack build. That avoids the issue. This flag causes all the data files to be embedded as bytestring blobs in the binary, making the executable portable. The file-embed package is used for this. See Text.Pandoc.Data.

Of course, this is not the ideal solution when pandoc is installed by a package manager. In that case it's usually good practice to have the data files live separately in the file hierarchy, where they can be inspected and replaced. (This is how debian linux installs of pandoc work, for example.) Unfortunately, stack doesn't currently support this because of the lack of --prefix.

The approach described by hlint's author @ndmitchell solves the issue of hardcoded cabal data-files in the binary.

All that's required is to specify the path within a file called Paths_packagename.hs, which is normally automatically generated by cabal with its own paths. Here is an automated stack build that solves the issue along with related files:

Also, setting the environment variable packagename_datadir at runtime overrides the binary's hardcoded packagename_datadir path.
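As a rough illustration of the two mechanisms just described, a hand-written Paths module might look like the sketch below. It is not the file from the build mentioned above: the package name `mypackage`, the `share/mypackage` layout, and the example file name are placeholders. In a real package the code would live in a module named `Paths_mypackage` without `main`; it is written as a standalone program here only so it can be tried on its own.

```haskell
import System.Environment (getExecutablePath, lookupEnv)
import System.FilePath ((</>), takeDirectory)

-- | Resolve the data directory at run time instead of baking in a build path.
-- 1. Honour the mypackage_datadir environment variable if it is set.
-- 2. Otherwise look next to the executable: <root>/bin/exe -> <root>/share/mypackage.
getDataDir :: IO FilePath
getDataDir = do
  override <- lookupEnv "mypackage_datadir"
  case override of
    Just dir -> pure dir
    Nothing -> do
      exe <- getExecutablePath
      let root = takeDirectory (takeDirectory exe)
      pure (root </> "share" </> "mypackage")

-- | Same name and type as the function exported by the generated module,
-- so existing call sites keep working.
getDataFileName :: FilePath -> IO FilePath
getDataFileName name = fmap (</> name) getDataDir

main :: IO ()
main = getDataFileName "default.yaml" >>= putStrLn
```

Note that when this is run through an interpreter such as runghc, `getExecutablePath` reports the interpreter's location, so the relative fallback is only meaningful for a compiled and installed binary.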

In contrast, setting datadir in stack.yaml causes stack to actually try to write to that directory during compilation, which breaks GNU DESTDIR capability for package managers. I believe that this inconsistent and breaking behavior is a bug; see #5026.

@borsboom @snoyberg @mgsloan @Blaisorblade @jgm @acfoltzer @bubba @phadej @typedrat @23Skidoo @bos @simonmar @christiaanb
Thank you all for all your longstanding help, comments, and pointers about this issue. I believe that the approach identified by @ndmitchell is sufficient for at least BSD package managers to include stack builds of packages that use cabal's data-files.

Just wanted to note: AIUI Cabal (the library) actually does support both DESTDIR and PREFIX. PREFIX is passed via configure's --prefix option, while DESTDIR is passed via copy's --destdir option.