Add `stack build --prefix` option
Opened this issue · 38 comments
This new `stack build` option will add a post-build step that rebuilds the targets with a `cabal configure --prefix=` option in a separate `dist` directory. This will allow installing packages that use `data-files` in a fixed location on the file system outside of any stack-controlled sandbox.
In order to support installation into directories where the user doesn't have write permission (e.g., `/usr/local`), we should support a flag that specifies a command to prefix the `Setup.hs copy` step with (e.g., `sudo`). Before starting, check whether writing to the destination works, and if it does not, display a friendly error with a suggestion to use the prefixing option.
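A minimal sketch of the pre-flight check described above, using only the `directory` and `filepath` packages that ship with GHC. The names `nearestExisting`, `canWriteTo`, and `permissionHint` are hypothetical and not part of Stack, and the permission bits reported by `getPermissions` are only a hint (ACLs or read-only mounts can still make the copy fail).

```haskell
import System.Directory (doesDirectoryExist, getPermissions, writable)
import System.FilePath (takeDirectory)

-- | Walk up from the requested prefix to the closest ancestor that exists,
-- since the prefix itself may not have been created yet.
nearestExisting :: FilePath -> IO FilePath
nearestExisting dir = do
  exists <- doesDirectoryExist dir
  let parent = takeDirectory dir
  if exists || parent == dir
    then pure dir
    else nearestExisting parent

-- | Hypothetical helper: True when the current user appears able to write
-- to the prefix (or to the ancestor in which it would be created).
canWriteTo :: FilePath -> IO Bool
canWriteTo prefix = do
  dir <- nearestExisting prefix
  exists <- doesDirectoryExist dir
  if exists
    then writable <$> getPermissions dir
    else pure False

-- | Friendly error text suggesting the command-prefix option.
-- The flag name is left out because the proposal does not name it.
permissionHint :: FilePath -> String
permissionHint prefix =
  "Cannot write to " ++ prefix
    ++ ". Consider re-running with a command prefix such as sudo for the copy step."
```

A caller would run `canWriteTo` on the requested prefix before building anything and print `permissionHint` when it returns `False`.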
Make sure this covers the case outlined in #1262 (comment)
Seems like this issue is cropping up for multiple people. Bumping to P2
Important use case for supporting a `--prefix` option: with it, it would be feasible for Homebrew to build their packages using Stack (for those packages that have a `stack.yaml`), which would aid reproducibility. See Homebrew/homebrew-core#1480 and Homebrew/legacy-homebrew#49158 for related discussion.
Here's a more concrete example for Homebrew: Homebrew/homebrew-core#1630
Further rationale—sorry if already mentioned in some links: clearing snapshots (to recover space) leaves compiled binaries around, but not their data-files.
Many packages have `data-files` that aren't actually required for the package to work (like READMEs and Changelogs), and since this feature would require rebuilding those packages and all packages that depend on them (which could be a lot of packages), it should have a way to skip some or all dependencies.
More demand for this feature, bumping priority.
I don't understand why this hasn't been fixed by now. It is a serious problem if stack can't be used to install data files to a location outside the stack work directory. Cabal has been able to do this from the beginning. Is there some serious conceptual problem with supporting it for stack?
@jgm I think the reason this hasn't been resolved is that the data-files mechanism seems pretty clunky. I've never used it, and I don't think most of the stack developers use it.
I started implementing this before because folks seem to care about it, but I really didn't like how kludgy it'd be to need to build your package twice just to deal with a change of dist dir:
> This new `stack build` option will add a post-build step that rebuilds the targets with a `cabal configure --prefix=` option in a separate `dist` directory.
Perhaps it is possible to instead just do one build with a different dist dir? Any downsides? IIRC there was some issue on Windows with using a dist dir that's not under the package dir.
I don't think there is any fundamental difficulty here. It'd be great if someone that wants this feature implemented it.
This would be very useful for me as well. My use case is installing a standard library alongside a compiler. `stack install` puts the compiler executable in `~/.local/bin`, but `data-files` references an absolute path in `.stack-work`, which may be deleted or changed. It would be very surprising if your compiler installation stopped working because you did a `git pull; stack build`, causing the old compiler version to reference a newer, possibly incompatible standard library version. I can work around this with Cabal for now, but it’s not ideal for me to have to provide different build instructions for development and installation.
Ran into this limitation (jgm/gitit#599) while attempting to package gitit in the Arch User Repository. I wasn't able to find another workaround besides switching back to cabal-install.
No updates to report. This issue does seem to be a popular request, so hopefully it will get addressed. PRs appreciated!
Here's a question (prompted by #4857): is this useful for anything except where the data files are placed?
While `--data-dir` seems indeed the most common and critical problem (and the only one which I've had in practice), the same issue seems to apply for basically every `--*dir` option of `cabal configure`, except maybe `--bin*` and `--lib*`, which are special for stack.
E.g. `--docdir` and `--htmldir`, and probably `--haddockdir`, all come to mind.
I notice that the ones you list as caring about don't include the actual compiled libraries themselves, only "meta" information (docs, data, etc.). In that case, I don't think it really matters if all dependencies are also installed in the prefix directory. Would something similar to what I proposed in #4857 (comment), except for most/all of the `--*dir` options (instead of only `--data-dir`), work? So `--prefix` would just build the targets with the different prefix, but not worry about any of their dependencies (which would remain in the snapshot or `.stack-work` directory).
This also prevents some `stack` builds in MacPorts. See:
> So `--prefix` would just build the targets with the different prefix, but not worry about any of their dependencies (which would remain in the snapshot or `.stack-work` directory).
Practically you want to build all targets and dependencies, destroot the stuff you need (binaries, config files, data, etc.), then discard the `.stack` and `.stack-work` directories.
Compiled packages shouldn’t need to carry around the directories in which they were built.
In MacPorts the `destroot` stage takes care of managing all the stuff that will end up in a `--prefix` directory, and for the install to work none of that stuff can point to the discarded build/destroot directories.
This approach works for a lot of packages built with `stack`, but not ones that use a data directory, e.g. `hlint`.
Related:
I got here via a link from another discussion, so pardon me if this is a non-sequitur, but I just wanted to mention that it is possible to relocate any file without rebuilding it if you build it properly in the first place. Under UNIX repeated / characters have no effect, so you can pad install dir names with / and then install-time edit the files to change the path to whatever is needed, as long as the total length including padding does not change.
The only drawback might be that error messages involving path names could have some extra / in them.
Windows should be able to use the same approach, though perhaps with a different pad char -- I'm not so familiar with how Haskell installs on such machines.
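A rough sketch of the padding idea described above, assuming the directory was embedded in the binary at build time already padded to a fixed width with trailing `/` characters. The function names are hypothetical, the sketch uses only the `bytestring` package that ships with GHC, and it assumes ASCII paths. It illustrates the same-length substitution only and is not a tested relocation tool.

```haskell
import qualified Data.ByteString.Char8 as B

-- | Pad a directory path to a fixed width with trailing '/' characters.
-- Returns Nothing when the path is longer than the reserved width.
-- Assumes ASCII paths, because Char8 truncates wider characters.
padDir :: Int -> FilePath -> Maybe B.ByteString
padDir width dir
  | length dir > width = Nothing
  | otherwise = Just (B.pack (dir ++ replicate (width - length dir) '/'))

-- | Replace every occurrence of one byte string with another.
replaceAll :: B.ByteString -> B.ByteString -> B.ByteString -> B.ByteString
replaceAll old new = go
  where
    go bytes
      | B.null old = bytes
      | otherwise =
          case B.breakSubstring old bytes of
            (before, rest)
              | B.null rest -> before
              | otherwise ->
                  B.concat [before, new, go (B.drop (B.length old) rest)]

-- | Swap the padded build-time directory for the padded install-time one.
-- Both are padded to the same width, so the image length never changes.
relocate :: Int -> FilePath -> FilePath -> B.ByteString -> Maybe B.ByteString
relocate width oldDir newDir image = do
  old <- padDir width oldDir
  new <- padDir width newDir
  pure (replaceAll old new image)
```

Because `padDir` refuses paths longer than the reserved width, the width chosen at build time caps how long any later install location can be.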
@G8EjlKeK7CwVQP2acz2B Makes some sense and sounds easy to prototype, but possibly fragile given how many abstraction layers it breaks... How robust is this trick in practice? Does, e.g., any Linux distro rely on it, and how many packages need special handling because of corner cases? I also wouldn't trust all the layers of code involved to not strip repeated `/`s.
@essandess It sounds to me like #4857, which is just about installing the data files elsewhere, may cover the data files issue for you, if that's the only thing you'd need `--prefix` for.
@G8EjlKeK7CwVQP2acz2B While that's a cool hack, it does sound like making this work cross-platform could be a significant amount of trouble. I don't think it's something that Stack should consider doing itself, since there are non-hack ways to make executables with different paths.
@borsboom It’s fundamentally about the necessity of a GNU-like/compliant `DESTDIR` capability for staged installs.
`PREFIX` (or `--prefix`) tells the compiler to build everything for the ultimate destination. `DESTDIR` tells the compiler to install its builds in this (typically temporary, staged) location.
A package manager like MacPorts or Brew can then copy everything from `DESTDIR` to `PREFIX`, or archive everything in `DESTDIR`, to track and control what’s been installed, and how to uninstall or upgrade.
Without both `DESTDIR` and `PREFIX` options, there’s no reliable way for the package manager to manage its packages. E.g. see MacPorts’s `destroot` stage in https://guide.macports.org/chunked/reference.phases.html.
Sometimes it’s possible to hack around this limitation (`ihaskell` example with a `cabal` build), but it’s pretty awful.
@essandess Is cabal-install (the `cabal` command) capable of doing this the way you want? Because `cabal` is (among other things) a fairly direct interface to the Cabal library, which is the same underlying build library that Stack uses. So if `cabal` can't do it, then Stack wouldn't be able to either without changes made to the underlying library.
@essandess From looking at the `ihaskell` example, it looks like you're mostly concerned with where the executable and data files go (and that `cabal` isn't able to arrange things the way you want either). That still sounds more like #4857 to me, which is about putting data files somewhere outside the build directory. Solving that will also involve deciding where that is, and an option will almost certainly be provided to override it (where the binaries go is set by `--local-bin-path`, but there's potential to add something like `--destroot` to set the base of both).
If there's more to it than just setting where the executables and data files go, please be specific about exactly which other files should go in the destroot (see all the `--*dir` options in the output of `cabal configure --help` for the possibilities).
@essandess Oh one other thing: since you're talking about an intermediate staging area, note that executables will have the staged location of the data files hardcoded, which may not be what you want (it looks like you have some symlink workarounds in the `ihaskell` example to compensate for this). Unfortunately this is a limitation of how data files work, and is out of scope for Stack to fix. In fact, the Stack team recommends against using Cabal data files at all, since they make location portability of executables so difficult (instead, read-only data can be embedded directly into the executable using file-embed).
> executables will have the staged location of the data files hardcoded
This is the issue. This is what GNU `DESTDIR` is for. Package managers need to copy from `DESTDIR` builds into `PREFIX` with all dependencies hardcoded for `PREFIX`, not the temporary build location.
You’re correct that the entire Haskell stack (`ghc`, `cabal`, etc.) is missing this basic GNU `DESTDIR` capability for package managers, which is a shame because there are some great Haskell packages.
Another example of the backflips required to destroot `ghc`: macports/macports-ports#4699
@borsboom @snoyberg @mgsloan @Blaisorblade
At MacPorts we’ve automated `stack` builds and this has been successful for packages that don’t use `datadir`.
However, the lack in `stack` and `cabal` of a GNU-standard capability for `DESTDIR` and `PREFIX` to produce relocatable binaries and installs is causing major problems for package managers that need to build with Haskell development tools. E.g. see:
- macports/macports-ports#4706
- ndmitchell/hlint#699
- macports/macports-ports#5050 (comment)
- macports/macports-ports#5180
There’s obviously a lot of interest in the capability and the lack of it is holding back deployment of Haskell-based tools.
Are there any plans to tackle this soon?
Related:
> @essandess Oh one other thing: since you're talking about an intermediate staging area, note that executables will have the staged location of the data files hardcoded, which may not be what you want (it looks like you have some symlink workarounds in the `ihaskell` example to compensate for this). Unfortunately this is a limitation of how data files work, and is out of scope for Stack to fix. In fact, the Stack team recommends against using Cabal data files at all, since they make location portability of executables so difficult (instead, read-only data can be embedded directly into the executable using file-embed).
What about packages that explicitly specify `cabal`’s `data-files` option? Take `hlint` as an example. Here’s `hlint.cabal`:
https://github.com/ndmitchell/hlint/blob/ef26ffcbd0425b98bcc5b330b310df9264a31add/hlint.cabal#L17-L26
When `stack` is used to build this package, what step causes the breakage of the temporary build directory being hardcoded into the binary?
How can this be fixed when `stack` is used to build such a package?
@jgm
I see that `pandoc.cabal` has a ton of `data-files`, https://github.com/jgm/pandoc/blob/0e31483d4358a6d2b4ba96c71237e3f7b32979a1/pandoc.cabal#L42-L179, yet the `pandoc` binary that `stack` builds is free of this issue.
Would you please provide any tips or recipes for converting a package to use embedded files and avoid this issue? Is it relatively easy? Would it be reasonable to ask any packages like `hlint` that have this problem to refactor their code? What’s the recipe to do that? I see https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Data.hs, but I don’t understand how you’ve turned off cabal’s `data-files` in pandoc.
Hearing your experience about how to fix this issue with code would be much appreciated.
This older post from @ndmitchell appears to provide at least a couple of straightforward solutions to `cabal`’s hardcoded path problem:
- Create a `Paths_mypackagename.hs` file that monkeypatches the default hardcoded cabal path. If this path is relative to the binary, it would be easy to use `../share/mypackage_datadir` or suchlike.
- One comment mentions that cabal will use the environment variable `mypackage_datadir` to set the datadir paths encoded in the executable.
Does anyone have any experience or pointers with this `cabal` behavior? I’d like to start barking up the right tree.
cc: @borsboom @snoyberg @mgsloan @Blaisorblade @jgm @acfoltzer @bubba @phadej @typedrat @23Skidoo @bos @simonmar @christiaanb
I did once implement partial support of "relocatable" packages in Cabal: haskell/cabal#2255; it was sufficient to get relocatable Cabal sandboxes: http://qbaylogic.com/blog/2016/05/08/relocatable-sandboxes.html
> I did once implement partial support of "relocatable" packages in Cabal: haskell/cabal#2255; it was sufficient to get relocatable Cabal sandboxes: http://qbaylogic.com/blog/2016/05/08/relocatable-sandboxes.html
Thank you. Does `--enable-relocatable` address the problem with hardcoded paths to cabal `data-files`? If it does, it’s not clear to me how.
@christiaanb’s post brings us to three possible solutions:
- Create a `Paths_mypackagename.hs` file that monkeypatches the default hardcoded cabal path. If this path is relative to the binary, it would be easy to use `../share/mypackage_datadir` or suchlike.
- One comment mentions that cabal will use the environment variable `mypackage_datadir` to set the datadir paths encoded in the executable.
- `cabal --enable-relocatable`
@borsboom @snoyberg Would any of these solutions work within `stack`?
If any of these do work, I could personally modify specific ports of stack-based packages, or even MacPorts’ automated process for stack builds, but this issue goes beyond specific packages built on macOS.
If `cabal` provides this capability, then `stack` should support it to provide relocatable binaries.
@essandess - pandoc has an `embed_data_files` cabal flag, which is enabled by default in the stack build. That avoids the issue. This flag causes all the data files to be embedded as bytestring blobs in the binary, making the executable portable. The `file-embed` package is used for this. See Text.Pandoc.Data.
Of course, this is not the ideal solution when pandoc is installed by a package manager. In that case it's usually good practice to have the data files live separately in the file hierarchy, where they can be inspected and replaced. (This is how Debian Linux installs of pandoc work, for example.) Unfortunately, stack doesn't currently support this because of the lack of `--prefix`.
The approach described by hlint's author @ndmitchell solves the issue of hardcoded `cabal` `data-files` in the binary.
All that's required is to specify the path within a file called `Paths_packagename.hs`, which is normally automatically generated by `cabal` with its own paths. Here is an automated `stack` build that solves the issue along with related files:
- https://github.com/macports/macports-ports/blob/e18fe88466477b20da595346ebce8eab839b9fbc/www/adblock2privoxy/Portfile
- https://github.com/macports/macports-ports/blob/e18fe88466477b20da595346ebce8eab839b9fbc/www/adblock2privoxy/files/Paths_adblock2privoxy.hs
Also, setting the environment variable `packagename_datadir` at runtime overrides the binary's hardcoded `packagename_datadir` path.
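A minimal sketch of what such a hand-written `Paths_` module might look like, combining the two mechanisms above. Everything specific here is an assumption for illustration: the package name `mypackage`, the version number, the `<prefix>/bin` and `<prefix>/share/mypackage` layout, and the helper `dataDirFor`. It covers only a subset of what Cabal's generated module exports, and it is not the file used in the linked Portfile.

```haskell
module Paths_mypackage where

import Data.Version (Version, makeVersion)
import System.Environment (getExecutablePath, lookupEnv)
import System.FilePath (takeDirectory, (</>))

-- | Placeholder version; a real package would keep this in sync by hand.
version :: Version
version = makeVersion [0, 1, 0]

-- | Derive the data directory from the executable's location, assuming the
-- executable lives in <prefix>/bin and data in <prefix>/share/mypackage.
dataDirFor :: FilePath -> FilePath
dataDirFor exe = takeDirectory (takeDirectory exe) </> "share" </> "mypackage"

-- | Honour the mypackage_datadir environment variable first, then fall back
-- to a path relative to the running executable.
getDataDir :: IO FilePath
getDataDir = do
  override <- lookupEnv "mypackage_datadir"
  case override of
    Just dir -> pure dir
    Nothing -> dataDirFor <$> getExecutablePath

-- | Same signature as the function Cabal generates for data file lookup.
getDataFileName :: FilePath -> IO FilePath
getDataFileName name = (</> name) <$> getDataDir
```

Because the data directory is computed from the executable's own location at runtime, the binary carries no build-time path and can be copied from a staging directory to its final prefix unchanged.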
In contrast, setting `datadir` in `stack.yaml` causes `stack` to actually try to write to that directory during compilation, which breaks GNU `DESTDIR` capability for package managers. I believe that this inconsistent and breaking behavior is a bug; see #5026.
@borsboom @snoyberg @mgsloan @Blaisorblade @jgm @acfoltzer @bubba @phadej @typedrat @23Skidoo @bos @simonmar @christiaanb
Thank you all for all your longstanding help, comments, and pointers about this issue. I believe that the approach identified by @ndmitchell is sufficient for at least BSD package managers to include `stack` builds of packages that use `cabal`'s `data-files`.
Just wanted to note: AIUI Cabal (the library) actually does support both `DESTDIR` and `PREFIX`. `PREFIX` is passed via `configure`'s `--prefix` option while `DESTDIR` is passed via `copy`'s `--destdir` option.
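To make the distinction concrete: with a plain `Setup.hs`, this would be something like `runhaskell Setup.hs configure --prefix=/opt/local` followed, after the build, by `runhaskell Setup.hs copy --destdir=/tmp/destroot` (both paths are illustrative). The sketch below only models how the two settings combine into paths. The function names are hypothetical and it uses just the `filepath` package that ships with GHC.

```haskell
import System.FilePath (dropDrive, (</>))

-- | Where a file is written during a staged install: the staging directory
-- is prepended to the full final path under the prefix.
stagedPath :: FilePath -> FilePath -> FilePath -> FilePath
stagedPath destDir prefix relative =
  destDir </> dropDrive (prefix </> relative)

-- | The path that should be baked into the executable: prefix only,
-- never the staging directory.
finalPath :: FilePath -> FilePath -> FilePath
finalPath prefix relative = prefix </> relative
```

On a POSIX system, `stagedPath "/tmp/destroot" "/opt/local" "share/hlint/hlint.yaml"` gives `/tmp/destroot/opt/local/share/hlint/hlint.yaml`, while `finalPath` gives `/opt/local/share/hlint/hlint.yaml`. A package manager archives the former tree and the binary only ever refers to the latter.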