haskell/cabal

Support relocatable packages

bos opened this issue · 29 comments

bos commented

(Imported from Trac #469, reported by @dcoutts on 2009-01-20)

This is both useful and doable on both Windows and Unix systems.

We already have limited support for relocatable (also sometimes called prefix-independent) packages on Windows. Specifically, a relocatable package is often what people want when they want to prepare a package for redistribution, especially on Windows for use with an installer. What is required is that an exe and all supporting files (data files and potentially also other shared libs) exist together in one directory hierarchy and that that directory can be placed anywhere in the filesystem and the exe will still be able to find all its associated support files.

There are some restrictions on preparing a relocatable package. In particular the user must configure it such that all install directories for the various kinds of files (libdir, datadir etc) are relative to the $prefix.
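As a rough illustration of the kind of check an explicit relocatable mode could run, here is a minimal sketch. The `InstallDirs` type and `notPrefixRelative` function are hypothetical names, not part of Cabal, and the check only recognises templates that mention `$prefix` directly (it does not follow indirections such as `$datadir/doc`).

```haskell
import Data.List (isPrefixOf)

-- | Install-directory templates as (name, template) pairs,
-- e.g. ("libdir", "$prefix/lib"). Hypothetical type for illustration.
type InstallDirs = [(String, String)]

-- | Names of the directories whose template is not directly under "$prefix".
-- A template qualifies if it is exactly "$prefix" or starts with "$prefix/".
notPrefixRelative :: InstallDirs -> [String]
notPrefixRelative dirs =
  [ name | (name, tmpl) <- dirs, not (underPrefix tmpl) ]
  where
    underPrefix t = t == "$prefix" || "$prefix/" `isPrefixOf` t
```

An empty result would mean every listed directory moves together with the prefix; any returned name identifies a directory that would pin the package to a fixed location.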

Configuring a relocatable package should be something that is done explicitly. At the moment it is done simply by configuring it in the right way, but Cabal is never aware that the user is trying to construct a relocatable package. If it is explicit then Cabal can take different actions where necessary (as it is on Unix) and do additional checks.

So a configure option --relocatable or something should be used. Installing a relocatable package is somewhat of an exception because it involves using a prefix which would otherwise be empty. Creating an image for a relocatable package would use an empty $prefix.

The tricky issues that have to be addressed are:

  • data files for the program itself
  • data files in library dependencies
  • shared libraries

The mechanism for finding data files at runtime is the Paths_pkgname module that Cabal generates. On Windows this can make use of the Win32 API that lets a .exe program discover where it was run from. On Unix there is no reliable equivalent, but the next best alternative is to use wrapper scripts that set environment variables before running the real program.
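A minimal sketch of how such a lookup could be structured, assuming an environment variable override is consulted first and the executable's own location is the fallback. The package name `mypkg`, the `bin/` and `share/mypkg` layout, and the function names are assumptions for illustration; this is not the code Cabal generates.

```haskell
import System.Environment (getExecutablePath, lookupEnv)
import System.FilePath ((</>), takeDirectory)

-- | Pure core: an explicit override wins; otherwise assume the executable
-- lives in <prefix>/bin and resolve the data dir relative to <prefix>.
resolveDataDir :: Maybe FilePath -> FilePath -> FilePath -> FilePath
resolveDataDir (Just override) _ _ = override
resolveDataDir Nothing exePath relDataDir =
  takeDirectory (takeDirectory exePath) </> relDataDir

-- | IO wrapper: read the (assumed) mypkg_datadir variable and the path of
-- the running executable, then delegate to the pure function.
getDataDirRelocatable :: IO FilePath
getDataDirRelocatable = do
  override <- lookupEnv "mypkg_datadir"
  exe <- getExecutablePath
  pure (resolveDataDir override exe ("share" </> "mypkg"))
```

Keeping the decision in a pure function makes the fallback order easy to test without touching the real environment.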

This mechanism can work pretty well for data files that belong to the executable. It is rather harder for data files in dependent packages (see #459). One option would be to copy the data files belonging to dependent packages and install them along with the executable (checking for clashes). This would involve being able to locate the data files of installed libraries which would require extending the data stored for installed packages. Libraries that require data files are relatively rare (though this should be checked) so this feature could be punted for an initial version.

In future there will also be the issue of shared libraries to consider. That is, when Haskell implementations can produce Haskell libs as shared libraries. This complicates the construction of relocatable packages. Again the approach taken on Windows and Unix will have to be different. On Windows it will be possible to copy the .dll files into the same directory as the .exe file (or a subdir if using .manifest files). On Unix a wrapper script using LD_LIBRARY_PATH will probably be necessary.
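To make the Unix wrapper-script idea concrete, here is a sketch of a function that renders such a script. The directory layout (`lib`, `share/<pkg>`, `libexec`) and the name `wrapperScript` are assumptions, and the package name is assumed to be a valid shell identifier.

```haskell
-- | Render a POSIX sh wrapper that locates itself, points LD_LIBRARY_PATH
-- and the package's datadir variable at sibling directories, and then
-- execs the real program. Layout and names are hypothetical.
wrapperScript :: String -> String -> String
wrapperScript exeName pkgName = unlines
  [ "#!/bin/sh"
  , "# Resolve the directory containing this wrapper."
  , "here=$(cd \"$(dirname \"$0\")\" && pwd)"
  , "LD_LIBRARY_PATH=\"$here/../lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}\""
  , "export LD_LIBRARY_PATH"
  , pkgName ++ "_datadir=\"$here/../share/" ++ pkgName ++ "\""
  , "export " ++ pkgName ++ "_datadir"
  , "exec \"$here/../libexec/" ++ exeName ++ "\" \"$@\""
  ]
```

Because every path is derived from the wrapper's own location at run time, the whole tree can be moved without regenerating the script.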

bos commented

(Imported comment by @dcoutts on 2009-01-20)

Note that the mechanism to support relocatable packages can and should also be used to let us run programs from the build tree and still enable them to find their data files in the build tree.

The use of environment variables in the current Paths_pkgname module already allows this, at least on Unix. We just need to take advantage of it by generating wrapper scripts.

This would also be nice for supporting running programs with runhaskell (e.g. for hugs or ghc).

bos commented

(Imported comment by @bos on 2009-01-20)

This would definitely be nice to have. I have a small application that I'd like to redistribute in binary form, but it needs data files, and I've no idea how I'd create a relocatable package on Linux, OS X, or Windows at the moment.

At least on OS X and Linux, I can fudge by just installing it in /usr, but I don't have any such luxury on Windows.

hvr commented

#1542 seems related

Are there any plans to do this? I've just discovered we had a local hack in our custom build system to find data files relative to an executable, and it broke with a recent change to the default datadir in Cabal. I'd love to be able to get rid of our custom hacks and use real relocatable package support. It can't be that hard, given that we already do it on Windows.

Also, getExecutablePath works well on Linux and OS X these days. Probably the custom code we have for Win32 can go away in favour of a call to this.

I don't think there's anyone who has time to do it, but on the other hand it shouldn't be that hard for an adventurous coder that understands the problem. ;)

I've started working on this at: https://github.com/christiaanb/cabal

Currently you can do: cabal install --global --enable-relocatable
Noting the following:

  • Only tested on OS X 10.8, with XCode 4
  • Does not handle packages with data files
  • Correctly handles dynamically linked executables
  • Correctly handles packages to be used with GHCi, given you update your cabal config file to follow GHC's default directory hierarchy:
install-dirs global
  prefix: /opt/ghc/7.8.3
  -- bindir: $prefix/bin
  -- libdir: $prefix/lib
  libsubdir: $compiler/$pkgkey
  -- libexecdir: $prefix/libexec
  -- datadir: $prefix/share
  -- datasubdir: $arch-$os-$compiler/$pkgid
  docdir: $datadir/doc/ghc/html/libraries/$pkgid
  htmldir: $docdir
  -- haddockdir: $htmldir
  -- sysconfdir: $prefix/etc

I'll report back soon to tell you if it's possible to build a relocatable Haskell platform. Currently I have a completely relocatable GHC 7.8.3 install, with a dynamically linked cabal-install 1.21.1.0.

As of christiaanb@08f4d93, --enable-relocatable now supports data-files, using getExecutablePath. Also, you don't need to adjust your cabal config file anymore.

I have now built a relocatable version of the Haskell Platform 2014.2.0.0, with:

  • Dynamically linked libraries
  • Dynamically linked executables

What I've tested until now:

  • Load libraries in ghci.
  • Build the examples of alex to ensure data-files for executables work. alex stores templates using the data-files mechanism, and loads them based on the Paths module.

I'll do some more testing, and then move on to cleaning up a bit. After that I'll start testing on Linux.

@christiaanb What's the current state of this ticket? Is this likely to make it into a cabal release in the near future, and, if so, when(ish)? I would like to add some relocatable packages to the Mac GHC App (ghcformacosx/ghc-dot-app#19), and am curious whether this work would make it easy to do so.

The only thing that I need to do is have cabal exec pass the correct DYLD_LIBRARY_PATH, as you cannot currently execute a dynamically linked executable from within the build directory. The reason is that the rpaths are set up in such a way that they are relative to the dependent libraries when the executable is installed.
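To illustrate why an installed-layout rpath fails inside the build tree, here is a sketch that computes a relative rpath entry from the installed bindir to a library directory. The helper names are hypothetical and this is not taken from the actual patch; `$ORIGIN` (ELF) and `@loader_path` (Mach-O) are the usual placeholders for the location of the loading binary.

```haskell
import System.FilePath (joinPath, splitDirectories)

-- | Path of @to@ relative to directory @from@, using ".." to climb out of
-- the non-shared part of @from@. Both inputs are assumed to be absolute.
relativeTo :: FilePath -> FilePath -> FilePath
relativeTo from to = joinPath (map (const "..") fromRest ++ toRest)
  where
    (fromRest, toRest) = dropCommon (splitDirectories from) (splitDirectories to)
    dropCommon (x:xs) (y:ys) | x == y = dropCommon xs ys
    dropCommon xs ys = (xs, ys)

-- | An rpath entry such as "$ORIGIN/../lib/foo" or "@loader_path/../lib/foo".
rpathEntry :: String -> FilePath -> FilePath -> String
rpathEntry origin bindir libdir = origin ++ "/" ++ relativeTo bindir libdir
```

An entry computed this way is only valid when the executable actually sits in the installed bindir, which is why running it from the build directory needs DYLD_LIBRARY_PATH (or LD_LIBRARY_PATH) set instead.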

Additionally, I'd like to test it on Linux, as I've currently only tested it on OS X. At the moment I've set up relocatable packages to fail configuring when the platform is unsupported.

@christiaanb Very exciting, can't wait. I'll go ahead and start using this as soon as it's merged into cabal HEAD :)

@christiaanb Do relocatable packages enable a relocatable .cabal directory? I tried to do so using a relative --prefix, but this complained at me that prefix had to be absolute. Is there a workaround for this?

@gibiansky It needs an absolute prefix to know where to put things. As long as the specified --prefix is the root of stuff like --bindir and --libdir, everything will be alright. So a cabal install --user --enable-relocatable will create a relocatable .cabal directory.

@christiaanb Ah, got it, thanks. In that case, it should be safe to do something like this:

  1. Create a config.template file with prefix set to PREFIX or something like that.
  2. Wrap cabal in a script which determines its current directory at runtime, and generates a config file from the config.template where PREFIX is replaced by the current directory $(pwd).
  3. Feed cabal the new config via --config-file.
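The template step could be sketched roughly as follows. The placeholder string `PREFIX`, the file names, and the function names are assumptions; the generated file would then be handed to cabal by the wrapper.

```haskell
import Data.List (isPrefixOf)
import System.Directory (getCurrentDirectory)

-- | Replace every occurrence of a placeholder in the text.
-- An empty placeholder leaves the text unchanged.
substitute :: String -> String -> String -> String
substitute placeholder value = go
  where
    n = length placeholder
    go [] = []
    go s@(c:cs)
      | n > 0 && placeholder `isPrefixOf` s = value ++ go (drop n s)
      | otherwise = c : go cs

-- | Read the template, substitute the current directory for PREFIX,
-- and write the resulting config file.
renderConfig :: FilePath -> FilePath -> IO ()
renderConfig templatePath outPath = do
  cwd <- getCurrentDirectory
  template <- readFile templatePath
  writeFile outPath (substitute "PREFIX" cwd template)
```

The match is case-sensitive, so an upper-case placeholder does not collide with the lower-case `prefix:` field name in the config.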

This way we can emulate having a relative directory --prefix, and we should be able to pick up our entire .cabal folder along with any configuration and move it. Am I understanding correctly?

@gibiansky Actually... --user is not going to work, because the package configuration database lives in .ghc, not in .cabal.

@gibiansky: The .cabal directory is already almost entirely relocatable, if you are willing to specify the --config-file option every time you use cabal-install. The specified config can contain a remote-repo-cache entry pointing to the right directory.

The only non-relocatable item in the .cabal directory is the setup-exe-cache. This was reported as issue #1234, which was closed without being resolved.

@christiaanb: I believe the issues of relocatable GHC packages and relocatable .cabal directory are separate.

With relocatable GHC packages, we will be halfway to having relocatable Cabal sandboxes, as each sandbox contains a GHC package configuration database. The cabal exec and cabal repl commands already make use of this database.

One of the remaining issues for relocatable Cabal sandboxes is the cabal.sandbox.config file, which is currently assumed to contain absolute paths for all entries, such as local-repo, logs-dir, world-file, package-db, build-summary, and prefix.

@christiaanb: Does PR#2255 being merged mean this issue is fully fixed and can be closed?

No, this issue cannot be closed. At the end of #2255 it says:

As said, this partially implements #462. To fully implement that issue, we would additionally need a deploy command, which you would call like: cabal deploy --deploy-dir=<deploy_dir>. It would collect the binaries, the libraries of the dependencies (in case of a dynamically linked executable), and the data-files, and turn them into a relocatable bundle at the specified directory.

@christiaanb: From the point of view of the user, what would be the difference between cabal deploy --deploy-dir=... and cabal install --prefix=...?

@mietek: cabal deploy is meant for a developer who wants to package his application for distribution to users. It would be equivalent to an .app bundle on OS X. cabal install is only something that's meant to be used by developers, not by end-users of an application that happens to be written in Haskell.

@christiaanb Are you still working on this and (if so) how close is this to landing (modulo code review)?

@BardurArantsson No, #2255 is the last I worked on it. #2255 provides a good start; I'd say it gets you 33% to 50% of the way to achieving the goals of this issue (#462). What is still needed:

  • A better story for data-files, see: #2255 (comment)
  • A cabal deploy command which creates something like an OS X .app bundle.
  • Update GHC so that the base packages are fully relocatable.

I don't see myself working on the above issues any time soon. What #2255 does get you is (mostly) relocatable sandboxes: http://christiaanb.github.io/posts/relocatable-sandboxes/

One way people (including myself) work around the data-files issue is by using the file-embed package. I could imagine cabal-install using file-embed for every file in data-files and making them available via the Package_xyz.hs module. When you use file-embed, you get a [(filePath, contents)] list. This way, all data files would be directly embedded within the executable.
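A sketch of what the consuming side could look like, assuming the list has the [(filePath, contents)] shape described above. Here the list is written out by hand as a stand-in for what a file-embed splice such as `$(embedDir "data")` would produce at compile time; the file names and the `getEmbeddedFile` helper are hypothetical.

```haskell
import qualified Data.ByteString.Char8 as BS

-- | Stand-in for the embedded data. With file-embed and TemplateHaskell
-- this would be generated at compile time rather than written by hand.
embeddedFiles :: [(FilePath, BS.ByteString)]
embeddedFiles =
  [ ("templates/header.txt", BS.pack "-- generated header\n")
  , ("templates/footer.txt", BS.pack "-- generated footer\n")
  ]

-- | Look up an embedded data file by its path relative to the data dir.
getEmbeddedFile :: FilePath -> Maybe BS.ByteString
getEmbeddedFile path = lookup path embeddedFiles
```

Since the contents live inside the executable, no datadir has to be located at run time, at the cost of a larger binary.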

@mantkiew: What's the cross-platform support for file-embed, and how well does it scale? As in, is it "OK" to embed a 100 MB, or even 1 GB, data file in the executable?

I confirm that it works on Windows and Linux. I haven't tested on macOS yet. It certainly is not a universal mechanism and there will be size limits. Besides, keeping 1 GB on the heap does not make any sense. I only used it for small files.

@bos @simonmar @christiaanb
At MacPorts we’ve automated stack builds and this has been successful for packages that don’t use datadir.

However, cabal’s lack of a GNU-standard capability for DESTDIR and PREFIX to produce relocatable binaries and installs is causing major problems for package managers that need to build with Haskell development tools. E.g. see:

There’s obviously a lot of interest in the capability and the lack of it is holding back deployment of Haskell-based tools.

Are there any plans to tackle this soon?

Related:

No one is actively working on this AFAIK, but I think it'd make a good GSoC project.

The approach described by hlint's author @ndmitchell solves the issue of hardcoded cabal data-files in the binary.

All that's required is to specify the path within a file called Paths_packagename.hs, which is normally automatically generated by cabal with its own paths. Here is an automated stack build that solves the issue along with related files:

Also, setting the environment variable packagename_datadir at runtime overrides the binary's hardcoded packagename_datadir path.

Also see commercialhaskell/stack#5026.