haskell/cabal

lock / freeze dependencies

Closed this issue · 18 comments

This feature is required for reliable builds for application builders (it does not help library developers) as per:
http://blog.docmunch.com/blog/2013/haskell-version-lockdown

Discussion on reddit revealed that some thought has already been given to this:
http://www.reddit.com/r/haskell/comments/1m2bkp/cabal_version_lockdown_reliable_builds_for/

The sticking point right now is that there is no reliable way for cabal to rewrite a constraints field, particularly since users are supposed to be able to add and modify constraints themselves.

The solution that makes the most sense to me is that there should be a new field in .cabal or cabal.config that helps freeze dependencies (possibly just freeze: True) or possibly a cabal command, and that a new field or a separate file should contain the frozen dependencies. The file containing the frozen dependencies must be one that can be checked into version control.

I'd be interested in seeing something like this too.

We cannot have a freeze: True (or anything else) in the .cabal file. This is a property of building the package after all.

The problem is really bootstrapping the cabal.config file once. After you have a complete list of dependencies you really want to manually edit it to change any constraints (i.e. changing dependencies should be a conscious choice for a piece of released software.) I think what we need is either a flag to some existing command that causes a cabal.config file to be dumped (or perhaps updated) or a new command.

Since I'm trying to get us away from manually having to run configure all the time in favor of just running build or test and have Cabal do the right thing, I think this should probably be a flag of install. Perhaps cabal install --only-dependencies --freeze-dependencies.

I don't want to manually change frozen constraints in cabal.config. It is very important that we are all clear that freezing means dependencies of dependencies of dependencies until you get down to base.

For example, I have what I would consider a small code base, and it has a total recursive dependency list of 160 different packages.
What I want to do is manually change constraints in my .cabal file and have cabal pick out new versions for me.

This is how things are done with Ruby's Gemfile and Node's package.json (npm does not have a lock file concept, but you are encouraged to just check in all your dependencies to achieve locking).
So the only difference between the initial bootstrap and a later update is that we want to be more cautious in the later update, but cabal is already cautious, so that should work well enough for now.

If a user types cabal install --only-dependencies --freeze-dependencies then that will both install dependencies and write out all used dependencies to constraints.
The question is how do we make cabal install --only-dependencies respect those frozen constraints? The answer right now is to use the constraints field in cabal.config. However, as mentioned previously, that falls short because the field is not designed just for automated freezing.

Why don't you want to manually update the list of dependencies? The goal is to not have your dependencies change under you unexpectedly once you've released your software to your users. That means that you shouldn't bump dependencies unless you tested that the new version works well. So the workflow would be something like:

Before you release the first version:

cabal install --only-dependencies --freeze-dependencies  # generates a cabal.config
git add cabal.config
git commit -m "Added cabal.config"

Then whenever you want to update a dependency:

  1. Test new version of foo.
  2. Change e.g. foo == 1.0 to foo == 1.1 in cabal.config.
  3. Check in new version of cabal.config into your source repo.
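
Step 2 of this manual workflow could be scripted. The following is only a minimal sketch, assuming cabal.config holds pins written as "pkg == version" with one pin per line inside the constraints field. The helper name bump_pin is hypothetical and is not part of cabal.

```shell
#!/bin/sh
# bump_pin FILE PACKAGE NEW_VERSION
# Hypothetical helper: rewrite the pin "PACKAGE == <old>" to
# "PACKAGE == NEW_VERSION" in FILE. Assumes package names contain only
# letters, digits and hyphens, and that versions are dotted numbers.
bump_pin() {
  file=$1 pkg=$2 new=$3
  tmp=$(mktemp) || return 1
  # Match the package name only when it starts a line or follows whitespace,
  # so that bumping "foo" leaves "foo-bar" untouched.
  sed -E "s/(^|[[:space:]])(${pkg})[[:space:]]*==[[:space:]]*[0-9][0-9.]*/\1\2 == ${new}/" \
    "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example, bump_pin cabal.config foo 1.1 would change the foo pin and nothing else. It does not touch any transitive dependency that foo 1.1 might need.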

Changing the .cabal file is the wrong thing to do. It lists which versions the library could possibly work with (e.g. both foo-1.0 and foo-1.1) and not which versions you've decided to ship with; that's what cabal.config (which is the analogue of the gem lockfile) is for.

There are 2 different, mutually exclusive use cases: application developers and library authors.

Freezing is for application developers; it has no value in the end product of a library author. There is no situation where someone would both distribute a library and a canonical frozen package list.

That being said, library authors will find some development use cases for a frozen package list. For example if they are debugging a dependency issue and they switch to a new computer they might actually check in frozen dependencies on a branch, but when they merge to master they need to remove the frozen dependency list.

One thing I think we can do is not allow frozen dependencies for a Library section, but only for executable, test, and bench. Obviously if a library author is using test & bench they should not check in frozen dependencies for those.

This conversation will be easier if we all agree that by default it is about application developers. If we want to talk about library authors we need to make sure we make that clear.

So I think the workflow listed is invalid because it was made with library authors in mind. Most application developers do not distribute their application as a package on Hackage or otherwise bother to change the version number. The workflow also omitted how the new version of foo was tested: was it installed with cabal install 'foo == 1.1', and how did cabal know to allow a new version that didn't match the listed constraint?

Here is the workflow that is used every day to freeze deps in Ruby gems & Node npm translated to cabal:

First build of my application:

cabal install --only-dependencies --freeze-dependencies  # generates a cabal.config
git add cabal.config
git commit -m "Added cabal.config"

Another user pulls the git repo and installs the frozen dependencies:

cabal install

Need to update dependency:

  1. Change e.g. foo == 1.0.* to foo == 1.1.* in .cabal
  2. cabal install to perform a conservative upgrade which may change more than just foo
     This step can't work right now since the constraints field is designed for users to modify
  3. git status
       modified: cabal.config
       modified: project.cabal
  4. git diff to view all dependencies of dependencies changed in cabal.config
  5. test out new dependencies locally
  6. git commit -am "updated foo dependency"
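
The git diff review step can be narrowed to just the pins that changed. This is a rough sketch, not an existing cabal feature: it assumes both files use "pkg == version" pins with spaces around "==", one per line, and changed_pins is a hypothetical name.

```shell
#!/bin/sh
# changed_pins OLD_CONFIG NEW_CONFIG
# Hypothetical helper: print every package whose pinned version differs
# between two cabal.config files, including added and removed pins.
# Assumes OLD_CONFIG is non-empty.
changed_pins() {
  awk '
    # Drop the field name and any trailing comma, leaving "pkg == version".
    { sub(/^constraints:/, ""); sub(/,[ \t]*$/, "") }
    $2 != "==" { next }                 # skip lines that are not pins
    FNR == NR  { old[$1] = $3; next }   # first file: remember the old pins
               { new[$1] = $3 }         # second file: remember the new pins
    END {
      for (p in new) {
        if (!(p in old))             { print p ": (new) -> " new[p] }
        else if (old[p] != new[p])   { print p ": " old[p] " -> " new[p] }
      }
      for (p in old) {
        if (!(p in new))             { print p ": " old[p] " -> (removed)" }
      }
    }' "$1" "$2" | LC_ALL=C sort
}
```

Run against the previously committed cabal.config and the regenerated one, this lists the dependencies of dependencies that moved along with foo, which is the information a reviewer needs before committing.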

Should be possible to implement if we agree on an UI. @dcoutts - do you have an opinion?

Freezing is for application developers; it has no value in the end product of a library author. There is no situation where someone would both distribute a library and a canonical frozen package list.

Agreed.

That being said, library authors will find some development use cases for a frozen package list.

Again agreed. I usually put a (not checked in) cabal.config that disables library-profiling (which I have turned on in ~/.cabal/config) while I develop. Saves my typing --disable-library-profiling when I configure. I don't expect library developers to ever check in cabal.config (or only do it under very controlled circumstances e.g. on branches).

Another user pulls the git repo and installs the frozen dependencies:

Aside: This should really use --only-dependencies. Not using that doesn't really hurt, but it also builds and installs (into the sandbox) the main package.

Change e.g. foo == 1.0.* to foo == 1.1.* in .cabal

Why do this? Your application might still work with 1.0.*. .cabal is for the must requirements (e.g. you must use foo >= 1.0 && < 1.2) and cabal.config is for the should requirements (e.g. you should use foo-1.1 when you build this time).

cabal install to perform a conservative upgrade which may change more than just foo
This step can't work right now since the constraints field is designed for users to modify

You're right that this might or might not work, depending on which dependencies foo-1.1 has. Perhaps you're asking for some help in automatically updating the dependencies of foo in cabal.config, if needed?

Looking at Gemfile.lock it does look like it does whatever cabal.config does now (i.e. list all dependencies at specific versions, including transitive dependencies.) Is there a difference I'm missing?

Looking at Gemfile.lock it does look like it does whatever cabal.config does now (i.e. list all dependencies at specific versions, including transitive dependencies.) Is there a difference I'm missing?

yes, the Gemfile. I glossed over details with Ruby. Ruby has a .gemspec that is equivalent to a .cabal file.
Gemfile.lock can be considered an automated form of adding constraints: to a cabal.config.

The Gemfile is where the application developer specifies their dependencies. A user never touches a Gemfile.lock (constraints:). Library authors do not need a Gemfile.lock, although they use it sometimes for certain purposes, and in that case their Gemfile usually just contains one line that tells it to use the dependencies from the gemspec (.cabal file equivalent).

The Gemfile contains version ranges, just as a .cabal file does, only some of which peg dependencies to specific versions. The Gemfile.lock is generated from the Gemfile. It also contains all the information from the Gemfile so that it knows what the user changed in the Gemfile, which is a detail we can probably avoid at least for now.

I think we can avoid the need for an additional file like a Gemfile because we have Library & Executable sections. The Library section can continue to be the .gemspec and the Executable section can be the Gemfile.

Node's npm tells users to lock down versions by checking in libraries to source control. Node has an npm rebuild command for helping with checking in binaries, but I don't think this is a good path for Haskell, at least right now.

Does this clear things up or should I respond to other parts of your comment?

I realized there were only 2 questions

Why do this? Your application might still work with 1.0.*. .cabal is for the must requirements (e.g. you must use foo >= 1.0 && < 1.2) and cabal.config is for the should requirements (e.g. you should use foo-1.1 when you build this time).

Because I need a feature in 1.1. If I want to stay on 1.0 and get the latest 1.0 version I will look at my current frozen version (presumably in cabal.config constraints:). The version was 1.0.2. So now I bump the dep to foo > 1.0.2 && < 1.1 in the .cabal file and then do a cabal install.

This update cannot be done manually because there are dependencies of dependencies that also change. Any bump, however minor, requires the constraint solver to work out new versions. cabal is already fairly conservative once packages are installed; however, we may need an even more conservative update capability.

So yes, I am definitely asking for help to automatically update foo.

A tool that writes out constraints was just created: https://github.com/benarmston/cabal-constraints

cabal-constraints > cabal.config
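
To give a sense of the kind of output such a tool writes, here is a rough shell sketch. It is not how cabal-constraints is implemented. It assumes the installed package ids arrive on stdin as "name-version", one per line, and pins_to_constraints is a hypothetical name.

```shell
#!/bin/sh
# pins_to_constraints: read "name-version" ids on stdin and print a
# cabal.config style constraints: field with one exact pin per package.
pins_to_constraints() {
  # Split each id at the last hyphen that is followed by a dotted number,
  # so "base64-bytestring-1.0.0.1" becomes "base64-bytestring == 1.0.0.1".
  sed -E 's/^(.*)-([0-9][0-9.]*)$/\1 == \2/' | LC_ALL=C sort |
    awk '
      NR == 1 { printf "constraints: %s", $0; next }
              { printf ",\n             %s", $0 }
      END     { if (NR > 0) printf "\n" }'
}
```

The sorted, one-pin-per-line layout keeps version control diffs small when a single dependency changes.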

This code looks a lot nicer than mine, but a lot of my pain had to do with trying to use the component API, and as discussed on this ticket I think pegging versions per component (and not for the Library component) will be a requirement.

Node's npm tells users to lock down versions by checking in libraries to source control. Node has an npm rebuild command for helping with checking in binaries, but I don't think this is a good path for Haskell, at least right now.

We can statically link so we don't really need to ship libraries in the way bundler does. We just ship one file! This is how e.g. Google does it.

People who want to use dynamic linking on the server have to do something more complicated (i.e. build/use a tool to distribute the .so files to the servers).

This update cannot be done manually because there are dependencies of dependencies that also change.

I don't understand this. Say you bump foo from 1.0 to 1.1 in the cabal.config file (and perhaps in the .cabal file if that file doesn't already admit 1.1). There are two scenarios:

  • The set of all dependencies is still compatible (e.g. 1.1 has the same dependencies as 1.0). Everything builds.
  • The set of dependencies is not compatible, due to dependency changes between foo 1.0 and 1.1. cabal install --only-dependencies fails to run and will tell you what the conflict is. At this point you can change the constraints in cabal.config (i.e. do manual dependency resolution) to make things build.

You might say that scenario two above is inconvenient, and it might well be, but it's certainly possible to do, just as it's possible to never use cabal's dependency solver by always specifying exact versions when installing packages.

Here's the problem with writing the cabal.config file automatically every time (as opposed to only the first time it's generated). If we generate the cabal.config file anew we might also change constraints unrelated to the foo 1.0 to 1.1 change, introducing bugs. What you probably want to say is "update foo from 1.0 to 1.1 and change any needed constraints, but only the needed constraints".

Last, a comment about the Ruby setup. If I understood your explanation correctly, you're saying that applications (executables) always specify exact dependencies. If this is the case, that feels overly restrictive to me. There are several valid cases for the builder of an application to want to be able to pick one of several compatible versions. For example, I might want to build cabal (the executable) against containers 0.4 or 0.5.

I feel like we are on the same page now :)

And I suspect that we are in agreement that telling users to manually solve dependencies when cabal was built to do that automatically for them is not a good solution.

Here's the problem with writing the cabal.config file automatically every time (as opposed to only the first time it's generated). If we generate the cabal.config file anew we might also change constraints unrelated to the foo 1.0 to 1.1 change, introducing bugs. What you probably want to say is "update foo from 1.0 to 1.1 and change any needed constraints, but only the needed constraints".

Yes, exactly. Changing a dependency in an executable section of .cabal from 1.0 to 1.1 will usually create a conservative upgrade because the previous packages were installed (of course this brings up the issue of a fresh install). But you really need to first read in the generated constraints and use them for the most conservative upgrade possible and then write out the new constraints. I think this can be skipped for the initial alpha implementation though because cabal's existing conservative upgrade works ok, and it would be rare (perhaps foolish even) to attempt an upgrade without doing an installation of the existing dependencies.

On being overly restrictive: I think that there are 2 use cases for shipping applications.

  1. commercial users that always want a full freeze.
  2. open source projects that want some amount of flexibility in their installation. We should think through this second case. Is there a use case that is not satisfied by build flags if cabal could globally install executables properly? Why is it that you want cabal-install to be buildable against different library versions when shipped to users? That will make debugging issues much more difficult. So is it because this is a global installation that may create conflicts? I am not saying we need to solve these issues now, but we should differentiate between proper design and working with cabal's existing limitations.

Let's keep this long thread focused on the fundamentals. I opened up #1502 to discuss the UI changes needed without having to wade through this discussion.

Having read through this thread there seems to be a consensus that per-component freezing is desirable (or perhaps even needed). I can understand why a library author may not want to have any freezing in a package, but I can't see a use case for freezing the dependencies of only some components.

If I'm working in a team of developers I would want all of our dependencies to be the same. For all executables, benchmarks and tests. When building the package on a build server I would only care about the executables being frozen, but the additional freezing of unused dependencies (say, quick check) would simply be ignored by Cabal / cabal-install. Wouldn't it?

Is it that you think a user may wish to freeze all executable components in their package but not the library component? And that this would allow them to more easily test the library against a wider range of dependencies? That seems to make sense. But wouldn't an executable component depend on that library component anyway?

Maybe there's something very obvious that I'm missing. If so, would you point it out please?

I haven't given the per component freezing much thought. In principle I think we should allow it (but the current cabal.config mechanism doesn't). In practice I don't think it's very important. Either you want to freeze your deps or you don't.

When building the package on a build server I would only care about the executables being frozen, but the additional freezing of unused dependencies (say, quick check) would simply be ignored by Cabal / cabal-install. Wouldn't it?

You mean if you have

constraints:
  QuickCheck == 1.0

in cabal.config but you're not actually depending on QuickCheck anywhere in the package? Yes, cabal install will ignore the additional constraint.

So yeah, a package can contain both a Library and an Executable. So at a minimum we need a non-library component freezing ability. We can't assume that an Executable must depend on the Library, and if it does then you will effectively freeze the library for your development.

But now let's talk about non-library components. I agree with the statement that in practice you probably want to lock all or none, but that is not what component freezing is about. There is the simple issue of allowing different frozen versions of a dependency for different components.

On top of that, the frozen dependencies must be checked in and help the developer understand what has changed. Breaking things down by component aids that, even if it creates verbosity. Ruby's Gemfile.lock actually lists out the children of dependencies in an indented tree, which can create a lot of verbosity but ends up aiding in understanding what actually changed.
gitlabhq/gitlabhq@fb4f171#Gemfile.lock

However, we can still create a first pass implementation of freezing that is not broken up by component.

I believe that this can be closed now.

Thanks!