nim-lang/nimble

Locking of dependencies

dom96 opened this issue · 40 comments

dom96 commented

When you build your package a nimble.lock file should be created.

This nimble.lock file should contain some metadata about the package as well as the dependencies which your package has been built with.

Araq commented

Sounds like you finally want to use db_sqlite in Nimble. :P

dom96 commented

How does db_sqlite help in any way?

I might be interested in working on this functionality. Here are a couple of thoughts in no particular order:

  • I think it is valuable to have a human-readable textual lock file, so that meaningful diffs can be seen in version control.
  • For ease of implementation, I imagine the lock file would essentially be little more than a pretty-printed serialisation of the fully-resolved dependency graph.
  • We should seek to avoid the failures of npm shrinkwrap. Ruby's bundler works fairly well, there is little reason we shouldn't seek to emulate it.
  • A lock file should be created on nimble build invocation. If the lock file exists, nimble build should install the packages listed therein.

Do all these seem reasonable or am I wildly off the mark of how you would like this project to work?

dom96 commented

I must admit I haven't done much research into how other package managers do it yet, so I can't say much about how I think the implementation should be done.

In regards to your last point, I do think that it might be a good idea to have an explicit nimble lock command to perform the creation of the lock file.

dom96 commented

But in general I think you're on the right track and I agree with your points.

Regarding general research / information about other languages' package management, here's a great read:

So you want to write a package manager.

@aidansteele, lockfile would be really great!
May I add another point to what I'd like to see along with locks:

  • There should be some way to override the package search path with a local path. E.g. if I compile my project that uses library MyLib, don't check out MyLib from anywhere, but use it from the local path I specify. I think the cargo package manager does this pretty well.

Also as dom96 said, lockfile creation should not be bound to nimble build command. E.g. I don't use nimble build, because I find nake more functional for this task.

Yes, being able to override the path would be extremely useful. Ruby's bundler allows you to specify just the name (in which case it looks up in the central repository), a git repo (in which case it uses master), a git branch or specific SHA, or a local path. It would be good to support all of these options.

@yglukhov Do you have an example of how you use nake? I'm still becoming familiar with Nim and would like to see what does / doesn't work well with the standard tools.

@aidansteele, here's how I use nake currently. nakefile example: https://github.com/yglukhov/nimx/blob/master/nakefile.nim

It imports naketools:
https://github.com/yglukhov/nimx/blob/master/nimx/naketools.nim

Naketools basically defines a builder that knows how to build targets for different platforms, like current, js, android, ios, emscripten, etc. Creates bundles for ios and macos, codesigns, converts/packs resources, compiles SDL for selected target if necessary, and even launches a jester server to run JS or emscripten version. Also it's pretty configurable through command line, eg.

nake # build and run for current platform
nake tests # build and run tests for current platform
nake tests -d:js # build and run tests for javascript target
nake js -d:release --norun # build javascript target in release, but don't launch it in jester
nake droid # build android target and run it on currently connected device
# etc

dom96 commented

Some short insight into how Cargo handles this here: https://www.reddit.com/r/programming/comments/7q6ida/jai_libraries_discussion/dsnu9zc/

Another note: Cargo supports building software that depends on multiple versions of the same package: https://www.reddit.com/r/programming/comments/7q6ida/jai_libraries_discussion/dsnu6ls/

zah commented

Our team is willing to implement the following spec for lock files if the Nimble project agrees to adopt it.

Lock file contents

The lock file uses the TOML format and includes an array of records with the following fields:

  • name: name of the package (required)
  • uri: URI from where the package can be obtained (required).
    Initially, only git:// and git+https:// URIs will be supported. In such URIs, the fragment (the part after #) can be used to indicate a particular git revision.
  • checksum: Checksum of the downloaded package contents (required).
    The checksum field is a TOML table with a single key, which indicates the checksum type. sha256 will be supported as the initial checksum type.
  • version: A human-readable version string (optional).
    This is provided only as useful metadata for the developers examining the lock file.

In other words, here is an example lock file:

[[package]]
name = "quicktest"
uri = "git+https://github.com/alehander42/nim-quicktest#d621dcdcb8d2011882d6b1a7a4f4d1d9a764035c"
checksum = { sha256 = "bdf11d78dbd66fe107e4c35d75e7a9412055285bd9985f66f30711002ab24dca" }

[[package]]
name = "chronicles"
uri = "git+https://github.com/status-im/nim-chronicles#a7f5589ed81524fe427ed4d5134eabb5b7536996"
checksum = { sha256 = "5ecebcc84898a056d44f2592a9db3ea79773090a5f168de1adde0583118b0e2b" }
version = "0.4.0"

The lock file will include all transitive dependencies of the project.
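
As a rough illustration of the record layout described above, the following Python sketch checks one already-parsed lock file record and splits its uri into a repository URL and a git revision. The names check_record and LockedPackage are hypothetical and not part of Nimble; only the field names and the sample values come from the example above.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urldefrag

SUPPORTED_SCHEMES = ("git://", "git+https://")  # per the proposal above
SUPPORTED_CHECKSUMS = ("sha256",)               # initial checksum type


@dataclass(frozen=True)
class LockedPackage:  # hypothetical type, for illustration only
    name: str
    repo_url: str
    revision: str
    checksum_type: str
    checksum: str
    version: Optional[str] = None


def check_record(record: dict) -> LockedPackage:
    """Validate one [[package]] record that was already parsed from TOML."""
    for field in ("name", "uri", "checksum"):
        if field not in record:
            raise ValueError(f"missing required field: {field}")

    uri = record["uri"]
    if not uri.startswith(SUPPORTED_SCHEMES):
        raise ValueError(f"unsupported URI scheme: {uri}")
    # The part after '#' names the git revision.
    repo_url, revision = urldefrag(uri)

    checksum = record["checksum"]
    if len(checksum) != 1:
        raise ValueError("checksum must be a table with a single key")
    (checksum_type, digest), = checksum.items()
    if checksum_type not in SUPPORTED_CHECKSUMS:
        raise ValueError(f"unsupported checksum type: {checksum_type}")

    return LockedPackage(record["name"], repo_url, revision,
                         checksum_type, digest, record.get("version"))


pkg = check_record({
    "name": "chronicles",
    "uri": "git+https://github.com/status-im/nim-chronicles"
           "#a7f5589ed81524fe427ed4d5134eabb5b7536996",
    "checksum": {"sha256": "5ecebcc84898a056d44f2592a9db3ea79773090a5f168de1adde0583118b0e2b"},
    "version": "0.4.0",
})
print(pkg.repo_url, pkg.revision)
```

Rejecting unknown checksum types up front, rather than ignoring them, is an assumption of this sketch; the spec above does not say how unsupported types should be handled.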

New low-level operations

proc updateLockFile(strictMode = false)

This examines the state of all dependencies that are in "develop" mode and writes their current state to the lock file (a dependency might have gained new commits, for example, or a previously indicated revision may now be marked with a git tag). When strictMode is enabled, this procedure should produce an error when any of the dependencies has an unclean state.

There are several possible policies for deciding when to run this proc. It's expected to be relatively fast, but still Nimble should try to avoid introducing any delays in the compile/test cycle. Here are some suggested options:

  • Run it after a successful nimble build as a background job.

    The lock files are expected to be stored in version history and they are a mechanism for achieving fully reproducible builds within a team. Thus, the lock file should always reflect how a particular developer built and tested a newly added commit. If the lock file is updated on each build, our goal will be achieved quite naturally without much effort from the developer (the only possible mistake would be forgetting to add your updated lock file in the commit).

    Please note that updating the lock file after the build has the added benefit that it doesn't slow the build in any way.

  • Run it after a successful nimble test.

    This is quite similar to the above, but it delays the modification of the lock file until the developer has reached a successful execution of the package tests.

  • Run it as a part of nimble check or a newly added stand-alone command.

    This is a more explicit step that reduces the amount of lock file updates during everyday development. If a developer or a team wants to ensure that the lock file is always kept up-to-date, running nimble check from a git pre-commit hook would be one way to achieve this.

Personally, I would suggest option 1, but I hope to hear some opinions and feedback from the community.

The procedure itself is quite straightforward. A dependency is considered to be in develop mode if the global nimble store doesn't include a copy of the designated package with a version matching the contents of the lock file, but one of the following conditions is true:

  1. The global store includes a #master check out of the designated package linked to a local directory.
  2. The project uses a path override to indicate that a local directory should be used as a source for Nimble packages and the designated package is present there.

Nimble will then compare the state of the local directory to the contents of the lock file, and the latter will be updated to match reality. Various corner cases should be handled gracefully, such as moving back in the revision history, switching to a different branch, switching from a revision hash to a named tag and so on.

A more complete implementation will not update the lock file if one of the following conditions is true:

  1. Some of the locally developed dependencies have unclean state ("unclean" is defined as having files in "modified" state in version control).
  2. Local commits are present that are not pushed to a git remote.

When strictMode is enabled, these conditions should result in an error. Otherwise, diagnostic warnings will be printed. nimble check will use the strict mode and when combined with a pre-commit hook, this will ensure that all created commits have fully specified dependencies for a reproducible build.
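
The difference between strict and non-strict mode can be sketched in Python as follows. This is only an illustration under assumptions: DevDependency, LockFileError and may_update_lock_file are hypothetical names, and the two boolean flags stand in for whatever version control queries a real implementation would run.

```python
import warnings
from dataclasses import dataclass


@dataclass
class DevDependency:  # hypothetical record of a develop-mode dependency
    name: str
    has_modified_files: bool    # "unclean" state in version control
    has_unpushed_commits: bool  # local commits missing from the remote


class LockFileError(Exception):
    """Raised in strict mode when the lock file must not be updated."""


def may_update_lock_file(deps, strict_mode=False):
    """Return True if the lock file may be updated.

    Applies the two conditions listed above. Problems raise an error in
    strict mode and are reported as warnings otherwise.
    """
    problems = []
    for dep in deps:
        if dep.has_modified_files:
            problems.append(f"{dep.name}: modified files in working copy")
        if dep.has_unpushed_commits:
            problems.append(f"{dep.name}: local commits not pushed")
    if problems and strict_mode:
        raise LockFileError("; ".join(problems))
    for problem in problems:
        warnings.warn(problem)
    return not problems


clean = [DevDependency("chronicles", False, False)]
print(may_update_lock_file(clean, strict_mode=True))
```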

Please note that the way these checks are performed depends on the version control system being used and the specific choices regarding the repository topology, so a project-specific configuration will be required.

proc syncWithLockFile()

This proc is expected to run upon a detected modification of the lock file.
It performs the following high-level procedure:

for dep in lockFile:
  if not packageExistInStore(dep):
    obtain(dep)
  else:
    if isInDevMode(dep):
      # Offer the user to synchronize the state of her local check out
      # Handle problems such as uncommitted files, non-fast-forward merges
      # and so on. The initial version may treat many of these conditions
      # as errors that require the user to resolve the problem manually.
      interactiveUpdate(dep)
    elif not versionExistsInStore(dep):
      # This should perform a side-by-side installation with any previous
      # version that already exists in the store. Ref counting may be
      # employed to clean up no longer needed versions.
      obtain(dep)

# If the above loop was not interrupted by errors, this will use some
# caching mechanism to avoid re-running the proc with the same lockfile
# in the future. One possible implementation is creating a file in the
# nimble store with a name derived from running `SHA256` on the lock file
# contents. Another way would be to store the last synchronized lock file
# in a specific location in the .git folder of the repo.
markAsSynchronized(lockFile)

syncWithLockFile is expected to run as a first step in nimble build and nimble test and it needs to be able to complete as quickly as possible after the required packages have been already obtained.
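
The first caching option mentioned in the comments above (a marker file named after the SHA-256 of the lock file contents) could look roughly like this in Python. The function names and the "synced" subdirectory are assumptions made for the sketch, not part of the proposal.

```python
import hashlib
import tempfile
from pathlib import Path


def lock_file_digest(lock_file: Path) -> str:
    """SHA-256 of the lock file contents, as a hex string."""
    return hashlib.sha256(lock_file.read_bytes()).hexdigest()


def is_synchronized(lock_file: Path, store: Path) -> bool:
    """True if this exact lock file content was already synchronized."""
    return (store / "synced" / lock_file_digest(lock_file)).exists()


def mark_as_synchronized(lock_file: Path, store: Path) -> None:
    """Create an empty marker file named after the lock file digest."""
    marker_dir = store / "synced"  # hypothetical location inside the store
    marker_dir.mkdir(parents=True, exist_ok=True)
    (marker_dir / lock_file_digest(lock_file)).touch()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    lock = root / "nimble.lock"
    lock.write_text('[[package]]\nname = "quicktest"\n')
    before = is_synchronized(lock, root / "store")
    mark_as_synchronized(lock, root / "store")
    after = is_synchronized(lock, root / "store")
    # Any change to the lock file yields a new digest, so a sync is due again.
    lock.write_text('[[package]]\nname = "chronicles"\n')
    after_edit = is_synchronized(lock, root / "store")

print(before, after, after_edit)  # prints: False True False
```

With this scheme the fast path of syncWithLockFile is one hash computation and one file existence check.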

proc findPackageUpgrades()

This scans the repositories of all dependencies listed in the lock file in search of newly released versions. It tries to check whether it's possible to upgrade some of the packages while still satisfying all constraints listed in their nimble metadata. The user is presented with an upgrade plan. If the plan is accepted, Nimble will obtain the needed packages and then it will update the lock file. This proc is considered out of scope for now.

zah commented

While implementing this, we'll pay special attention to the following issues which have been a major hurdle for our team:

#543 (the suggested solution is described here), #589 and #318

Araq commented

I implemented "lock files" 2 years ago for nawabs, here is the code https://github.com/Araq/nawabs/blob/master/recipes.nim#L52

I doubt it will be of much help since nawabs' internals are quite different from Nimble's, but I too generate Nim(script) code so it might provide some inspiration.

dom96 commented

Sounds mostly good, although I think your plan can be simplified.

The format of the lock file will consist of standard Nim syntax, imitating calls to the following function:

You're implying that you want the lock files to be a valid nimscript file which IMO is a mistake. We want lock files to be parsed quickly and booting up the Nim VM is very slow, I also don't see any reason why the lock files shouldn't just be a flat data store.

As such I think we should settle on a common format like ini, toml (what cargo uses), json or similar. If you really want this semi-nim syntax then that's okay with me, but we need to specify it accurately before implementation. In fact, I would like to see a spec even if we use json or similar.

A dependency is considered to be in develop mode if the nimble store doesn't include an explicitly versioned copy of the designated package (with a version matching the contents of the lock file), but it does include a #master check out linked to a local directory.

In Nimble's world there already is a definition of what "develop mode" implies: a package which was linked into ~/.nimble/pkgs via a packageName.nimble-link file. Your definition sort of refers to this but adds a precondition that I don't understand, so maybe we can just settle on my definition? :)

Here are some suggested options:

I think creating a lock file whenever any compilation is performed is a good solution. This means nimble build and nimble c, the test command will typically execute nimble c or nimble build anyway so that would be covered implicitly.

Although, I just thought of something we need to consider: tests should probably have a separate lock file since there often is a need for different dependencies when testing. But we can worry about this later I think.

proc syncWithLockFile()

I think that our implementation of this could be much simpler for a first stab at this. We don't need to have this fancy detection of whether the lock file was updated or whether the .nimble file dependencies were. Just merge the deps in the lock file with the deps in the .nimble file every time and we can think about optimising it later. This shouldn't be too much of a bottleneck anyway, since we're only doing this for the top-level package (none of the transitive dependencies' lock files are taken into account).

proc dep(name: string, url: string, ver = "", rev = "")

a procedural approach to lock files seems strange

  • what semantics would you have if you call the function twice for the same dep? etc..
  • slow to read and hard to build tooling around - if it's a custom format which is less than nim but offers no semantic advantages over a toml, it means custom parsing has to be developed, a mostly wasted effort for 3rd parties

ver = ""

this looks like an anti-feature. versions are specified in the nimble file and follow a social contract - the purpose of a lock file is to take that socially agreed information and create a reproducible, secure and deterministic build out of it, where no outside action can affect it at all (without giving the developer an opportunity to explicitly review what has changed). You create a lock file explicitly when you don't want things to change.

Neither a version nor a package name has any reasonable security properties - if time and code is spent on supporting this feature, it's only a matter of time before nimble will have its nodejs/npm moment and a high-profile package gets hijacked trivially like this (in the nodejs world, it sparked a separate pm to be developed.. everyone else, including git realized from the start that this is a bad idea all over)

rev = ""

a secure hash solves all the problem above, and to keep it flexible one should make sure it's specified in a future-compatible way with other schemes than git (so we could have a number of supported and secure ways.. git sha1 for git repos, shaX-of-tarball for getting code from a tarball instead of git, some other hash for mercurial repos etc).
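
One way to keep the hash scheme open-ended, sketched here in Python, is to store the checksum as a single-key mapping from algorithm name to hex digest and dispatch on that name. The verify function and the ALLOWED set are hypothetical; which algorithms a real tool would accept is an assumption of this sketch.

```python
import hashlib
import hmac

# Allow-list of digest names; extending it is how new schemes would be added.
ALLOWED = {"sha256", "sha512"}


def verify(data: bytes, checksum: dict) -> bool:
    """Check data against a mapping like {"sha256": "<hex digest>"}."""
    if len(checksum) != 1:
        raise ValueError("expected exactly one algorithm")
    (algorithm, expected), = checksum.items()
    if algorithm not in ALLOWED:
        raise ValueError(f"unsupported checksum type: {algorithm}")
    actual = hashlib.new(algorithm, data).hexdigest()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected.lower())


tarball = b"example package contents"
good = {"sha256": hashlib.sha256(tarball).hexdigest()}
print(verify(tarball, good))                # True
print(verify(tarball + b"tampered", good))  # False
```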

Using versions is a really deprecated way of doing things, akin to sending plain-text passwords over the internet. Consider here that lock files typically get committed to git and stick around for years - I want to be able to securely reproduce my build 1, 2, 5 years after the fact as well, and this is not a problem you can fix retroactively after a breach - as a tool developer, you have to think about this for your users before they run into this issue.

As far as security goes, this model is usually referred to as TOFU - it's not the most secure one out there, but you'll find it in places that value convenience over security (ssh, whatsapp etc).

There are several possible policies for deciding when to run this proc.

If it's automated, it's no longer really useful as a lock file, for the above reasons - it should be called something else at that point to manage expectations and not provide a false sense of security - a dependency resolution cache or something similarly ephemeral, so it's clear it's just an optimization and ultimately garbage that can be safely removed at the tiny cost of a minimally increased build time.

That said, I'd envision that an explicit create/update-lockfile command would make more sense in line with the TOFU model above - that's what I actually want as a developer, to make sure the rug is not pulled from under my feet, but also to have the convenience of a tool interpreting the social information that version numbers give (ideally slightly structured, like semver). I build much more often than I want my dependencies to be updated (unless I've chosen for a particular, tightly coupled project, to follow deps closely - but then it's also likely I want to use a monorepo, and relative-path dependencies without further versioning)

syncWithLockFile()

something like this could be useful regardless of lock files. I see 3 components here - a dependency resolver (for solving version constraints), an (optional?) lock file (for solving security, determinism, all that) and a cache for doing diffs since the last build. that said, it's also the least worry here, it seems.. if you already resolved version numbers to a hash, and have something that says that the code matches it (like git, or a directory name with the hash in it, for a tarball), it's usually really fast anyway for a dozen deps, so instead of having a global syncwithdeps it seems more agile to delegate this to whichever part of the code does the downloading etc, so it can be done flexibly for different code sources ("this dep comes from a tarball in my repo"), for example depending on the format of the URL

nimble develop

with a lock file and path overrides, it looks like this feature can be deprecated / removed completely - it's a really odd feature that's tied to the current approach where global state affects local builds causing lots of problems, meaning that you can't run two projects or branches with different settings side-by-side.

zah commented

OK, I've incorporated feedback from here and other sources into the proposal above. The format of the lock file has been changed to TOML.

zah commented

There has been some heated debate regarding what is the most appropriate time to produce the lock file. I think this decision can be safely delayed until the low-level operations described above are fully implemented. I've now added few more notes clarifying what conditions will prevent updating the lock file and which packages are considered to be in "develop mode". Please note that the behavior of the updateLockFile proc will be the same regardless of which policy for running it is chosen (automatic or manually triggered).

dom96 commented

I build much more often than I want my dependencies to be updated (unless I've chosen for a particular, tightly coupled project, to follow deps closely

Very good point which I totally missed. Totally agree.

There has been some heated debate regarding what is the most appropriate time to produce the lock file. I think this decision can be safely delayed until the low-level operations described above are fully implemented.

👍

with a lock file and path overrides, it looks like this feature can be deprecated / removed completely - it's a really odd feature that's tied to the current approach where global state affects local builds causing lots of problems, meaning that you can't run two projects or branches with different settings side-by-side.

I think we'll need to discuss this separately, or perhaps we should discuss this here since we need to figure out how to handle local file paths in lock files. Any thoughts?

The format of the lock file has been changed to TOML.

I know I've mentioned it in my list but if we are going to use something like that then IMO a better option is just .ini. The primary reason is that we've got a parser for it in the stdlib already.

Araq commented

I would pick JSON as lockfiles are not for direct editing by humans and JSON has the most tooling available. Nim's .ini parser is non-standard (well back then there was no standard).

dom96 commented

I'm okay with JSON too.

For reference, npm's lock files are also JSON. We could basically copy their format: https://docs.npmjs.com/files/package-lock.json

npm is mainly used to install yarn, these days ;) https://yarnpkg.com/blog/2016/11/24/lockfiles-for-all/ is a good ref on why it exists, in the first place, and what problems lock files solve.

dom96 commented

https://yarnpkg.com/blog/2016/11/24/lockfiles-for-all/ is a good ref on why it exists, in the first place, and what problems lock files solve.

Does yarn treat lock files significantly differently from npm?

that has changed over time as both projects evolved.. see https://yarnpkg.com/blog/2017/05/31/determinism/ for example.

https://www.sitepoint.com/yarn-vs-npm/ spells out some other differences, including defaults around globals etc

yarn are making more changes for v2: yarnpkg/yarn#6953 so some of the previous info might be outdated

Tracking build dependencies has important security benefits: it makes it possible to detect whether a binary has been built with a library that has known vulnerabilities.
It would be very good for practical reasons to embed the lock/manifest metadata in the binary in a way that can be detected by a simple scanner.
It should also contain the version of the Nim compiler.
The feature should be disabled by --opt:size

Is there any progress on this lock file implementation? Looks like @bobeff was working on it for some time but there's no update in 2 months.

Hi. Sorry for the big delay, but some personal engagements came up and I don't have enough time to continue on this for now. I want to finish it, but it is possible that I won't be able to continue until February next year. Do not feel obliged to wait for me if someone else wants to do it.

@bobeff I'm just curious, how close do you think you might be to a dependency locking PR (time- or percentage-wise)? I can see you've been putting a lot of work into it recently

This is related to #424

@ndon55555

Deeply sorry to everyone for the big delay,

I don't work full time on the task and I was plagued by different unexpected engagements and other personal issues. In addition, the task was extended to include a big change of how the packages develop mode works as a Status requirement. Tarball and parallel downloads of packages are also required, but I can do them as separate pull requests, after the lock files, because they are not so closely related with the lock file functionality.

Currently, a friend of mine has asked me for help with another side project which has to be done by the end of August. I hope to be able to commit the changes for the new packages develop mode, with detailed tests for it, by the end of this week, before starting to work on the unexpected project. After this, what is left is:

  • to update locked develop dependencies (from the new develop mode) according to the content of the lock file.
  • to write tests for the lock file itself.
  • to update the documentation.
  • to decide how to integrate Nim with Nimble so that Nim is aware of the locked and develop dependencies, and to add the required changes to Nim.

I think that this could take another month after August, which will be busy with another task, so the pull request can be expected at the beginning of October.

Best Regards,
Ivan Bobev

FRidh commented

Looking forward to this feature! When it's there I'll put in some effort to make the lock files usable with the Nix package manager.
(There is https://github.com/nix-community/flake-nimble but it would be a lot easier with a lock file.)

One comment on the implementation. I suggest making the hashing algorithm variable, because in time certain hash types become unsafe, like sha1 is already. In Nix(pkgs) we're moving towards using SRI notation for hashes.

Edit:
Just want to note that the hash type of fetchers is variable. For store paths (the output of some build/fetch action) we do use one type of hash, sha1, as is done here. But this hash is computed over the graph of the inputs, although work is now underway for content-addressable paths as well.
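
For readers unfamiliar with the SRI notation mentioned above, it writes a hash as the algorithm name, a dash, and the base64-encoded digest. A minimal Python sketch of converting between a hex digest and that form follows; to_sri and from_sri are hypothetical helper names, and the sample input is made up.

```python
import base64
import hashlib


def to_sri(algorithm: str, hex_digest: str) -> str:
    """Convert a hex digest to SRI form: '<algorithm>-<base64 digest>'."""
    raw = bytes.fromhex(hex_digest)
    return f"{algorithm}-{base64.b64encode(raw).decode('ascii')}"


def from_sri(sri: str) -> tuple:
    """Split an SRI string back into (algorithm, hex digest)."""
    algorithm, _, encoded = sri.partition("-")
    return algorithm, base64.b64decode(encoded).hex()


hex_digest = hashlib.sha256(b"example package contents").hexdigest()
sri = to_sri("sha256", hex_digest)
print(sri.startswith("sha256-"), from_sri(sri) == ("sha256", hex_digest))
```

Because the algorithm name travels with the digest, a lock file using this form could switch hash types later without changing its schema.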

FRidh commented

There is a tool called nimph that does locking, although not yet sufficiently.

with a lock file and path overrides, it looks like this feature can be deprecated / removed completely - it's a really odd feature that's tied to the current approach where global state affects local builds causing lots of problems, meaning that you can't run two projects or branches with different settings side-by-side.

I don't think this should be removed because there are lock files. Keeping the option of using nimble develop along with a lock file would be great since that helps a lot when using cross-project dependencies where you have tightly coupled projects that test with each other.
I don't know if any other CI system does this but that's one of the core features of zuul which makes nimble play really nicely in that system.
If the develop feature is removed there would need to be a non-official way of installing not-yet-released versions of packages and that would always be more prone to breaking or causing other problems.

tightly coupled projects

modern / principled tools solve this by allowing per-project-folder overrides - nimble develop changes it for all projects which simply doesn't scale.

zah commented

With the new nimble develop functionality already implemented in @bobeff's branch, you can have local path overrides that are specific to the current project folder. It can also affect a group of projects (e.g. all repos of a team).

modern / principled tools solve this by allowing per-project-folder overrides - nimble develop changes it for all projects which simply doesn't scale.

Agreed. It being global and not on a per project basis can be a problem.
In my case the tests use throwaway pods in Kubernetes and configure nimble develop at runtime.

With the new nimble develop functionality already implemented in @bobeff's branch, you can have local path overrides that are specific to the current project folder. It can also affect a group of projects (e.g. all repos of a team).

Great!

zah commented

Lock files were delivered several months ago with #913.
The first Nimble version that supports them is 1.4, although there have been various fixes since then.
The README of Nimble documents their usage.

EDIT: Above, I meant Nimble 0.14 instead of 1.4, as pointed out in the comments below.

dom96 commented

Just to be clear: Nimble v0.14 not v1.4

A dumb question, where can I find Nimble v0.14?
GitHub releases shows 0.12.0 as the latest version.
The last tag I see is v0.13.1.

dom96 commented

it's as yet unreleased, you'd have to grab HEAD. Might be time to release it, but it is quite a major release so it would be nice if some of the community tested it before it's officially released.