haskell/cabal

Backpack design ticket

This issue describes the proposed modifications to Cabal and cabal-install in order to support Backpack, a new module system for Haskell. This is not on the wiki because the wiki does not support linking to ticket numbers.

It has a companion page on the GHC Trac: https://ghc.haskell.org/trac/ghc/wiki/Backpack

Component-ize Cabal

Summary: In most places where Cabal refers to a package (the package database, package dependencies), it should actually refer to a component (the component database, component dependencies). Refactoring Cabal to distinguish between these two cases properly will let us implement #269 (internal/convenience libraries). As an important point of backwards compatibility, an internal library gets installed to the package database with a munged name which identifies both the source package and the name of the library.

Background: Originally, Cabal was designed to exclusively deal with packages. This assumption is pervasive: packages are stored in the installed package database, the default interface for Setup scripts builds an entire package (#2802) and is buggy when not all components are built (#2780), the configure step chooses versions for the entire package (#2725, #1575, #960), macros and the Paths module are generated once per package (#1893), preprocessor flags are computed once per package (#2971), etc.

However, Cabal also has support for components. A package can define multiple components, as in this example:

name: foo
version: 0.1

library
  build-depends: base
test-suite foo-tests
  build-depends: base, QuickCheck, foo

The classic use-case for a component is this: suppose you want to write a test-case for your library. You'd like to ship the test-case in the same package (a single unit of distribution) since it would be annoying to have to make another Cabal project just for the test-case; however, you'd like to depend on some testing libraries which you don't want the main library to depend on (multiple units of modularity). A component solves this problem by letting authors define "sub-packages" which have independent dependencies, are built separately, but can all be placed in the same package.

Up until now, Cabal has only permitted multiple components with a restriction: there can only be one library component. This is too restrictive: there are many cases where it would be useful for a package to have (at most) one public library, as well as one (or more) internal libraries (#269). In fact, the restriction was originally imposed because it was unclear how to add support for multiple public libraries (#2716): if I want to depend on the utils sub-library of the foo package, how do I specify this in a build-depends? For now, we have ruled out multiple public libraries, to avoid complicating the dependency model; however, internal libraries are (1) useful and (2) have no impact on how dependencies are specified.

Proposal: (PR #3022) We propose to extend Cabal to permit multiple library stanzas per package. There is one distinguished library, marked by the bare stanza library, which represents the public library that other packages refer to when they list the package in their build-depends. The other libraries are internal libraries, which can be referenced within the package by the name specified in their library libname stanza (this name occupies the same namespace as package names and thus shadows entries from the public database), but are otherwise inaccessible.

Here is an example usage, using the internal library to export internal modules which should not be externally visible, but should be visible to the test-suite:

name: foo
version: 0.1.0.0

library foo-internal
  exposed-modules: Foo.Internal
  build-depends: base

library
  exposed-modules: Foo
  build-depends: foo-internal, base

test-suite foo-test
  type: detailed-0.9
  test-module: Test
  build-depends: foo-internal, base

The library foo-internal section defines an internal library, which both the public library and the test-suite depend upon. Components in the same package refer to the internal library by adding foo-internal to their build-depends; however, this package name is not visible to external packages. (Indeed, there could even be another package named foo-internal.)
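The shadowing rule can be sketched as a small lookup. This is only an illustration: the Target type and the resolveDep function are hypothetical names, not part of Cabal's API, and the sketch assumes that a build-depends entry is resolved against the package's own internal libraries before the package database is consulted.

```haskell
-- Hypothetical types for illustration; not Cabal's actual API.
data Target
  = InternalLibrary String   -- a library defined in the current package
  | ExternalPackage String   -- a package looked up in the package database
  deriving (Eq, Show)

-- Resolve one build-depends name. Internal library names are checked first,
-- so they shadow any external package that happens to share the name.
resolveDep :: [String] -> String -> Target
resolveDep internalLibs name
  | name `elem` internalLibs = InternalLibrary name
  | otherwise                = ExternalPackage name

main :: IO ()
main = do
  let internal = ["foo-internal"]
  mapM_ (print . resolveDep internal) ["foo-internal", "base"]
```

From outside the package the list of internal libraries is empty, so foo-internal would resolve to whatever external package carries that name, if any.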

Internal libraries are also useful to define libraries that should be used by executables defined in a package, but are otherwise not accessible to other packages.

The installation of internal libraries is optimized in the following way:

  1. If a package has no public library, internal libraries are NOT registered as part of installation, and their interface files and static libraries are NOT installed.
  2. If a package builds no dynamic executable, the dynamic libraries are NOT installed. (We must install the dynamic library if there are dynamic executables, as they are relied upon at runtime.)
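The two rules above can be read as a small decision function. The sketch below encodes them literally and independently of each other; how they interact in corner cases is not spelled out in the text, so treat this as one possible reading. All type and field names are hypothetical and do not correspond to Cabal's API.

```haskell
-- Hypothetical record types for illustration; not Cabal's actual API.
data PackageShape = PackageShape
  { hasPublicLibrary     :: Bool
  , hasDynamicExecutable :: Bool
  }

data InternalLibInstall = InternalLibInstall
  { registerInDb       :: Bool  -- register in the installed package database
  , installInterfaces  :: Bool  -- interface files and static libraries
  , installDynamicLibs :: Bool  -- dynamic libraries needed at runtime
  } deriving (Eq, Show)

-- Rule 1: without a public library, skip registration, interface files
-- and static libraries.
-- Rule 2: without a dynamic executable, skip the dynamic libraries.
planInternalLib :: PackageShape -> InternalLibInstall
planInternalLib pkg = InternalLibInstall
  { registerInDb       = hasPublicLibrary pkg
  , installInterfaces  = hasPublicLibrary pkg
  , installDynamicLibs = hasDynamicExecutable pkg
  }

main :: IO ()
main =
  -- A package that only ships statically linked executables.
  print (planInternalLib (PackageShape False False))
```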

For example, you can use an internal library to build and install multiple statically linked executables, without affecting the package database.

Consequences on the installed package database: One major design consideration is how to handle internal libraries in the package database (#3017) when an internal library is depended upon by a public library. In this case, we must be able to install components to the installed package database (which is now more aptly named an "installed library database"). There were two major design possibilities:

  1. We change the structure of the installed package database so that it stores packages, each of which may have many libraries associated with it, or
  2. We interpret a "package" to actually be a "library", but otherwise carry on as before.

(1) behaves poorly in terms of backwards compatibility, as old versions of GHC would not understand such a hierarchical package database. I have therefore opted to implement (2). (Sorry SPJ!) As a result, an entry in the installed package database is actually an entry for a library. For most entries, this library will simply be the default, public library associated with a package, in which case there is no difference.

One important point is that the name field in the package database does not refer only to the name of the package which defined the library; it identifies both the package and the specific library within it. We achieve this by munging the package name of internal libraries. For the previous example, the InstalledPackageInfo entries generated for the foo-internal library and the public foo library look like this:

name: z-foo-z-foo-internal
version: 0.1.0.0
id: foo-0.1.0.0-EFUmWE8l39k5WH8q0XBAOj-foo-internal
exposed-modules: Foo.Internal
depends: base-4.9.0.0
----
name: foo
version: 0.1.0.0
id: foo-0.1.0.0-EFUmWE8l39k5WH8q0XBAOj
exposed-modules: Foo
depends:
    base-4.9.0.0
    foo-0.1.0.0-EFUmWE8l39k5WH8q0XBAOj-foo-internal

The name of the internal library is munged, so that it contains both the defining package (foo) and the name of the library (foo-internal). The current encoding (#3017) is a z- (to indicate that this is an internal library), the name of the originating package, another -z-, and then the name of the internal library. (We also escape occurrences of -z- in the package name or component name.) For this encoding to work, we must assume that all package names prefixed with z- are reserved.
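As a minimal sketch of this encoding, assuming the function names mungeInternalName and isReservedName (both hypothetical, not Cabal's API): the escaping of -z- inside the package or component name is deliberately left out, because the text does not give the exact escaping scheme.

```haskell
import Data.List (isPrefixOf)

-- Munge the name of an internal library for the installed package database:
-- "z-", the originating package, "-z-", then the internal library name.
-- NOTE: escaping of "-z-" occurring inside either name is omitted here,
-- since the exact scheme is not described in the text.
mungeInternalName :: String -> String -> String
mungeInternalName pkg lib = "z-" ++ pkg ++ "-z-" ++ lib

-- Package names starting with "z-" are assumed to be reserved.
isReservedName :: String -> Bool
isReservedName = ("z-" `isPrefixOf`)

main :: IO ()
main = do
  putStrLn (mungeInternalName "foo" "foo-internal")
  print (isReservedName "foo")
```

For the running example this yields the name shown in the InstalledPackageInfo entry above, z-foo-z-foo-internal.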

Relation to Backpack: Libraries are the unit of modularity in Backpack (a module cannot be instantiated, but a library can); thus, unless we are going to require Backpack users to write a new Cabal file for every reusable unit which they want to reinstantiate, the ability to define internal libraries is useful.

It will also be useful for bootstrapping, as the Cabal library can be taught how to instantiate internal libraries according to Backpack, before porting this logic to cabal-install.

Unit-ize Cabal

Traditionally, Cabal assumes that the ComponentId is an opaque identifier summarizing the source, dependencies, preprocessor flags, etc. The Cabal library makes no assumptions about the format of ComponentId; it can even be specified from an external source which has its own naming scheme. At best, the identifier is generated deterministically, so that if you compute a ComponentId which you built previously, you can tell that it's already installed and stop compiling.

In Backpack, we want to impose some more structure on this identifier, explicitly recording what dependencies we chose to build a component with. The reason we want to record this information is because, with Backpack, GHC can type-check a component for ANY choice of dependency we might make, by type-checking the "indefinite component" against the signature which specifies what functionality the dependency provides. In fact, we need to be able to represent partially instantiated components (a fully indefinite component may depend on partially instantiated components) and we need to be able to later fill in the instantiation and get the canonical ComponentId.

Concretely, this means our unique identifier should be a UnitId, which is a ComponentId plus a hole mapping from ModuleNames to Modules (a Module being a UnitId plus a ModuleName), which describes how the dependencies are filled. A ComponentId gives you sufficient information to typecheck a package, whereas a fully instantiated UnitId gives you sufficient information to compile a package (as only then are all the dependencies specified). In the absence of Backpack, the ComponentId and UnitId are the same.
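The shape of this identifier can be sketched with a pair of mutually recursive types. These definitions are illustrative only and are not GHC's or Cabal's real ones; in particular, the Hole constructor is an assumption of the sketch, used to model a requirement that has not been filled yet, and the example names (p, str-impl, Str) are made up.

```haskell
-- Hypothetical types for illustration; the real definitions differ.
newtype ComponentId = ComponentId String deriving (Eq, Show)
newtype ModuleName  = ModuleName String  deriving (Eq, Show)

-- A UnitId is a ComponentId plus a mapping that says how each hole is filled.
-- An empty mapping models the non-Backpack case, where the UnitId carries no
-- more information than the ComponentId.
data UnitId = UnitId ComponentId [(ModuleName, Module)]
  deriving (Eq, Show)

-- A Module is a UnitId plus a ModuleName. Hole is an assumption of this
-- sketch: it stands for a requirement that is still uninstantiated.
data Module
  = Module UnitId ModuleName
  | Hole ModuleName
  deriving (Eq, Show)

-- A unit can be compiled only when no hole remains anywhere inside it.
fullyInstantiated :: UnitId -> Bool
fullyInstantiated (UnitId _ insts) = all (filled . snd) insts
  where
    filled (Hole _)       = False
    filled (Module uid _) = fullyInstantiated uid

main :: IO ()
main = do
  let str      = ModuleName "Str"
      impl     = UnitId (ComponentId "str-impl") []
      indef    = UnitId (ComponentId "p") [(str, Hole str)]
      definite = UnitId (ComponentId "p") [(str, Module impl str)]
  print (map fullyInstantiated [impl, indef, definite])
```

Here indef can only be typechecked, while definite names a concrete implementation for every requirement and could therefore be compiled.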

I'm working on a ghc-proposals submission that subsumes this.