`clc-stackage`

How to

This is a meta-package to facilitate impact assessment for Core Libraries Committee (CLC) proposals.

An impact assessment is due when:

- the proposal makes a breaking change according to the PVP;
- the proposal exports a new entity from Prelude or other modules described in the Haskell Report;
- CLC members request one at their discretion.

The procedure is as follows:

1. Rebase the changes mandated by your proposal on top of the `ghc-9.10` branch.
2. Compile a patched GHC, e.g. `~/ghc/_build/stage1/bin/ghc`.
3. `git clone https://github.com/haskell/clc-stackage`, then `cd clc-stackage`.
4. Build the exe: `cabal install clc-stackage --installdir=./bin`.

   ⚠️ Warning: Use a normal, downloaded GHC for this step, not your custom-built one. Using the custom GHC can force a rebuild of many dependencies that you would otherwise get for free, e.g. `vector`.
5. Uncomment the `with-compiler` line in `generated/cabal.project` and set it to the absolute path of your patched GHC, e.g.
   ```
   with-compiler: /home/ghc/_build/stage1/bin/ghc
   ```
6. Run `./bin/clc-stackage` and wait for a long time. See below for more details.
   - On a recent MacBook Air it takes around 12 hours, YMMV.
   - You can interrupt cabal at any time and rerun it later.
   - Consider setting `--jobs` to leave some CPU cores free for other tasks.
   - A full build requires roughly 7 GB of free disk space.

   To get an idea of the current progress, you can run the following commands on the log file:
   ```
   # prints completed / total packages in this group
   $ grep -Eo 'Completed|^ -' output/logs/current-build/stdout.log | sort -r | uniq -c | awk '{print $1}'
   110
   182

   # combine with watch
   $ watch -n 10 "grep -Eo 'Completed|^ -' output/logs/current-build/stdout.log | sort -r | uniq -c | awk '{print \$1}'"
   ```
7. If any packages fail to compile:
   - copy them locally using `cabal unpack`,
   - patch them to conform with your proposal,
   - link them from the `packages` section of `cabal.project`,
   - return to Step 6.
8. When everything finally builds, get back to the CLC with a list of affected packages and the patches required.
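The `with-compiler` edit and the progress check above can also be scripted. This is a minimal sketch, not something shipped with clc-stackage: both function names are hypothetical, `set_with_compiler` assumes `generated/cabal.project` already contains a `with-compiler` line (active, or commented out with `--`), and `build_progress` assumes the same log markers as the `grep` pipeline above (one ` - <pkg>` line per planned package and one `Completed` line per finished one).

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of clc-stackage.

# Rewrite the (possibly commented-out) with-compiler line so that cabal uses
# the given GHC. Assumes such a line already exists in the project file.
set_with_compiler() {
  local project="$1" ghc="$2"
  # "|" is used as the sed delimiter because the GHC path contains "/".
  sed -E "s|^[[:space:]]*(--[[:space:]]*)?with-compiler:.*|with-compiler: ${ghc}|" \
    "$project" > "$project.tmp" && mv "$project.tmp" "$project"
}

# Print "completed / total (percent)" for the group currently being built.
build_progress() {
  local log="${1:-output/logs/current-build/stdout.log}"
  local completed total
  [ -f "$log" ] || { echo "no log at $log" >&2; return 1; }
  # grep -c prints 0 (and exits 1) when nothing matches; "|| true" ignores that.
  completed=$(grep -c 'Completed' "$log" || true)
  total=$(grep -Ec '^ -' "$log" || true)
  if [ "$total" -gt 0 ]; then
    echo "$completed / $total ($(( 100 * completed / total ))%)"
  else
    echo "no build plan in $log yet"
  fi
}

# Example usage, from the clc-stackage checkout:
#   set_with_compiler generated/cabal.project /home/ghc/_build/stage1/bin/ghc
#   build_progress
```

Using sed with a temporary file, rather than `sed -i`, avoids the differences between GNU and BSD `sed -i`.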

The clc-stackage exe

clc-stackage is an executable that will:

1. Download the Stackage snapshot from the Stackage server.
2. Divide the snapshot into groups (the group size is set by the --batch argument).
3. For each group, generate a cabal file and attempt to build it.

Querying stackage

By default, clc-stackage queries https://www.stackage.org/ for snapshot information. In situations where this is not desirable (e.g. the server is not working, or we want to test a custom snapshot), the snapshot can be overridden:

```
$ ./bin/clc-stackage --snapshot-path=path/to/snapshot
```

This snapshot should be formatted like the cabal.config endpoint on the Stackage server (e.g. https://www.stackage.org/nightly/cabal.config). That is, each package line should have the form <pkg> ==<version>:

```
abstract-deque ==0.3
abstract-deque-tests ==0.3
abstract-par ==0.3.3
AC-Angle ==1.0
acc ==0.2.0.3
...
```

The Stackage cabal.config itself is valid input, so trailing commas and other extraneous lines are allowed (and ignored).
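To pin a snapshot locally, or to trim it to a handful of packages for a quick trial run, you can download the cabal.config endpoint and reduce it to bare `<pkg> ==<version>` lines. This is a minimal sketch that assumes the endpoint's usual layout (a `constraints:` field with one comma-terminated constraint per line); `normalize_snapshot` is a hypothetical helper. Since clc-stackage already ignores extraneous lines, the normalization is only a convenience for editing the file by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce a Stackage cabal.config to "<pkg> ==<version>" lines.
normalize_snapshot() {
  # 1. drop a leading "constraints:" field name,
  # 2. strip leading whitespace and a trailing comma,
  # 3. keep only exact "name ==version" constraints; comments, other fields
  #    and entries such as "array installed" are discarded.
  sed -E -e 's/^constraints:[[:space:]]*//' \
         -e 's/^[[:space:]]+//' \
         -e 's/,[[:space:]]*$//' \
         "$1" |
    grep -E '^[A-Za-z0-9][A-Za-z0-9-]* ==[0-9][0-9.]*$'
}

# Example usage (requires network access):
#   curl -fsSL https://www.stackage.org/nightly/cabal.config -o nightly.config
#   normalize_snapshot nightly.config | grep -E '^abstract-' > snapshot.txt
#   ./bin/clc-stackage --snapshot-path=snapshot.txt
```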

Investigating failures

By default (--write-logs save-failures), the logs of failed builds are saved under the ./output/logs/ directory, while ./output/logs/current-build/ streams the logs of the build in progress.
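To see which groups have failed so far, you can list the per-group directories under `./output/logs/`, skipping `current-build`. This minimal sketch assumes the layout described in the batching section (one `./output/logs/<pkg>/` directory per failed group, named after the group's first package); `failed_groups` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the log directory name of each failed group,
# i.e. the first package of every group that failed to build.
failed_groups() {
  local logs="${1:-output/logs}"
  find "$logs" -mindepth 1 -maxdepth 1 -type d ! -name current-build \
    -exec basename {} \; | sort
}

# Example usage, from the clc-stackage checkout:
#   failed_groups
```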

Group batching

The clc-stackage exe can split the entire package set into groups of size N with the --batch N option. The groups are then built sequentially. Besides being useful when building the entire package set in one go is infeasible, this also provides a "cache" that lets us interrupt the program at any point (e.g. with CTRL-C) and pick up where we left off. For example:

```
$ ./bin/clc-stackage --batch 100
```

This splits the entire downloaded package set into groups of size 100. Each time a group finishes (with success or failure), stdout/stderr is updated and the next group starts. If the group failed to build and --write-logs save-failures (the default) is in effect, the logs and error output are in ./output/logs/<pkg>/, where <pkg> is the name of the first package in the group.

See ./bin/clc-stackage --help for more info.
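Because a failed group's logs are named after its first package, it can help to know in advance which package starts each group. The sketch below assumes the groups are consecutive chunks of N packages taken in snapshot-file order, which may not match how clc-stackage actually orders or filters packages; `group_leaders` is a hypothetical helper expecting the `<pkg> ==<version>` format shown earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first package of each group of size N,
# assuming groups are consecutive N-line chunks of a "<pkg> ==<version>" file.
group_leaders() {
  local snapshot="$1" n="$2"
  # Lines 1, N+1, 2N+1, ... start a new group; $1 is the package name.
  awk -v n="$n" '(NR - 1) % n == 0 { print $1 }' "$snapshot"
}

# Example usage:
#   group_leaders snapshot.txt 100
```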

Optimal performance

On the one hand, splitting the entire package set into --batch groups makes the output easier to understand and offers a nice workflow for interrupting and restarting the build. On the other hand, it raises the question of which value of N in --batch N gives the best performance.

In general, the smaller N is, the worse the performance, for two reasons:

- The smaller N is, the more cabal build processes are run, each adding overhead.
- More packages per group give cabal more opportunities to build packages concurrently.

Thus, for optimal performance, use the largest group possible. The upper limit is no --batch argument at all, which puts all packages into the same group.
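The first reason above is simple arithmetic: with P packages and `--batch N` there are ⌈P/N⌉ groups, and so ⌈P/N⌉ separate cabal build runs. A minimal sketch of that count (`num_groups` is a hypothetical helper); for example, `num_groups 250 100` prints `3`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of groups (and thus cabal build runs) when
# P packages are split into groups of size N, i.e. ceil(P / N).
num_groups() {
  local packages="$1" n="$2"
  echo $(( (packages + n - 1) / n ))
}

# Example usage, with a "<pkg> ==<version>" snapshot file:
#   num_groups "$(wc -l < snapshot.txt)" 100
```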

Tip

Additionally, the ./output/cache.json file can be edited directly. For example, if you want to try building only foo, make foo the only entry in the untested field of the JSON file.

Getting dependencies via `nix`

For Linux-based systems, there is a provided flake.nix and shell.nix to get a nix shell with an approximation of the dependencies (cabal itself, C libs) required to build clc-stackage.

Note that it is not actively maintained, so it may require some tweaking to get working, and conversely, it may have some redundant dependencies.

Misc

Your custom GHC will need to be on the PATH to build the stack library e.g.
```
export PATH=/home/ghc/_build/stage1/bin/:$PATH
```
Nix users can uncomment (and modify) the corresponding line in flake.nix.
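Before starting a long build, it may be worth checking that the `ghc` found on `PATH` really is your patched compiler. This is a minimal sketch; `check_ghc_on_path` is a hypothetical helper, and the path in the usage comment is the example path from above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the first "ghc" on PATH is the given binary.
check_ghc_on_path() {
  local expected="$1" actual
  if ! actual=$(command -v ghc); then
    echo "no ghc on PATH" >&2
    return 1
  fi
  # -ef is true when both paths refer to the same file (same device and inode),
  # so "bin/ghc" and "bin//ghc" are treated as equal.
  if [ "$actual" -ef "$expected" ]; then
    echo "ok: $actual"
  else
    echo "ghc on PATH is $actual, expected $expected" >&2
    return 1
  fi
}

# Example usage:
#   export PATH=/home/ghc/_build/stage1/bin/:$PATH
#   check_ghc_on_path /home/ghc/_build/stage1/bin/ghc && ghc --version
```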
