elm/compiler

Elm compilation is incredibly slow on CI platforms

Closed this issue · 48 comments

Projects that compile in seconds on my local machine take an unreasonably long time when run on CI.

This issue produced a stopgap fix here #1473 (comment). Use that for now.

Additional Details

I've put together an example project using a sample from the elm-guide.

On my local machine this takes 2.6 seconds to build. The Travis CI build here takes 234 seconds to do the same build. My dev machine may be slightly better than the CI machines in question, but certainly not by enough to explain this difference in build times.

I've seen this behaviour on both Travis CI & Circle CI, and it only seems to get worse with larger projects. Another project of mine (a few hundred lines of Elm, nothing major) struggles to build within 10 minutes.

I see there's a workaround for this here: https://8thlight.com/blog/rob-looby/2016/04/07/caching-elm-builds-on-travis-ci.html

Thanks for the issue! Make sure it satisfies this checklist. My human colleagues will appreciate it!

Here is what to expect next, and if anyone wants to comment, keep these things in mind.

@obmarg, can you figure out if it's slow to download packages or to actually build things? I'd expect it to be the former, and it'd be great to know for sure.

I think @OvermindDL1 is talking about something else, so I'm getting rid of those comments. If you have an SSCCE of your thing, and it's not fixed by elm-lang/elm-make@46ec85c then open a separate issue on an appropriate repo.

@evancz elm-package install takes 0.5 seconds, elm-make takes 234 seconds. I assume elm-make doesn't do any downloading if elm-package install has already been run?

That sounds right, but if you are doing any weird caching of elm-stuff/ I'm not 100% certain. Like if you cache exact-dependencies.json but nothing else or something. I'm not sure.

Basically, if you want the compiler to go faster under odd conditions, I need as much detailed information about what's going wrong as possible. Maybe they are throttling processes? Maybe they report they have multiple cores, but it's actually one? I have no idea without you telling me.

Another question to ask, are you running elm-package install --yes or without --yes? I think it's important to get a more precise diagnosis before "something" can be fixed.

I agree, a precise diagnosis would be a great idea. The commands that are being run are:

$ elm-package install -y
Packages configured successfully!
$ elm make Main.elm

There's nothing odd being done in between these commands as far as I'm aware.

I would like to explain what the odd conditions causing this are, but I'm not too sure myself. I've been using these CI services for years, and this is the first time I've run into a serious performance issue like this.

Is there any way to enable more logging in the elm compiler, or anything else that would help diagnose?

It could be the case that Travis & Circle are reporting way more cores than are actually usable. I just checked /proc/cpuinfo in both environments, and they list 32 cores. The Travis documentation specifically says you'll have 2 cores for your builds. I can't find any documentation for Circle, but I'm pretty sure I don't have exclusive access to all 32 of those cores.

This line tells elm-make to look up how many cores there are, so we can use them all. Then this file will just spawn a bunch of light-weight threads and trust Haskell to schedule them nicely. I can imagine if Haskell is being told it has 32 cores, but it only has 2, that things could be getting goofy.
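For reference, the pattern being described looks roughly like the sketch below. This is illustrative only, not the actual elm-make source; the module list and compileModule are hypothetical stand-ins.

import Control.Concurrent (forkIO, setNumCapabilities)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import GHC.Conc (getNumProcessors)

main :: IO ()
main = do
  -- Ask the OS how many processors there are and use them all.
  n <- getNumProcessors
  setNumCapabilities n

  -- Spawn one lightweight thread per work item and let the GHC
  -- runtime schedule them across the available capabilities.
  let work = ["ModuleA", "ModuleB", "ModuleC"]  -- hypothetical module list
  dones <- mapM (const newEmptyMVar) work
  mapM_ (\(m, done) -> forkIO (compileModule m >> putMVar done ())) (zip work dones)
  mapM_ takeMVar dones

-- Hypothetical stand-in for the real per-module compile step.
compileModule :: String -> IO ()
compileModule name = putStrLn ("compiling " ++ name)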

Is there some way to try to make sure that that line is reporting two? Or trick it into reporting two and seeing if that resolves things?

I have some logging stuff for myself, but not a public flag yet. So you could get this information if you build from source. It breaks down how much time is spent in different parts of elm-make. It may just show it's all in the compiler though, so I'd do this kind of thing as a backup because building from source can take a while and be tricky.

I've been doing a bit of work to try and confirm that this false number of CPUs is actually causing this problem. I had a look into how getNumProcessors works, and discovered libsysconfcpus, which lets you override the number of CPUs reported by sysconf (which getNumProcessors uses under the hood).

I then built & ran that on my CI environment:

$ rm -R elm-stuff/build-artifacts/*
$ time sysconfcpus -n 1  elm-make
Success! Compiled 47 modules.

real    0m2.215s
user    0m2.195s
sys     0m0.024s
$ rm -R elm-stuff/build-artifacts/*
$ time elm-make
Success! Compiled 47 modules.

real    9m21.660s
user    15m38.880s
sys     2m47.578s

So it does look like the CPU count detection is the problem. Seems like a command line option (or similar) might be a reasonable idea?

For anyone trying to use libsysconfcpus themselves, I ran into a couple of compiler issues. My fixed version is here.

Awesome @obmarg, shared that trick with NoRedInk, I think it'll help them too!

Folks raised the idea of having a --jobs flag on elm-make, but it has problems. There are two ways to restrict the number of "jobs". One is to override this line which is a bad idea for every single user unless you run into this exact CI problem. Another is to manage a thread-pool in this file which I think is pretty pointless if Haskell thinks there are 32 cores and has its own stuff to manage. The root problem could be a bad interaction between Haskell's GC and our threads, so this route may not actually solve the problem.
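For concreteness, the "thread-pool" route being dismissed here would look roughly like the following, using a semaphore to cap how many of the lightweight threads run at once. This is purely illustrative, not elm-make code:

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (bracket_)

-- Run each action on its own thread, but let at most
-- maxJobs of them execute at the same time.
runLimited :: Int -> [IO ()] -> IO ()
runLimited maxJobs actions = do
  sem   <- newQSem maxJobs
  dones <- mapM (const newEmptyMVar) actions
  mapM_ (\(act, done) ->
           forkIO (bracket_ (waitQSem sem) (signalQSem sem) act >> putMVar done ()))
        (zip actions dones)
  mapM_ takeMVar dones

As the comment above notes, this only limits how many worker threads run concurrently; the GHC runtime (including its GC) still believes it has all of the reported cores.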

I recommend folks having this problem use @obmarg's trick for now. I'd like to talk to more people who are seeing this problem in practice to figure out a solution that does not allow bad outcomes in any cases, so no need for PRs at the moment. Code is always the easy part.

I think a flag like --max-cores that conditionally overrides this line may have the right naming to make sure it is used appropriately.

So you can say elm-make --max-cores=2 anytime you want, but it is pretty clear that this is not something you want under normal circumstances. It also means you may set --max-cores=4 on a machine that actually only has two and two will win.
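A minimal sketch of that clamping rule, assuming the flag has already been parsed into a Maybe Int elsewhere (nothing here is actual elm-make code):

import Control.Concurrent (setNumCapabilities)
import GHC.Conc (getNumProcessors)

-- maxCores holds the value of a hypothetical --max-cores flag, if given.
configureCapabilities :: Maybe Int -> IO ()
configureCapabilities maxCores = do
  detected <- getNumProcessors
  -- The smaller number wins, so --max-cores=4 on a machine that
  -- actually has two cores still ends up using 2.
  setNumCapabilities (maybe detected (min detected) maxCores)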

@obmarg, do you like that approach? Can you think of ways to make sure anyone using CI knows to use that? Maybe we should just have official CI recipes for testing?

I will talk to NRI people about this next week and get their feedback as well.

--max-cores seems like a reasonable name to me. I can think of a couple of situations where you might want to limit the number of cores you're running on, but none where you'd want to explicitly increase it.

Official CI recipes could also be useful, though there are a bunch of different ways to integrate Elm into your build system. A recipe for running elm-make on CI might not help someone who uses brunch or webpack to run elm-make, for example. Though at least it could be a place to explain the issue that people could refer to.

Don't know if this is something that you'd be interested in adding a warning to the compiler for? Though it's probably quite hard to get right...

FWIW, here is a concrete Travis recipe I arrived at that does work around this issue:

cache:
  directories:
    - sysconfcpus
install:
  - |
    if [ ! -d sysconfcpus/bin ];
    then
      git clone https://github.com/obmarg/libsysconfcpus.git; 
      cd libsysconfcpus;
      ./configure --prefix=$TRAVIS_BUILD_DIR/sysconfcpus;
      make && make install;
      cd ..;
    fi

and then wherever there is a call to elm-make or elm-test, prefix that by $TRAVIS_BUILD_DIR/sysconfcpus/bin/sysconfcpus -n 2.

As a thing to note about the --max-cores option, I think it will not suffice to just add support for this in elm-make for immediate benefit to many. People use calls to elm-test in their CI scripts and the elm-test executable then calls out to elm-make. So there would probably have to be coordination with https://github.com/rtfeldman/node-test-runner so that people get to pass an option like --max-cores to elm-test, which will then know to pass it on to elm-make.

This workaround cuts my elm-package + elm-make time in Travis CI from almost 10 minutes down to 5 seconds. Nice work. Thank you.

Would an environment variable make sense for this? ELM_MAKE_MAX_CORES=2 (modulo bikeshedding the name) would be available to the compiler regardless of wrapper scripts or tools, and every CI provider has first-class support for setting those vars.
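As a sketch of what reading such a variable could look like on the compiler side (the variable name is only the proposal above; it does not exist today):

import Control.Concurrent (setNumCapabilities)
import GHC.Conc (getNumProcessors)
import System.Environment (lookupEnv)
import Text.Read (readMaybe)

-- Clamp the RTS capabilities to ELM_MAKE_MAX_CORES when it is set
-- to a positive integer; otherwise fall back to the detected count.
setCapabilitiesFromEnv :: IO ()
setCapabilitiesFromEnv = do
  detected  <- getNumProcessors
  requested <- (>>= readMaybe) <$> lookupEnv "ELM_MAKE_MAX_CORES"
  setNumCapabilities (maybe detected (max 1 . min detected) requested)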

This looks like a promising workaround. My team is presently building our Elm modules in a Docker container. I will try this out and report back. If anyone has already done this (with Docker), please respond with your results and possibly save us some time. 😄

Btw: sysconfcpus -n 1 also worked very well to speed my builds on CircleCI from 24 minutes to just 9 seconds!

For those using npm install -g elm to obtain elm-make, I expanded on @jvoigtlaender's amazing workaround to replace elm-make with a script that prepends $TRAVIS_BUILD_DIR/sysconfcpus/bin/sysconfcpus -n 2: https://github.com/rtfeldman/node-elm-compiler/blob/master/.travis.yml#L37-L39

Basically this is a drop-in replacement that makes elm-make "just work" for the tests themselves. 😸

mgold commented

@rtfeldman Is this something that can help elm-test CI builds as well? Can you write a ready-to-use .travis.yml for that?

@evancz AFAIK, the line from the comment #1473 (comment) could be removed, since the default value of GHC.Conc.numCapabilities should be the number of processors, or can be controlled via the runtime options +RTS -N[x] -RTS

@francesco-bracchi, I think you are wrong. Simply leaving that line out will change how the compiler behaves. Namely, it will not use concurrency anymore then. See https://downloads.haskell.org/~ghc/master/users-guide/using-concurrent.html.
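A small program illustrates the distinction, assuming it is compiled with -threaded but run without any +RTS -N flag:

import Control.Concurrent (getNumCapabilities)
import GHC.Conc (getNumProcessors)

main :: IO ()
main = do
  caps  <- getNumCapabilities  -- how many cores the RTS will actually use
  procs <- getNumProcessors    -- how many cores the OS reports
  -- Without +RTS -N (or an explicit setNumCapabilities call), caps
  -- stays at 1 even on a many-core machine, i.e. no concurrency.
  putStrLn ("capabilities: " ++ show caps ++ ", processors: " ++ show procs)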

The fundamental problem is that if the OS reports that there are N CPU cores, you cannot assume that (a) all N cores can be used and (b) that all N cores are completely idle when the program starts nor when the program is executing.

You don't need something as exotic as a cloud build platform that restricts the number of CPU cores given to a process while nevertheless reporting the actual number of CPU cores. All you need to do is run 2 instances of elm-make at the same time. In normal development where you are running elm-make from the CLI, this will never happen. But if your CI server runs multiple build jobs, this can happen, and it results in builds that mysteriously take a much longer time to complete than normal (or even hang long enough that you need to cancel it).

Each build job invokes elm-make, which will think that it has N cores it can max out. And so elm-make will blithely spin up N sub-processes to fully utilize the CPUs. Now you have 2N CPU-intensive processes fighting over N CPU cores. As N gets large (in my testing, once you go from 8 to 16 cores), the performance degradation gets very bad.

This problem (parallel invocations of elm-make during the build) can also happen when using elm-webpack-loader. For each place where your JS program imports a .elm file, elm-make will be called asynchronously, resulting in the same behavior described above.

In my project at work, we have a server written in Kotlin, 1 pure Elm app, and 1 JS/Elm app. We use Gradle to orchestrate the entire build. Like many build systems, Gradle has the ability to perform parts of the build in parallel. Since there are no inter-dependencies between the 2 web-apps, Gradle thinks that it can build them in parallel. This results in 2 elm-make processes fighting over the CPU.

I mention all of this because @evancz wanted more examples of real-world build problems. In my view, the parallelism implemented in elm-make today does not compose well with the larger system, which itself might be a parallel bundler (elm-webpack-loader), a parallel build system (Gradle), or a build server (Jenkins).

I'm no expert on parallelism, but it seems that the right fix is for elm-make to dynamically adapt how many threads/processes it uses based on current system load. In the meantime, I think it makes sense for elm-make to provide the already-discussed --max-cores option to allow the caller to leave the appropriate amount of headroom needed by their build environment.

My other recommendation is for elm-webpack-loader to restrict the max number of parallel elm-make jobs to 1 since elm-make makes assumptions that all CPU cores are idle. (CC @rtfeldman)

@klazuka I am the maintainer of elm-webpack-loader. There is a case for setting the default max instances to 1, but not always. Having parallel builds for things that can be parallelized decreases build time -- I have tested and confirmed this for the largest elm code bases in the world. Please open an issue on elm-webpack-loader if you want to discuss it further.

Just hit this bug, here's what I have to add:

I have an 8-core build server for elm, which works as usual when I use all 8 cpus.
However, on any configuration which does not use the same number of CPUs as the system advertises, the build gets painfully slow (as reported).

On each container the system always reports 8 CPUs, even if only a subset of them are available.

The interesting bit is: it doesn't matter if I pin my jails (containers) to use 1 or 7 of the 8 CPUs, it gets infinitely slower in both cases. It works fine only when all 8 cores are used.

Unfortunately I can't get sysconfcpus to run properly on FreeBSD, but it would probably solve the issue. +1 on the --max-cores flag/env var.

Would be grateful for some advice here, as I've never used CI before. My (mis-)understanding of @rtfeldman's comments above was to add these two scripts to package.json.

 "scripts": {
    "test": "$TRAVIS_BUILD_DIR/sysconfcpus/bin/sysconfcpus -n 2 node_modules/.bin/elm-make && node_modules/.bin/test",
    "postinstall": "elm-package install -y && cd tests && elm-package install -y"
  },

That's loading everything for my tests, but I get an error

sh: 1: /sysconfcpus/bin/sysconfcpus: not found

What am I missing?

@simonh1000 you need to install sysconfcpus first. See #1473 (comment)

When using CircleCI, @jvoigtlaender's code has to be packaged correctly:

circle.yml

dependencies:
  cache_directories:
      - sysconfcpus
  pre:
    - ./sysc.sh
  override:
    - npm install

with the script put in sysc.sh

Still trying to figure out a better caching strategy to speed things up completely.

Just wanted to add that if you run in non-containerized builds on Travis (sudo: required), it also runs fast. I wasn't actually able to get it to work inside the containerized build.

dam5s commented

Just wanted to chime in, as I'm having the same issue but not necessarily on CI.

We have an application written in Kotlin and Elm that we build with Gradle. I wrote a small Elm plugin for Gradle that has tasks for elm-install and elm-make and runs them in the correct order.

When we parallelize the Gradle build (so it builds Kotlin components and Elm apps in parallel), Gradle will only allocate 1 core to the elm-make task if it runs in parallel with other tasks, until no other task can run because it's waiting on elm-make to complete.

This results in slowing our build dramatically, while if we could tell elm-make to only use one core, it would be able to build at the same time as our other components.

AdamT commented

Another data point. I'm using the Rails Webpacker gem and think I am seeing the same issues on CircleCI and CodeShip. When I run ENV=test bin/webpacker while setting up my env, the process times out here:

Running /home/rof/src/github.com/me/my_app/node_modules/.bin/elm-make /home/rof/src/github.com/me/my_app/app/javascript/Hello.elm --yes --warn --debug --output /tmp/117910-6416-1qcmv7p.keiegeqaor.js

Running bin/webpacker locally in development has no issues.

Update
Implemented @rtfeldman solution: https://github.com/rtfeldman/node-elm-compiler/blob/master/.travis.yml#L37-L39
Running a single request spec takes 50 seconds but passes. Then when running an integration test I end up canceling after 5+ minutes since it appears to be stalled (or compiling).

Any ideas on what to do next? (There's no output so I assume compiling)

Update 2
I fixed my issues and created a PR to Webpacker to help document the issue so that others have an easier time than I did. Feel free to comment if I've neglected to include something.

😄

I see this thread is still active and think there is still some misconception. Straight to the point.

The fundamental problem is that if the OS reports that there are N CPU cores, you cannot assume that (a) all N cores can be used and (b) that all N cores are completely idle when the program starts nor when the program is executing.

isn't true in my opinion.

I'm running Arch Linux on my AMD Ryzen 7 1800x workstation. That is 8 physical cores (tested with 3.6GHz stock base speed and 4GHz overclocked) with hyperthreading - 16 threads total. My core utilization before I run elm-make is usually around (less than) 1% (I don't even have a full desktop, just Xorg and Xmonad).

elm-make is able to utilize all 16 threads at 90-100%. However, the more threads are involved, the longer the overall compile time is.

These are numbers with stock 3.6GHz. I'm using webpack.

16 threads: yarn build  4342.29s user 502.08s system 1459% cpu 5:32.01 total
4 threads: sysconfcpus -n 4 yarn build  91.79s user 51.30s system 339% cpu 42.147 total
2 threads: sysconfcpus -n 2 yarn build  73.28s user 15.51s system 219% cpu 40.517 total
1 thread: sysconfcpus -n 1 yarn build  61.04s user 1.14s system 154% cpu 40.187 total

As you can see, the time difference is non-linear. It just happens that most of you are running dual or quad cores. From my experience, it seems more likely to be an issue with the compiler itself. It performs really badly when you start adding threads, even on a machine with many unutilized cores.

Thanks for this fix. Now my CI runs in 1 minute instead of 10.

I also have the same observation as @turboMaCk. On my personal computer with 8 hyper-threaded cores on 4 real cores (i7-3770K), I get a significant speedup by using sysconfcpus -n 1. From 4 minutes to 0.5 minutes. This is with no significant load apart from elm-make.

Yeah, I think this is an anti-pattern for most Haskell applications due to the GC etc. not behaving optimally with many cores:

https://github.com/elm-lang/elm-make/blob/1a554833a70694ab142b9179bfac996143f68d9e/src/Main.hs#L23

I've had a different Haskell application where I got more than 100x speedup by limiting to two threads. I think there are some flags in newer GHC versions that allow you to have many threads but fewer GC threads.
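For the record, on GHC 8.2 and later the relevant runtime flag is -qn, which caps the number of threads used for parallel GC independently of -N. For a program compiled with -threaded and -rtsopts, the invocation looks something like:

some-haskell-program +RTS -N8 -qn2 -RTS

(These are GHC runtime-system flags in general, not anything elm-make specifically exposes.)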

I've run into this same class of problems when writing Clojure + Java 8, running on CircleCI. The machine had oodles of RAM, and my JVM thought it could take more of it than it was allowed to use. Manually setting memory limits fixed the issue. The root cause from my perspective is that the system (JVM/elm-make) is not correctly interpreting the hints that the environment is giving it about what resources are available to it.

Java 9 and 10 have improvements to running under Docker containers. In Java 10, the container can look at its runtime to see what constraints it is running under.

In theory, it seems like it would be possible for the Elm compiler + associated machinery to take a similar approach. Automatically detecting the number of CPU cores available would fix this without requiring any configuration from users. Something like nproc looks like one approach you could take for detecting the number of allowed CPUs to use.

Apologies if I've misunderstood the issue, I didn't really see anyone directly suggesting that elm should detect how many CPU cores it is actually allowed to use.

We've looked into that approach. As it turns out, Haskell's concurrency library only knows how to detect "number of physical cores," not "number of available cores." Node.js is the same way.

Rust's num_cpus crate knows how to detect both. It's possible we could introduce some Rust FFI to Elm's compiler (which is written in Haskell) just to accurately get that one number, but it's not clear that's the best path. 😄

May I ask if this issue is in any way impacted by Elm 0.19?

I'm not sure. Anyway, in my opinion, people are still blaming the wrong thing. The problem is not necessarily the detection of CPU cores. The issue is that, no matter what, compilation gets slower as the number of threads increases. An environment in which more threads are used only makes the symptoms more noticeable; CPU detection isn't an issue by itself. At most it's a secondary issue, which makes sense to fix once the primary issue is addressed: the fact that the compiler gets slower with an increasing number of threads even though hardware resources are available.

Yeah - they are separate issues; fixing one but not the other would not solve the problem completely.

My personal opinion is that:

  • we have a workaround that is already commonly known... it's not nice, but it's not that bad (stressing the word known)
  • this is not a simple problem to solve (or rather, both problems aren't simple), so we need to be patient
  • fixing it partially might be worse than ignoring it. It would cause different odd behavior which, unlike this one, wouldn't be known to folks, and would create more confusion, more threads....
  • even if this still exists in the same form in 0.19 (which I don't know yet, but others might), a single thread is fast enough. So it's "just the inconvenience" of setting this up in CI, or on a desktop if you have an 8+ core machine.

As an affected user, I'm happy I don't have to find a new workaround every month after some bad patch for this is released.

On Elm 0.19, a test suite of 5 tests that runs in less than 2 seconds with sysconfcpus -n 1 on a Concourse setup has been running for 3 hours 55 minutes on the same Concourse setup without it.

If more specific information would somehow help address this issue, I would be happy to provide it.

@davcamer This is an elm-test on Linux issue, unrelated to the compiler!

See rtfeldman/node-test-runner#295

With 0.19 folks are able to say things like:

elm make src/Main.elm --optimize +RTS -N4

The things after +RTS are flags to the Haskell runtime, so you can tell it how many cores to use, tweak GC options, etc. There is also a script for TravisCI these days that should account for the root issue.

@rtfeldman also documented the root problem in GHC that led to this here.

Given that there are workarounds in Elm, and the root issue is in GHC, I think it makes sense to close this issue. If folks are still having problems, please open a new issue explaining your particular scenario, with an SSCCE if possible!

Noting that elm-test-rs solved this issue (elm-test slowness on CircleCI) for me, without resorting to sysconfcpus.

I believe this was fixed in Elm 0.19.1.

It has been fixed in elm 0.19.1!

(As cool a project as elm-test-rs is, it doesn't do anything special with respect to invoking the elm compiler: if you don't need sysconfcpus for elm-test-rs, you don't need it for elm-test either.)

I do apologise, and thank you for correcting me @turboMaCk and @harrysarson. I jumped to the wrong conclusion, and can confirm that elm compilation is not the issue, and neither does sysconfcpus make a speck of difference.

[I don't know if it's to be expected that elm-test is ~20x slower than elm-test-rs on CircleCi, but this is not the place for that question to be addressed!]

[I don't know if it's to be expected that elm-test is ~20x slower than elm-test-rs on CircleCi, but this is not the place for that question to be addressed!]

I would be interested to hear more about this! If you have the time, ping me (@harrysarson) on the Elm slack or open an issue at https://github.com/rtfeldman/node-test-runner.