tweag/ormolu

Format multiple files in parallel

Closed this issue · 9 comments

Is your feature request related to a problem? Please describe.
At work we use treefmt, which calls a formatter with all the matching files in the repository, i.e. we end up calling ormolu <list of all our haskell files>.

We have ~2.5k Haskell files, and this takes about 32 seconds.

Describe the solution you'd like

One obvious solution would be to just run the files in parallel. That could be as simple as replacing the mapM here with mapConcurrently.
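To make the suggestion concrete, here is a minimal sketch of the swap, not Ormolu's actual code. It assumes the `async` package for `mapConcurrently`; `formatOne` is a hypothetical stand-in for "format one file" and only computes the length of the path so that the example runs without any files on disk.

```haskell
import Control.Concurrent.Async (mapConcurrently)
import Control.Exception (evaluate)

-- Hypothetical stand-in for formatting a single file. A real formatter
-- would read, parse and pretty-print the file here.
formatOne :: FilePath -> IO (FilePath, Int)
formatOne path = do
  n <- evaluate (length path)
  pure (path, n)

-- Sequential version: one file after another.
formatSequentially :: [FilePath] -> IO [(FilePath, Int)]
formatSequentially = mapM formatOne

-- Concurrent version: same shape, one Haskell thread per file.
-- Results come back in the same order as the input list.
formatInParallel :: [FilePath] -> IO [(FilePath, Int)]
formatInParallel = mapConcurrently formatOne

main :: IO ()
main = do
  let files = ["A.hs", "src/B.hs", "test/Spec.hs"]
  s <- formatSequentially files
  p <- formatInParallel files
  -- Both strategies should produce identical results.
  print (s == p)
```

Note that the concurrent version only runs on several cores if the executable is linked against the threaded RTS and has more than one capability available.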

Describe alternatives you've considered

  • Make ormolu faster some other way.
  • Get treefmt to call ormolu once per file, in parallel. This would also be perfectly sensible.

I also opened numtide/treefmt#333 on treefmt, let's see what they say.

Sounds sensible!

There was this attempt in the past: #896

> Get treefmt to call ormolu once per file, in parallel. This would also be perfectly sensible.

Just to get an idea of how much this would speed things up in your case, you could try running with fd:

fd -e hs . /path/to/my/dir -x ormolu -m check

Great idea. I did that, and it didn't actually help that much! Still takes about 25s. I'm unsure why this is.

Hmm 🤔 Might be interesting to run a profiled version of Ormolu to see if there are any pathologies (such as the one that prompted #896), but I assume your code isn't open source?

Sadly no, but I can probably build a profiled ormolu pretty easily and give it a try myself. Seems like I'm not the first person to hit this 😂

Here's a flamegraph of running a profiled ormolu (well, it's actually fourmolu) on all the files. Sure looks like it's bound by the parser, which I would expect to be parallelizable 🤔

[Image: fourmolu flamegraph]

A data point: I used the new treefmt which runs formatters on batches of files, currently batches of 1024. So it formats our code in two batches. That is almost 2x faster, which suggests that there is benefit to be had here.

Okay, I got a 3x improvement for just what I suggested (replacing mapM with mapConcurrently). I think maybe #896 didn't work because the executable wasn't being built with -threaded 😅

I can resurrect that PR or similar if you've got a preference for whether or not this should be configurable. mapConcurrently will use all the concurrency it can get which is probably wrong. But then you don't like configuration :)
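If a cap on in-flight work were wanted, one possible shape is sketched below. This is an assumption about how it could be done, not what #896 or Ormolu does: `mapConcurrentlyBounded` is a hypothetical helper that wraps `mapConcurrently` (from the `async` package) with a `QSem` from base, so that at most `limit` actions run at any one time.

```haskell
import Control.Concurrent.Async (mapConcurrently)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (bracket_)

-- Hypothetical helper: like mapConcurrently, but at most `limit`
-- actions are running at once. A thread is still created per item;
-- the semaphore only gates when each one starts its work.
mapConcurrentlyBounded :: Int -> (a -> IO b) -> [a] -> IO [b]
mapConcurrentlyBounded limit action xs = do
  sem <- newQSem limit
  -- bracket_ releases the semaphore slot even if the action throws.
  mapConcurrently (bracket_ (waitQSem sem) (signalQSem sem) . action) xs

main :: IO ()
main = do
  -- Stand-in workload: compute the length of each path, two at a time.
  lens <- mapConcurrentlyBounded 2 (pure . length) ["A.hs", "src/B.hs", "test/Spec.hs"]
  print lens
```

The limit here is a plain argument, so whether it is exposed as a flag or fixed internally stays a separate decision.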

@michaelpj you can always configure the number of parallel tasks of a GHC program by passing +RTS -N2 -RTS (here only two parallel tasks) anyway, so I think you're good on the configuration front.
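For anyone wanting to check what a given binary actually gets at runtime, here is a small sketch using only base. It uses `rtsSupportsBoundThreads` as a proxy for "linked with -threaded" and `getNumCapabilities` for the effective `-N` value; `describeRts` is a hypothetical helper name, and the messages are purely illustrative.

```haskell
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)

-- Summarise how much parallelism the running program can use.
-- First argument: whether the threaded RTS is in use.
-- Second argument: number of capabilities (the -N setting).
describeRts :: Bool -> Int -> String
describeRts threaded caps
  | not threaded = "non-threaded RTS: Haskell threads share one core"
  | caps <= 1 = "threaded RTS with 1 capability: pass +RTS -N<k> -RTS to use more cores"
  | otherwise = "threaded RTS with " ++ show caps ++ " capabilities"

main :: IO ()
main = do
  caps <- getNumCapabilities
  putStrLn (describeRts rtsSupportsBoundThreads caps)
```

Depending on how the executable was linked, passing `-N` on the command line may also require it to have been built with `-rtsopts`.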

My intuition is that it'll be more efficient for Ormolu to do some parallelism itself because GHC's RTS takes a non-trivial time to load.

PS: for parsers, I hope that Happy parsers are thread safe, but if I remember correctly Yacc generates parsers with a global state and you can't actually use the same Yacc parser to parse two files in parallel. It does happen…