elixir-wallaby/wallaby

One chromedriver per core becomes problematic with sufficiently many cores

chriscrabtree opened this issue · 4 comments

Elixir and Erlang/OTP versions

Erlang/OTP 25 [erts-13.1.2] [source] [64-bit] [smp:56:56] [ds:56:56:10] [async-threads:1] [jit:ns]

Elixir 1.14.2 (compiled with Erlang/OTP 25)

Operating system

macOS 12.6

Browser

n/a

Driver

ChromeDriver

Correct Configuration

  • I confirm that I have Wallaby configured correctly.

Current behavior

To reproduce, run mix test on Wallaby itself.

On my 10-core MacBook Pro, tests run quickly (~11 seconds) and everything passes.

On my 56-core Mac Pro desktop, tests run very slowly (~46 seconds) with 7 failures and many timeout messages.

out.txt

Expected behavior

The Wallaby test suite should pass quietly on the 56-core Mac Pro.

Test Code & HTML

I fixed the issue locally by changing the partitions value in the init/1 function of lib/wallaby/chrome.ex.

Rather than partitions: System.schedulers_online(), I have it as:

partitions: reasonable_number_of_partitions(System.schedulers_online())

And then:

  defp reasonable_number_of_partitions(schedulers_online, divisor \\ 6)

  defp reasonable_number_of_partitions(schedulers_online, divisor)
       when is_integer(schedulers_online) and schedulers_online >= 56 and is_integer(divisor) do
    Integer.floor_div(schedulers_online, divisor)
  end

  defp reasonable_number_of_partitions(schedulers_online, _divisor)
       when is_integer(schedulers_online) do
    schedulers_online
  end

This maintains the current Wallaby behavior of using all cores unless there are at least 56 cores, in which case I experimentally estimated that one-sixth of the cores is the optimum for performance, at least on the Wallaby test suite.

I suspect that the 56 core threshold should actually be lower, but I don't have any machines between 10-core and 56-core to test that.

Configuration?

If we made some of these options configurable, then users could have an escape hatch if they were to run into a similar situation in the future.

Happy to help with a PR if we can land on a good approach for that configuration. I'm not so familiar with Wallaby internals that I know how we would want to make it more configurable in that regard.

A (Very) Incremental Improvement

And of course also very happy to PR this simple improvement first, which I believe will do no harm to most users, and will help the small fraction of folks developing on so many cores. :-)

Demonstration Project

No response

A PR with a configurable option would be great.

We can biekshed on the naming once a PR is open.

Do you mind sharing the types of errors you see? I'm wondering if opening that many chrome drivers is just using too many system resources.

The out.txt file attached above has stdout from mix test. There were more errors on stderr too that may be instructive--I'll see about capturing them.

I deliberately decided against diving too deeply into what chromedriver is doing, but it certainly feels like it's working harder than it should have to spinning up that many instances. Maybe there is some sort of internal threshold being exceeded. This particular Mac Pro has 1.5 TB of RAM, so I know it's not memory starved. And the Internet connection is fast and stable with low latency, so I mentally ruled out its waiting on the network too.

Anyway, I'll get a PR going this week and we can take it from there. Thanks!

Here is the output of mix test, including both stdout and stderr. Not sure if it points us to a different conclusion than hey, that's too many!, but there it is. :-)

out2.txt

For completeness, the associated PR.