wlandau/crew

Reliable segfault in a test on Mac OS


@shikokuchuo, I am rerunning all my local crew tests on Mac OS using shikokuchuo/mirai@c1dd3ff and shikokuchuo/nanonext@229e3c6. I am still really excited about shikokuchuo/mirai@3f15ead because it seems to solve the host/dispatcher disconnection issues (with the one exception described below).

I found only one issue, and it occurs in https://github.com/wlandau/crew/blob/main/tests/throughput/test-transient-wait.R. The code below is a slightly simplified version of that test. When I submit 100 tasks and wait for just one of them, a subsequent attempt to restart the host R session segfaults, and the dispatcher keeps running indefinitely. Luckily, this time the crash happens reliably on my end, so I should be able to simplify the example further.

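```r
library(crew)
# Local controller with 4 workers; each worker exits after one task.
x <- crew_controller_local(
  name = "test",
  tasks_max = 1L,
  workers = 4L
)
x$start()
# Queue 100 tasks that each sleep for 10 seconds.
for (index in seq_len(100)) {
  x$push(command = Sys.sleep(10))
}
# Block until at least one task has finished.
x$wait(mode = "one")
rstudioapi::restartSession() # segfaults here
```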

Simplified down to this:

```r
library(crew)
# Same reproducer with default controller settings.
x <- crew_controller_local()
x$start()
for (index in seq_len(100)) {
  x$push(command = Sys.sleep(10))
}
x$wait(mode = "one")
rstudioapi::restartSession()
```

Even further:

```r
library(crew)
# Only 3 tasks, each sleeping for 1 second.
x <- crew_controller_local()
x$start()
for (index in seq_len(3)) {
  x$push(command = Sys.sleep(1))
}
x$wait(mode = "one")
rstudioapi::restartSession()
```

Even simpler, and without crew:

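```r
# One daemon behind the dispatcher, listening at ws://127.0.0.1:5004.
mirai::daemons(n = 1L, url = "ws://127.0.0.1:5004", dispatcher = TRUE, token = FALSE)
# Submit two trivial tasks.
tasks <- replicate(2L, mirai::mirai(TRUE))
# Give the tasks a moment to run before restarting.
Sys.sleep(1)
rstudioapi::restartSession()
```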

I can also reliably reproduce the example from the comment in #104 on my local Ubuntu machine. So this looks like an issue at the level of mirai or below, not in crew. I will file a new issue in the mirai repo.
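To check the observation that the dispatcher keeps running after the session crashes, one option is to list matching processes from a fresh R session. This is a minimal sketch, assuming a Unix-like system (macOS or Linux) with `pgrep` on the PATH; `find_dispatchers()` is a hypothetical helper, not part of crew or mirai, and the default pattern assumes the dispatcher's command line contains the word "dispatcher".

```r
# Hypothetical helper (not part of crew or mirai): list processes whose
# full command line matches `pattern`, e.g. a lingering mirai dispatcher.
# Assumes a Unix-like system with `pgrep` available on the PATH.
find_dispatchers <- function(pattern = "[d]ispatcher") {
  # The "[d]" bracket still matches "dispatcher", but the literal pattern
  # text does not, so the shell running pgrep does not match itself.
  out <- suppressWarnings(
    system2(
      "pgrep",
      args = c("-f", "-l", shQuote(pattern)),
      stdout = TRUE,
      stderr = FALSE
    )
  )
  # pgrep exits with status 1 when nothing matches; system2() then returns
  # character(0) with a "status" attribute, which as.character() drops.
  as.character(out)
}

leftover <- find_dispatchers()
if (length(leftover) > 0L) {
  message("Possible lingering dispatcher processes:\n", paste(leftover, collapse = "\n"))
} else {
  message("No dispatcher processes found.")
}
```

The output format of `pgrep -l` differs slightly between macOS and Linux, so treat the listing as a hint to inspect further rather than a definitive check.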