markvanderloo/stringdist

parallel::parLapply and nthread > 1

Closed this issue · 5 comments

In certain cases stringdist hangs forever if invoked inside parallel::parLapply with argument nthread > 1, or with nthread left unset:

cl <- parallel::makeForkCluster(2)
parallel::parLapply(cl, list(c("a", "b"), c("d", "e")), function(x) stringdist::stringdist("a", x))

This just hangs forever. It works when I add the nthread=1 argument. (But it seems I have to stop the cluster before I can get it to work again.)
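A minimal sketch of the workaround described above, assuming a Unix-alike system (makeForkCluster is not available on Windows) and that the thread argument is spelled nthread, as in stringdist's signature. The tryCatch wrapper is an addition for illustration: it stops the cluster even if the call fails, so a stuck cluster does not linger in the session.

```r
# Workaround sketch: one stringdist thread per forked worker.
cl <- parallel::makeForkCluster(2)
res <- tryCatch(
  parallel::parLapply(
    cl,
    list(c("a", "b"), c("d", "e")),
    # nthread = 1 keeps stringdist single-threaded inside each worker
    function(x) stringdist::stringdist("a", x, nthread = 1)
  ),
  # Always release the workers, whether or not the call succeeded.
  finally = parallel::stopCluster(cl)
)
print(res)
```

Whether this avoids the hang on every affected system is not established in this thread; it only encodes the observation that nthread=1 worked for the reporter.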

I guess this is very much system-dependent; I noticed it on Ubuntu 18.04 with R 3.5.3 (see below). I am also not sure whether the problem is in your code, OpenMP, core R or something else.

Thanks for the package :-)

> R.Version()
$platform
[1] "x86_64-pc-linux-gnu"

$arch
[1] "x86_64"

$os
[1] "linux-gnu"

$system
[1] "x86_64, linux-gnu"

$status
[1] ""

$major
[1] "3"

$minor
[1] "5.3"

$year
[1] "2019"

$month
[1] "03"

$day
[1] "11"

$`svn rev`
[1] "76217"

$language
[1] "R"

$version.string
[1] "R version 3.5.3 (2019-03-11)"

$nickname
[1] "Great Truth"

Hi, thanks for the report. The code you submitted works fine for me on Ubuntu 16.04 (I will try later on 18.04).

Contents of sd.R

cl <- parallel::makeForkCluster(2)
parallel::parLapply(cl, list(c("a", "b"), c("d", "e")), function(x) stringdist::stringdist("a", x))
sessionInfo()
mark@chouffe:~$ R -s -f sd.R 
[[1]]
[1] 0 1

[[2]]
[1] 1 1

R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.5.3 parallel_3.5.3

Question though: why would you parallelize something that is already running in parallel? It seems to make sense only when parallelizing over multiple machines.

Thanks for the reply!
I agree that parallelizing something inside a parallel loop is tricky. But it may still make sense if I choose to run only a little parallelism in the outer loop and want to add some more inside. More importantly, my code that worked fine earlier suddenly just hung, and it took an hour to find the culprit. I guess this was related to upgrading R 3.4 -> 3.5, but I am not sure any more. I will try to check on different platforms and see where I can replicate it.

I guess it is not directly related to your code but to something else, maybe the way a particular gcc implements OpenMP or whatever...
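The idea of a small outer loop with some extra threads inside each worker can be sketched as a simple core budget. The helper below is hypothetical (it is not part of stringdist or parallel) and assumes an even split is acceptable; its result would be passed as the nthread argument of stringdist inside each worker.

```r
# Hypothetical helper: split a core budget between outer worker
# processes and inner threads per worker.
inner_threads <- function(outer_workers, total_cores = parallel::detectCores()) {
  # detectCores() can return NA on some platforms; fall back to one core.
  if (is.na(total_cores)) total_cores <- 1L
  # Integer division, but never fewer than one thread per worker.
  max(1L, as.integer(total_cores) %/% as.integer(outer_workers))
}

inner_threads(2, total_cores = 8)  # 2 workers on 8 cores: 4 threads each
inner_threads(4, total_cores = 2)  # more workers than cores: 1 thread each
```

This only keeps the total thread count within the machine's cores; it does not by itself address the hang reported here.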

It is hard to be sure what it is, but please keep me in the loop if you find something. I like stringdist to be safe and fast.

Works correctly (when tested interactively):

  • R 3.5.2, Ubuntu 16.04, gcc 5.4.0, kernel 4.15.0 (6/12 cores)
  • R 3.5.2, Debian sid, gcc 8.3.0, kernel 4.19.0 (20/40 cores)
  • R 3.5.2, CentOS 6.10, gcc 4.4.7, kernel 2.6.32 (8 cores)

Sometimes works, sometimes not:

  • R 3.5.3, Ubuntu 18.04, gcc 7.3.0, kernel 4.15.0 (4/8 cores)
    (it works in two interactive sessions and fails in another two...)
    Repeatedly failed in a script -- needs further testing

Hi, I'm going to close this now, but perhaps it is good to quote the following from ?makeForkCluster, as a reference.

It is strongly discouraged to use the ‘"FORK"’ cluster with GUI
front-ends or multi-threaded libraries. See ‘mcfork’ for details.
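One way to follow that advice is to avoid forking altogether. The sketch below assumes stringdist is installed where the workers run; it uses parallel::makeCluster, whose default type is "PSOCK", so each worker is a freshly started R process rather than a fork of the current session. Whether this resolves the hang on the affected Ubuntu 18.04 system was not tested in this thread.

```r
# PSOCK sketch: workers are new R processes, not forks of this session.
cl <- parallel::makeCluster(2)  # type = "PSOCK" is the default
res <- tryCatch(
  parallel::parLapply(
    cl,
    list(c("a", "b"), c("d", "e")),
    # Namespace-qualified call, so nothing needs to be attached on the workers.
    function(x) stringdist::stringdist("a", x)
  ),
  # Shut the workers down even if the call above fails.
  finally = parallel::stopCluster(cl)
)
print(res)
```

PSOCK workers take longer to start than forked ones and do not share the parent's memory, so any data they need has to be passed in explicitly.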