sourceCpp crashes R when called about 1000 times on same code
klin333 opened this issue · 10 comments
When calling Rcpp::sourceCpp about 1000 times on the same code, R crashes ("R for Windows terminal front-end has stopped working"). No user C++ function is ever evaluated, so the user's compiled code is not the cause; see the minimal example below. The same crash occurs when compiling with the more user-friendly Rcpp::cppFunction.
I found a previous mailing list discussion about this, but that thread seems to conclude only that there was a problem, without outlining a solution: https://lists.r-forge.r-project.org/pipermail/rcpp-devel/2017-September/009755.html In any case, this issue gives a more minimal example.
code <- "
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double f(double x){
  return x;
}
"
cache_dir <- 'C:/Users/User/workspace/tmp123'
stopifnot(!dir.exists(cache_dir)) # ensure clean cache directory
for (i in seq(2000)) {
  print(i)
  f <- Rcpp::sourceCpp(code = code, cacheDir = cache_dir)
}
> packageVersion('Rcpp')
[1] '1.0.11'
> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.1.3
C:\Users\User\workspace>where g++
C:\rtools40\mingw64\bin\g++.exe
C:\Users\User\workspace>g++ --version
g++ (Built by Jeroen for the R-project) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I was going to try the suggestion from the above thread of using a different cacheDir per call to cppFunction, but that prevents caching, which is essential for most use cases. I can't even work around it with memoise::memoise(Rcpp::cppFunction), because the hash of the env argument changes over time. So the only workaround is to call Rcpp::cppFunction once and keep the resulting function as a global.
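A minimal sketch of that workaround, generalised slightly: compile each distinct code string at most once per session and hand back the stored function afterwards. The helper name make_compile_cache is hypothetical and not part of Rcpp; it keys only on the code string, so it sidesteps the env hashing problem, and it assumes the compiler function returns the compiled R function, as Rcpp::cppFunction does.

```r
# Hypothetical helper (not part of Rcpp): remember the function produced for
# each distinct code string so that compilation and loading happen only once.
make_compile_cache <- function(compiler = Rcpp::cppFunction) {
  keys <- character()  # code strings seen so far
  funs <- list()       # compiled functions, parallel to `keys`
  function(code) {
    idx <- match(code, keys)
    if (is.na(idx)) {
      # First time this code is seen: compile it and store the result.
      fun <- compiler(code)
      keys <<- c(keys, code)
      funs <<- c(funs, list(fun))
      return(fun)
    }
    # Seen before: reuse the stored function without calling the compiler.
    funs[[idx]]
  }
}

# Usage (requires Rcpp and a working toolchain):
# cached_cpp <- make_compile_cache()
# for (i in seq(2000)) f <- cached_cpp("double f(double x){ return x; }")
```

With this in place the loop calls the compiler once, so only one DLL is loaded no matter how many iterations run.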
Can we start at the top please: why would you want to call sourceCpp() on the same file 1000 times?
We have been at this for quite some time. There are hundreds of issues here, over three thousand posts at StackOverflow, and I-honestly-have-no-idea-how-many on the dedicated mailing list. Most questions of the "I am driving cppFunction() or sourceCpp() hard and it breaks" type rest on a fundamental misunderstanding: if you need to do more than these functions offer, create a package.
PS: You are also on an R version that is two years old. It does not add credibility.
Does not reproduce.
Modified Code
edd@rob:/tmp/rcpp_issue_1279$ cat demo.cpp
#include <Rcpp/Lightest>
// [[Rcpp::export]]
double f(double x){
return x;
}
edd@rob:/tmp/rcpp_issue_1279$ cat caller.R
cache_dir <- "cache"
if (dir.exists(cache_dir)) unlink(cache_dir, recursive=TRUE)
if (!dir.exists(cache_dir)) dir.create(cache_dir)
for (i in seq(2000)) {
  if (i %% 100 == 0) cat(i, " ")
  f <- Rcpp::sourceCpp("demo.cpp", cacheDir = cache_dir)
}
cat("\nDone\n")
edd@rob:/tmp/rcpp_issue_1279$
Demo
edd@rob:/tmp/rcpp_issue_1279$ Rscript caller.R
100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000
Done
edd@rob:/tmp/rcpp_issue_1279$
Of course, I use ccache too. And I see no reason to turn it off, given that you have offered no real reason (yet?) why this loop makes sense.
Ah, fair enough. Perhaps a warning in the documentation of sourceCpp or cppFunction would be helpful? Not essential, though.
Sorry for the trouble, really appreciate this package!
FYI, since you asked how this came up: the problem arose while using the einsum package (https://github.com/const-ae/einsum), which uses R code to generate C++ code on the fly and calls cppFunction on the fly to create bespoke C++ functions from a user-specified Einstein summation spec; see the code below.
I fully understand that this is then a problem with packages and use cases downstream of Rcpp, and that is completely fine as long as package developers and users know about it. The reason I thought it was fine to call cppFunction 1000 times with the same C++ code is that I assumed the caching is done by cppFunction, i.e. I thought calling cppFunction a second time is completely free, so there seemed to be no point in complicating the user's R code with caching outside of Rcpp.
Also I'm on Windows, not sure if it's a Windows specific problem...
I'm more than happy to close this issue, as it appears to be out of scope. I don't need this fixed at all; I can easily work around it by simply not calling Rcpp::cppFunction 1000 times.
# innocent looking but crashes R when foo1 is executed 1000 times.
foo1 <- function(x) {
  einsum_f <- einsum::einsum_generator('ijk->ik', compile_function = TRUE)
  einsum_f(x)
}
# users require awareness of Rcpp::cppFunction shortcomings to do it this way
einsum_f <- einsum::einsum_generator('ijk->ik', compile_function = TRUE)
foo2 <- function(x) {
  einsum_f(x)
}
FYI, just to finish this thread: I believe this is due to dyn.load, which is called within Rcpp::sourceCpp via source(scriptPath, local = env). The loop below causes an R crash (the DLL was generated from the same debug code as earlier).
From the help page of dyn.load: "By default, the maximum number of DLLs that can be loaded is now 614 when the OS limit on the number of open files allows or can be increased, but less otherwise". That is probably why, though I have no idea why my loop crashes R around i = 1056 instead of 614.
for (i in seq(2000)) {
  print(i)
  `.sourceCpp_1_DLLInfo` <- dyn.load('C:/Users/User/workspace/tmp123/sourceCpp-x86_64-w64-mingw32-1.0.11/sourcecpp_169095e1a73/sourceCpp_2.dll')
}
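To see how close a session is to the limit quoted from the dyn.load help page, the loaded-DLL table can be inspected with base R alone. This is only a diagnostic sketch: the "sourceCpp_" name prefix is an assumption taken from the DLL path in the loop above, and R_MAX_NUM_DLLS is the environment variable the dyn.load help page describes for changing the limit (it is read when R starts).

```r
# Count the DLLs currently registered in this R session.
n_loaded <- length(getLoadedDLLs())

# R_MAX_NUM_DLLS overrides the default limit if it was set before R started;
# NA here means it is unset and the default applies.
max_dlls <- Sys.getenv("R_MAX_NUM_DLLS", unset = NA)

# DLLs whose registered name looks like sourceCpp output
# (the prefix is an assumption based on the file name sourceCpp_2.dll).
from_source_cpp <- grep("^sourceCpp_", names(getLoadedDLLs()), value = TRUE)

cat("loaded DLLs:", n_loaded, "\n")
cat("R_MAX_NUM_DLLS:", max_dlls, "\n")
cat("sourceCpp DLLs:", length(from_source_cpp), "\n")
```

Printing these values every few iterations of a loop like the one above would show whether the table keeps growing before the crash.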
Edit: confirmed that it is due to the dyn.load limit on the maximum number of DLLs. If I run the above loop to i = 1055, i.e. one iteration before the crash, and then call library(dplyr) or load any other package that loads DLLs, R crashes.
I suppose that if anyone is ever bothered by this, sourceCpp could conceptually skip dyn.load when no rebuild is required. That should fix the problem, assuming it is reproducible on Windows in general and not specific to my machine.
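Purely as an illustration of that idea, and not a description of how Rcpp behaves, a guard could compare the DLL path against the paths already listed by getLoadedDLLs() and call dyn.load only for a path that is not there yet. The names dll_already_loaded and load_dll_once are hypothetical.

```r
# Hypothetical helpers (not part of Rcpp or base R).

# TRUE if a DLL with this file path is already registered in the session.
dll_already_loaded <- function(path) {
  loaded <- vapply(getLoadedDLLs(), function(d) d[["path"]], character(1))
  normalizePath(path, mustWork = FALSE) %in%
    normalizePath(loaded, mustWork = FALSE)
}

# Call dyn.load only when the path is not loaded yet.
# Returns TRUE (invisibly) if a load happened, FALSE if it was skipped.
load_dll_once <- function(path) {
  if (dll_already_loaded(path)) return(invisible(FALSE))
  dyn.load(path)
  invisible(TRUE)
}
```

Replacing the dyn.load call in the loop above with load_dll_once would load the DLL on the first iteration only.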
As I said, I'm not bothered by this, so happy for nothing to be done.
Lots of messages, so a lot for me (or anybody else) to digest.
I know little about einsum. But rstan and the like, for example, have been doing just this for well over a decade, and it works on all platforms. So if there is something that does not work with einsum, you need to distill it more.
I would suspect that Windows may have something to do with it. So if you can, try macOS or Linux too.
"I thought calling cppFunction a second time is completely free"
It is. If you try to compile the same function twice, the cached library is used (unless you force recompilation with rebuild=TRUE). So if there is any issue, it's with the dyn.load call.
But I cannot reproduce this. E.g., on Linux:
$ cat test.R
trace(dyn.load, quote(message(i, ": call to dyn.load")), print=FALSE)
for (i in 1:2000)
  Rcpp::cppFunction("double f(double x){ return x; }")
$ time Rscript test.R
Tracing function "dyn.load" in package "base"
[1] "dyn.load"
1: call to dyn.load
...
2000: call to dyn.load
real 0m4.637s
user 0m4.301s
sys 0m0.364s
I tried exactly the same on Windows 10 with R 4.2.3, and it works the same, no crash.
Oh wow, upgrading to R 4.2.3 fixed it (on Windows; I never reproduced the problem on Mac). I sincerely apologise for taking everyone's time.
It turns out this dyn.load crash has existed on Windows since at least R 3.3.2 (https://stackoverflow.com/questions/47528881/why-does-calling-dyn-load-in-a-for-loop-crash-my-r-session).
I sincerely thank everyone for their time.
For completeness, I tried the following on the same Windows machine:
foo1 <- function(x) {
  einsum_f <- einsum::einsum_generator('ijk->ik', compile_function = TRUE)
  einsum_f(x)
}
for (i in 1:2000) foo1(array(c(1:18), c(3, 3, 2)))
No issue either. But yet again, I see no reason for doing this: foo2 is at least 1000x faster.