kaskr/adcomp

Compiling both TMB and Rcpp in distributed package and CRAN checks

AdrianHordyk opened this issue · 11 comments

I posted a question over on Stack Overflow describing an issue with compiling both Rcpp and TMB and passing CRAN checks.

I can compile both the TMB and Rcpp code into two DLLs using a Makevars file, but this results in a CRAN check error about foreign function calls.

Compiling into a single DLL (mypackage.dll) results in:

Error in .Call("getParameterOrder", data, parameters, new.env(), PACKAGE = DLL) : 
"getParameterOrder" not available for .Call() for package "mypackage"

Using TMB::MakeADFun(..., DLL="mypackage", checkParameterOrder=FALSE) results in:

Error in .Call("TMBconfig", e, as.integer(1), PACKAGE = DLL) : 
  "TMBconfig" not available for .Call() for package "mypackage"

Probably something simple I'm missing, but would appreciate any advice.

kaskr commented

Assuming the DLL is loaded, could it be that you forgot to register the routines?

Routines are registered in TMB by including #define TMB_LIB_INIT R_init_mypkg before the TMB.hpp header - see here

AdrianHordyk commented

When compiling into a single mypackage.dll, the DLL is loaded and all Rcpp functions work, but the TMB function doesn't. Adding #define TMB_LIB_INIT R_init_mypackage results in a build error:
RcppExports.o:RcppExports.cpp:(.text+0x1c0): multiple definition of `R_init_mypackage'

Both the Rcpp and TMB functions work when building two separate DLLs, using Makevars for the TMB function. I've now tried including #define TMB_LIB_INIT R_init_myTMB, but this still results in the same CRAN check error about foreign function calls for my Rcpp functions.

kaskr commented

The multiple definition error is expected, since both Rcpp and TMB handle the registration and initialization automatically.
In order to build a single DLL you'll have to merge the Rcpp and TMB calldef tables manually and reduce them to a single init function. Here's how glmmTMB expands the TMB calldef table. In your case you could insert TMB_CALLDEFS into the RcppExports calldefs.
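The idea of merging the two tables can be sketched in plain C++ without any R or TMB headers. This is only an illustration under stated assumptions: `CallDef` is a hypothetical stand-in for R's `R_CallMethodDef`, the Rcpp entry name is made up, and the two TMB entries are simply the routine names and argument counts visible in the `.Call()` errors quoted earlier in this thread. It is not the code that Rcpp or TMB actually generate.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical stand-in for R's R_CallMethodDef: routine name, function
// pointer, and number of arguments.
struct CallDef {
  const char* name;
  void (*fun)();
  int numArgs;
};

// Placeholder routine; real entries would point at the compiled functions.
static void stub() {}

// Stand-in for the table generated in RcppExports.cpp (entry name is made up).
static const CallDef rcppDefs[] = {
  {"_mypackage_rcpp_fun", &stub, 1},
  {nullptr, nullptr, 0}  // sentinel marking the end of the table
};

// Stand-in for the entries TMB_CALLDEFS would contribute; the two names and
// argument counts are taken from the .Call() errors quoted in this thread.
static const CallDef tmbDefs[] = {
  {"getParameterOrder", &stub, 3},
  {"TMBconfig", &stub, 2},
  {nullptr, nullptr, 0}
};

// Append every entry of a sentinel-terminated table, skipping the sentinel.
static void appendDefs(std::vector<CallDef>& out, const CallDef* defs) {
  assert(defs != nullptr);
  for (; defs->name != nullptr; ++defs) out.push_back(*defs);
}

// Build the one table that a single init function would register.
std::vector<CallDef> mergedDefs() {
  std::vector<CallDef> out;
  appendDefs(out, rcppDefs);
  appendDefs(out, tmbDefs);
  out.push_back({nullptr, nullptr, 0});  // exactly one sentinel at the end
  return out;
}

// Simplified lookup by name in a sentinel-terminated table.
bool isRegistered(const CallDef* defs, const char* name) {
  for (; defs->name != nullptr; ++defs)
    if (std::strcmp(defs->name, name) == 0) return true;
  return false;
}
```

In this sketch, looking up "TMBconfig" in `rcppDefs` alone fails, which mirrors the "not available for .Call()" errors above, while the same lookup in the merged table succeeds. In a real package the merged table would be passed to one `R_init_mypackage` function instead of two.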

However, there could be conflicts other than those related to calldef tables. If you can get away with it, it seems easier to maintain two separate DLLs. Make sure to use the CRAN version of TMB (not the github version!) while checking.

mlysy commented

I have created a simple test package containing Rcpp + TMB code here, and checked that it passes R CMD check --as-cran on Unix/OSX and Windows (win builder) machines.

I basically followed the Wiki's instructions for distributing code and those for compiling multiple TMB files here from #43 and in install.libs.R from #249. A few things I noted were:

  • The install.libs.R solution requires the deletion of TMB .so/.dll and .o files to avoid CRAN complaints. To avoid recompiling the TMB libraries when their source code does not change, I believe these commands are best executed in a clean rule in Makevars[.win].

  • The package and TMB shared libraries each require a NAMESPACE call to useDynLib. However, I found that to avoid CRAN Foreign function call notes it was necessary that the package's useDynLib be issued first. For those of us using roxygen2 for documentation, the useDynLib statements appear to be issued in alphabetical order. The test package shows how to override this using the @rawNamespace tag.

  • The package is an example of having two TMB models in separate .cpp files. However, it has been noted in #233 that this considerably increases the size of the CRAN binary, with a workaround consisting of a single TMB objective_function<Type>::operator() deploying multiple models described here.

    Question: Are there any downsides to having multiple TMB models in a single file (except perhaps legibility and simplicity of the call to TMB::MakeADFun)? For example, would this increase the tape length / adversely affect computation speed / etc relative to a single model per .cpp file?

kaskr commented

@mlysy There shouldn't be any side effects with your approach in terms of speed. The internal calculations are identical whether the models are compiled separately or as a single unit.
BTW, the test package seems really useful - thanks for creating it.

mlysy commented

@kaskr Thank you for the quick response. I take it then that a single TMB .cpp file really is preferable, so I'll modify the test package accordingly.

mlysy commented

I've now implemented the multiple model / single TMB file approach suggested here. I tried to set it up such that developers can write ModelA.hpp, ModelB.hpp, etc. almost exactly as they would for separate TMB model files ModelA.cpp, ModelB.cpp, etc.

To my surprise/delight, it seems that the main file TMBMain.cpp, which controls the switching between models, does not get confused if, e.g., you have DATA_VECTOR(x) in ModelA but PARAMETER_MATRIX(x) in ModelB. Likewise, if you have variables x and theta in ModelA but y and phi in ModelB, there's no need to specify all 4 variables to TMB::MakeADFun.

In short, multiple models/single TMB file is easy to implement, as fast as multiple TMB files, and takes up much less space. So unless I'm missing something, I believe this ought to be the way to include multiple TMB models into an R package.
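The switching pattern described above can be sketched in plain C++. Everything here is an assumption made for illustration: `Inputs` is a hypothetical stand-in for the named data and parameters passed to TMB::MakeADFun, `objective` stands in for the single objective function in TMBMain.cpp, and the two toy models are invented. A real TMB file would read its inputs with macros such as DATA_VECTOR and PARAMETER instead of a map. The sketch suggests one plausible reason the unused variables need not be supplied: only the selected model's branch ever reads its inputs.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the named inputs handed over from R.
using Inputs = std::map<std::string, std::vector<double>>;

// Stand-in for ModelA.hpp: reads only "x" and "theta".
// Toy objective: sum of squared deviations of x from theta.
template <class Type>
Type ModelA(const Inputs& in) {
  const Type theta = in.at("theta").at(0);
  Type nll = 0;
  for (double xi : in.at("x")) nll += (xi - theta) * (xi - theta);
  return nll;
}

// Stand-in for ModelB.hpp: reads only "y" and "phi".
// Toy objective: sum of absolute deviations of y from phi.
template <class Type>
Type ModelB(const Inputs& in) {
  const Type phi = in.at("phi").at(0);
  Type nll = 0;
  for (double yi : in.at("y")) nll += std::abs(yi - phi);
  return nll;
}

// Stand-in for TMBMain.cpp: one entry point that dispatches on a model name.
// Inputs belonging to the model that is not selected are never looked up.
template <class Type>
Type objective(const std::string& model, const Inputs& in) {
  assert(!model.empty());  // a model name must always be given
  if (model == "ModelA") return ModelA<Type>(in);
  if (model == "ModelB") return ModelB<Type>(in);
  throw std::invalid_argument("unknown model: " + model);
}
```

Calling `objective<double>("ModelA", ...)` with only `x` and `theta` in the map works, and likewise for ModelB with only `y` and `phi`, because each branch touches just its own names.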

mlysy commented

@kaskr I've created the TMBtools package for developers to easily create R packages containing TMB source code. Would be delighted to distribute via TMB_contrib_R if you are interested.

kaskr commented

@mlysy Thanks! I've sent you an invitation to make changes to TMB_contrib.

mlysy commented

Wonderful! Do you have a preference as to whether I should include the package as a submodule, or make a hard copy in the repository?

kaskr commented

@mlysy I don't use submodules but please do whatever is easiest for you.