RcppCore/Rcpp

Question regarding replacement by reference of <RTYPE>Array class without making copy

tony-aw opened this issue · 8 comments

Hi Dirk,

First of all: Fantastic R package! And thanks also for the thorough documentation!

I do have a question: I would like to replace values of a (3 dimensional) array by reference (i.e. without making a copy), in Rcpp, but I cannot find proper documentation on how to do so. I know how to do this with matrices, but not with arrays.

Here's what I've already looked at:

  • I know that you have a github repository called "Rcpp11" that includes an Array class definition, but the ReadMe says it won't be uploaded on CRAN, and I don't feel comfortable "stealing" your work.
  • I know that (confusingly) there is another Rcpp11 package, which is on CRAN, by Romain Francois et al., but I couldn't find any proper documentation on Arrays on it (its pdf file is almost completely empty!). It also depends on c++11, and I'm not sure if that dependency is going to create problems for me (I'm writing an R package, though it's in the early stages).
  • I know that there is RcppArray, but if I understand the documentation correctly it always converts (and therefore copies) the array to std::array when put into its functions.
  • I know there are several xtensor based R packages, yet ALL of them have been removed from CRAN, though it is unclear to me why exactly.
  • I know arrays are vectors, and one can put a vector in an Rcpp function. However, I cannot specify 3 (or more) dimensional subscripts for such vectors, only flat/linear indices. I know how to efficiently convert between flat and dimensional indices and vice-versa, but even an efficient conversion still costs some computation time, and I prefer not to waste computation time on such conversions, if it can be done more directly in Rcpp.
  • I know there is RcppArmadillo, but it only supports Arrays of numeric type. I need support for all major types (at the very least: logical, integer, double, and character).
  • And yes, I of course searched through Rcpp's own documentation. Unless I'm blind, I couldn't find proper documentation on Arrays, at least with respect to in-place replacement by reference of Arrays.
  • Finally, I tried to do it in a somewhat similar fashion as can be done with matrices, but of course I failed.

Which leaves me to my question: does Rcpp, or any of its extension packages, support Arrays (or at least 3d-arrays) that allow replacement of values by reference (thus without making a copy)? If so, could you please be so kind to point me to the correct documentation?

Thanks in advance for your answer, and have a nice day!

Kind regards,

Tony

SEXP objects are containers with a pointer to the memory and are expected to hold it (and share in R's reference counting). That's the bad news. The good news is that you can do whatever you want with array types -- Rcpp's vector and matrix types are just 1d and 2d variants but you can go further. RcppArmadillo for example gives you cube (3d) and field (4d).

For zero-copy semantics you could look into ALTREP. Its documentation and usage are still not quite there yet though.

Hope that helps. Your question wasn't the most focused we have seen here.

Thanks for the super fast answer, I appreciate it!
What do you mean, my question is not the most focussed you have seen?

I have no idea what you are really asking, besides maybe wishing for a pony riding a rainbow ("I know Rcpp exists and has for years. Can you magically make zero copy appear for me?" "No we can't or else we would have done so."). You also go a little sideways by referencing at least four other repos (huh?, incl not noticing one is a fork of another). Context can be helpful, today I found it distracting.

Start with basics. R gives us SEXP. Contiguous vectors with a dimensions attribute.
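
To make "contiguous vector plus a dimensions attribute" concrete, here is a minimal sketch in plain C++ (no Rcpp) of how a 0-based subscript maps to a flat offset under R's column-major layout. The function name flat_index and the use of std::vector for dims are assumptions made for illustration; they are not part of Rcpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major (R-style) offset of a 0-based subscript into a flat vector.
// `dims` plays the role of R's dim attribute; the names are illustrative.
std::size_t flat_index(const std::vector<std::size_t>& dims,
                       const std::vector<std::size_t>& subs) {
  assert(dims.size() == subs.size());
  std::size_t offset = 0;
  std::size_t stride = 1;  // number of elements per step along dimension d
  for (std::size_t d = 0; d < dims.size(); ++d) {
    assert(subs[d] < dims[d]);
    offset += subs[d] * stride;
    stride *= dims[d];  // the next dimension steps over a full block
  }
  return offset;
}
```

For a 2 x 3 x 4 array, the subscript (1, 2, 3) gives 1 + 2*2 + 3*6 = 23, the last of the 24 elements. The same function covers vectors, matrices, and higher-dimensional arrays, because only the length of dims changes.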

Now: what is it you want to do, what have you tried, how did it fail, what surprised you about it, and how do you think we can help you?

Also look more closely at RcppArray and its std::array. It also supports std::span (if you have C++20), which in newer C++ is being extended to std::mdspan, which is the basis of linear algebra support (think LAPACK and BLAS) in modern C++. We do not have support for it, but that may be something to play with.

My needs may be simpler, and I often get what I need from RcppArmadillo. But as always, "it depends".

Sorry, allow me to re-phrase my question.

Suppose I have a character matrix in R, and I would like to change values of the matrix with rp, using "pass-by-reference" semantics. I can easily do that in Rcpp, for example:

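#include <Rcpp.h>
using namespace Rcpp;

//' @keywords internal
//' @noRd
// [[Rcpp::export(.rcpp_set_rowcol_String)]]
void rcpp_set_rowcol_String(CharacterMatrix x, IntegerVector rowind, IntegerVector colind, CharacterVector rp) {
  // rowind and colind are used as 0-based indices here.
  int ni = rowind.length();
  int nj = colind.length();
  int counter = 0;  // position in rp, consumed in column-major order
  for(int j = 0; j < nj; ++j){
    // Column proxy into x: assigning through it writes into x itself.
    CharacterMatrix::Column col = x(_, colind[j]);
    for(int i = 0; i < ni; ++i) {
      col[rowind[i]] = rp[counter];
      counter += 1;
    }
  }
}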

I would like to do something similar as above, but with an array, with 3 dimensions. I don't know how to do that.
I am not asking you to spell it out for me right here. I was also not asking you to change Rcpp just for me. I was only asking you to point me in the right direction, like a link to documentation where Array classes in Rcpp are explained.

If Rcpp does not provide similar functionality for Arrays: that's fine, I can work around it. I'm just trying to find the easy route before going straight for the hard route, that is all.

Btw: the repos were just there to indicate that I tried before asking; it wasn't meant as an insult or something.

Kind regards,

Tony
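
For reference, a 3d analogue of the matrix function above can be sketched on flat storage. This is a plain C++ stand-in, not Rcpp code: std::vector<std::string> takes the place of CharacterVector, the dimensions are passed explicitly rather than read from the dim attribute, and set_sub3 is a hypothetical name. The offsets for the slice and the column are computed once per outer loop, so the per-element cost of the subscript-to-flat conversion is a single addition.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Overwrite x[i, j, k] for every combination of the given 0-based indices.
// x is the flat, column-major storage of an ni x nj x nk array. It is taken
// by reference, so the caller's data is modified and nothing is copied.
void set_sub3(std::vector<std::string>& x,
              std::size_t ni, std::size_t nj, std::size_t nk,
              const std::vector<std::size_t>& ind_i,
              const std::vector<std::size_t>& ind_j,
              const std::vector<std::size_t>& ind_k,
              const std::vector<std::string>& rp) {
  assert(x.size() == ni * nj * nk);
  assert(rp.size() == ind_i.size() * ind_j.size() * ind_k.size());
  std::size_t counter = 0;  // position in rp, first index varies fastest
  for (std::size_t k : ind_k) {
    assert(k < nk);
    const std::size_t off_k = k * ni * nj;        // start of slice k
    for (std::size_t j : ind_j) {
      assert(j < nj);
      const std::size_t off_jk = off_k + j * ni;  // start of column j in slice k
      for (std::size_t i : ind_i) {
        assert(i < ni);
        x[off_jk + i] = rp[counter++];
      }
    }
  }
}
```

In an Rcpp version one would presumably take a CharacterVector instead and read the dimensions with x.attr("dim"); that adaptation is an assumption here and is not shown or tested.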

That's helpful, and I can see where it may be both tempting and also misleading as an example. Character variables are stored by R (and Rcpp uses what R gives it) in a separate memory pool. Each 'string' or word is its own entry so that may make the 'reference' ops more natural.

Not so for all other atomic types. R uses vectors, ie chunks of contiguous memory, with a length. That is all there is. A matrix is the same thing, but it has a dim attribute telling us to index in 2d. And it is the same with all other dimensions one can pile up (which is rarer in statistics). And just how this happens at the physical machine layer makes it very hard to do anything fancy. (Ie every 'injection' in a vector of size n results in a new vector of size n+1. I simplify slightly, R also overcommits to allow for easier appending.)

So sorry, all we have may be a bucket of bad news.

The Matrix type is just a convenience class around, as Dirk says, a chunk of contiguous memory. You could treat your array as a plain vector and replace the values as you did above, just that the calculation of indices is a bit more involved. You could easily create a simple class on top of that to do this for you.
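
A minimal sketch of such a class, in plain C++ and under stated assumptions: ArrayView3 is a hypothetical helper (not an Rcpp class), it does not own the memory, and it assumes 0-based subscripts over column-major storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Non-owning 3d view over contiguous, column-major storage.
// ArrayView3 is a hypothetical helper, not part of Rcpp.
template <typename T>
class ArrayView3 {
public:
  ArrayView3(T* data, std::size_t ni, std::size_t nj, std::size_t nk)
      : data_(data), ni_(ni), nj_(nj), nk_(nk) {}

  // 0-based element access. Returns a reference, so an assignment writes
  // straight into the underlying buffer.
  T& operator()(std::size_t i, std::size_t j, std::size_t k) const {
    assert(i < ni_ && j < nj_ && k < nk_);
    return data_[i + ni_ * (j + nj_ * k)];
  }

  std::size_t size() const { return ni_ * nj_ * nk_; }

private:
  T* data_;  // borrowed pointer; the caller keeps the storage alive
  std::size_t ni_, nj_, nk_;
};
```

Usage would look like `ArrayView3<double> a(buf.data(), 2, 3, 4); a(1, 2, 3) = 7.5;`, which changes buf[23] without copying. A raw pointer fits types stored as plain contiguous values (double, int); character vectors in Rcpp are accessed through proxy elements, so a view for them would more likely wrap the Rcpp vector and index it with the same offset formula.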

Thanks.
It's fine. In that case I'll just go the flat indices route. Thanks again for your help. Feel free to close this issue :-)