Section 2.3.6 Exercise 2 claims that tracemem will show two copies, but it only shows one on 4.1.1.
Sean1708 opened this issue · 4 comments
The exercise currently says:
1. Explain why `tracemem()` shows two copies when you run this code.
Hint: carefully look at the difference between this code and the code
shown earlier in the section.
```{r, results = FALSE}
x <- c(1L, 2L, 3L)
tracemem(x)
x[[3]] <- 4
```
However, when I run exactly that code, I see only one copy:
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
system code page: 65001
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.1.1
> x <- c(1L, 2L, 3L)
> tracemem(x)
[1] "<0000000017882FC0>"
> x[[3]] <- 4
tracemem[0x0000000017882fc0 -> 0x000000000c6b5118]:
I suspect that something has changed in R since this section was written, or maybe there is some Windows quirk. Either way, the section should probably be updated to reflect the fact that different versions of R might not copy twice.
Unless I've misunderstood what `tracemem()` actually does, in which case I think the explanation in that section should be updated. Right now it says:
From then on, whenever that object is copied, `tracemem()` will print a message telling you which object was copied, its new address, and the sequence of calls that led to the copy:
To me, this suggests that a separate line would be printed each time a copy occurs, but maybe that's not the case?
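For anyone who wants to count the reported copies rather than eyeball the console, here is a minimal sketch. It assumes that `tracemem()` writes its reports to standard output so that `capture.output()` can collect them, and that R was built with memory profiling (as the CRAN binaries are). The names `trace_lines` and `n_copies` are just illustrative, and the count will depend on the R version and front end.

```r
x <- c(1L, 2L, 3L)
invisible(tracemem(x))

# Collect whatever tracemem() prints while the assignment runs.
# Assumption: each reported copy appears as one "tracemem[...]" line.
trace_lines <- capture.output(x[[3]] <- 4)
untracemem(x)

n_copies <- sum(grepl("^tracemem\\[", trace_lines))
print(trace_lines)
cat("copies reported:", n_copies, "\n")
```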
I think your interpretation of tracemem is correct, though I'm not certain. Regardless, there are a few instances in that chapter where the number of copies observed does not line up with the text.
I'm guessing this is related to the following change from the R release notes (under the section for 4.0.0):
Reference counting is now used instead of the NAMED mechanism for determining when objects can be safely mutated in base C code. This reduces the need for copying in some cases and should allow further optimizations in the future. It should help make the internal code easier to maintain.
So the changes alluded to in footnote 15 have in fact occurred, as far as I can tell.
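The copy that is still reported on R 4.x presumably comes from type coercion: `x` is an integer vector and `4` is a double, so the assignment has to convert `x`. A minimal sketch to check the types, assuming R was built with memory profiling; the second vector `y` is added here only for comparison and is not part of the exercise:

```r
# Assumes R was built with memory profiling (true for CRAN binaries).
x <- c(1L, 2L, 3L)
typeof(x)            # "integer"
invisible(tracemem(x))
x[[3]] <- 4          # 4 is a double, so x must be coerced
typeof(x)            # "double"
untracemem(x)

y <- c(1L, 2L, 3L)
invisible(tracemem(y))
y[[3]] <- 4L         # 4L is an integer, so no coercion is needed
typeof(y)            # "integer"
untracemem(y)
```

Whether `tracemem()` prints anything for the `y` assignment may again depend on the front end, so it is best run from a terminal session.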
I see two copies on Windows R 4.2.3:
> sessionInfo()
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lobstr_1.1.2
loaded via a namespace (and not attached):
[1] compiler_4.2.3 cli_3.6.1 tools_4.2.3 pillar_1.9.0 glue_1.6.2 rstudioapi_0.14
[7] crayon_1.5.2 utf8_1.2.3 fansi_1.0.4 vctrs_0.6.1 lifecycle_1.0.3 rlang_1.1.0
> x <- c(1L, 2L, 3L)
> tracemem(x)
[1] "<0000018898D9F7A8>"
> x[[3]] <- 4
tracemem[0x0000018898d9f7a8 -> 0x0000018898da2df8]:
tracemem[0x0000018898da2df8 -> 0x000001889ad39bc8]:
@jxu are you using RStudio? I get two copies when using RStudio, but not in a standard interactive session.
When exploring copy-on-modify behaviour interactively, be aware that you’ll get different results inside of RStudio. That’s because the environment pane must make a reference to each object in order to display information about it. This distorts your interactive exploration but doesn’t affect code inside of functions, and so doesn’t affect performance during data analysis. For experimentation, I recommend either running R directly from the terminal, or using RMarkdown (like this book).
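One way to test the claim that code inside functions is unaffected is to wrap the exercise in a function, so that the vector never appears in the global environment. This is only a sketch: the function name `modify_in_function` is hypothetical, and the number of lines `tracemem()` prints may still vary between R versions.

```r
# Hypothetical helper: runs the exercise code in a function's own environment,
# so the traced vector is never bound in the global environment.
modify_in_function <- function() {
  x <- c(1L, 2L, 3L)
  invisible(tracemem(x))
  x[[3]] <- 4        # any copies are reported while this line runs
  untracemem(x)
  invisible(x)
}

result <- modify_in_function()
typeof(result)
```

Comparing the output of this call in RStudio and in a terminal session should show whether the extra copy comes from the environment pane.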
Oh, good catch; you're right, I was using RStudio. In standard R I see one copy:
> x <- c(1L, 2L, 3L)
> tracemem(x)
[1] "<000001B68F5953F8>"
> x[[3]] <- 4
tracemem[0x000001b68f5953f8 -> 0x000001b691395fe8]: