sahirbhatnagar/casebase

Do not import entire namespaces and remove ggplot2 dependency

Closed this issue · 2 comments

We are currently importing the entire ggplot2 namespace which is not advised as written in the ggplot2 vignette.

Even if you use many ggplot2 functions in your package, it is unwise to use ggplot2 in Depends or import the entire package into your NAMESPACE (e.g. with #' @import ggplot2). Using ggplot2 in Depends will attach ggplot2 when your package is attached, which includes when your package is tested. This makes it difficult to ensure that others can use the functions in your package without attaching it (i.e., using ::). Similarly, importing all 450 of ggplot2’s exported objects into your namespace makes it difficult to separate the responsibility of your package and the responsibility of ggplot2, in addition to making it difficult for readers of your code to figure out where functions are coming from!

We are also importing the entire data.table namespace. We should instead only import the functions we actually need. This is obviously more tedious to do, but worth it in the long run.

Here's a preliminary list. @turgeonmaxime @Jesse-Islam please add:

  • ggplot2

    1. plot.popTime (it is here where I am importing the entire namespace with #' @import ggplot2
    2. plot.absRiskCB
    3. theme_cb (non exported function)
    4. panelBorder (non exported function)
  • data.table

    1. popTime (it is here where I am importing the entire namespace with #' @import data.table
    2. check_arguments_hazard
    3. absoluteRisk (but this can easily be changed to data frames)
    4. Many examples and tests use data.table.

Given the advice from the ggplot2 vignette, I would say we should definitely remove import(ggplot2) from our NAMESPACE. I don't mind keeping it in Imports, because we do get quite a bit of functionality out of it. And in terms of which functions we import vs. which functions we explicitly call using ::, I don't have a strong preference. I guess the recommendation is to import as few functions as possible.

For data.table, we should also probably avoid import(data.table) in our NAMESPACE. We could also easily rewrite absoluteRisk and the examples without using it. So it may be possible to move it to Suggests and make the tests fail "gracefully" when data.table is not available. The question is: what about popTime and check_arguments_hazard?

One more comment about data.table: case-base sampling by design can create pretty big datasets, so it's probably a good thing to keep data.table around. In fact, there may be efficiency gains in re-writing some of the sampleCaseBase code to use data.table.

This was fixed in PR #126