DeclareDesign/fabricatr

Importing data and using add_level has weird syntax.

Opened this issue · 7 comments

Current CRAN version

# Example pre-existing data
df = data.frame(x = rnorm(100, 10, 2), y = 1:100)

# Works, correctly:
fabricate(
    df,
    z = x - dplyr::lag(x, 1)
)

# Does not work: Data doesn't import
fabricate(
  times = add_level(
    data = df,
    z = x - dplyr::lag(x, 1),
  )
)

# Does not work: needs an N
fabricate(
  data = df,
  times = add_level(
    z = x - dplyr::lag(x, 1),
  )
)

The error is that N resolves to NULL, which is invalid, but you can see in debug that earlier than that, the data doesn't import. This is because add_level isn't designed to bring in data.

You can do it this way:

fabricate(
  data = df,
  times = add_level(
    N = 1,
    z = x - dplyr:::lag(x, 1)
  )
)

Basically, nest a level "under" the data, but make it 1 obs for every imported obs, so the level is largely illusory. This is pretty awkward syntax, and it also runs into trouble when you want to, for instance, import DF1 as level A, then mutate it to add variables, then import DF2 as non-nested level B, then mutate THAT to add variables.

This is as much a note to self as anything, but basically:

  • Documentation should have an example of the current cumbersome syntax.
  • Eventually, it should be easier to import data within an add_level(nest = FALSE) command.

I think what you want is:

fabricate(list(df=df), df=modify_level(z=x-dplyr::lag(x,1)))

To import a data frame in "levels mode" (needs a better name), it ought to be named. Also, you want to modify the level (specified by the lhs), not add a new one.

I could imagine an import_level function, which might make it match the rest of the package better and improve findability in the man pages.

Not sure I get it. Isn't your first example what you want anyway to import data at the top level?

fabricate(
    df,
    z = x - dplyr::lag(x, 1)
)

I'd vote to keep simple and add more informative errors.

Oh I see sorry. Agree we want this:

fabricate(
  data = df,
  times = add_level(
    z = x - dplyr::lag(x, 1),
  )
)

I like import_level now that I understand

agree that we need these examples in docs

Yeah my actual use case here involved making two non-nested levels, one of which relied on imported data and one of which didn't (basically ex = fabricate(data, A = modify_level(var = data_var * 2), B = add_level(..., nest = FALSE), obs = cross_levels(...), so simple mode was not suitable, but in my MWE I abstracted that away.

Agree with @nfultz that my example above of doing a 1 unit nest is probably better done as modify_level even under the current syntax. Just one of those things where when I was writing it, my brain didn't think of it as "modifying a level" because of the ambiguity about what the current working environment is at any stage, but Neal's way is definitely the way to go. But of course in the simple mode, modify_level is what's being done implicitly after the import.

Also agree with @nfultz there could probably be an explicit import_level function to make the syntax a little clearer.