chjackson/flexsurv

Enable `.` in formula

Closed this issue · 4 comments

I'd like to be able to use a formula like `Surv(time, event) ~ .`, with the `.` specification. I was wondering whether it's intentional that this isn't working.

If not: I've poked around and passed the data to the `terms()` function, and all tests are still passing. Would you be open to a PR for that?

library(flexsurv)
#> Loading required package: survival

flexsurvreg(Surv(time, status) ~ ., data = lung, dist = "lognormal")
#> Error in terms.formula(formula): '.' in formula and no 'data' argument

Created on 2022-10-12 by the reprex package (v2.0.1)
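
For readers following along, here is a minimal base R sketch of the behaviour behind the error above and of the fix described in this comment, namely passing the data to `terms()`. The data frame `df` and its columns are made up for illustration. The sketch only shows how `terms()` itself behaves; it is not the actual change made in flexsurv. `terms()` works symbolically, so `Surv()` is never evaluated here and the survival package does not need to be loaded.

```r
# Toy data; these column names are illustrative assumptions, not from the issue.
df <- data.frame(
  time   = c(5, 8, 12, 3),
  status = c(1, 0, 1, 1),
  age    = c(61, 70, 55, 48),
  sex    = c(1, 2, 2, 1)
)

f <- Surv(time, status) ~ .

# Without `data`, terms() has nothing to expand the dot against and errors.
no_data <- tryCatch(terms(f), error = function(e) conditionMessage(e))

# With `data`, the dot is expanded to the columns not used on the left-hand side.
with_data <- terms(f, data = df)
labels_with_data <- attr(with_data, "term.labels")

print(no_data)
print(labels_with_data)
```

Under these assumptions, `labels_with_data` should contain only `age` and `sex`, because variables named on the left-hand side are dropped from the expansion.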

It isn't deliberate - I was just unaware of this feature of formula syntax when I first wrote the package. It seems to have become more popular in recent years - I guess it's part of a statistics vs machine learning cultural change ("just stick everything in...")! Happy to accept a PR to put it in.

Also I guess it should include all main effects but no interactions? That seems to be what `lm`, `glm` etc. do, but I don't know if it is documented anywhere that that is the expectation for modelling packages based on some kind of linear/additive model. I'm not sure what it means in ML-type packages - is it more like "use all the variables in some kind of optimal way depending on the package's algorithm"? I don't think it is necessarily "optimal" to include main effects but no interactions in linear models, it's just a sort of default / common practice.

Great, thanks! I promise it's more deliberate than "just stick everything in..."! 🤓 (I need it to get some tidymodels functionality to run where it's actually "stick everything in that's left at this point".)

`terms()` interprets the dots as referring "to the remaining variables contained in data", so that's typically the main effects, but it obviously depends on what is actually in `data`.

I meant that, as I understand it, `terms()` interprets `~ .` as equivalent to `~ x + y` in a dataset with variables x and y, when it could have interpreted it as `~ x*y`, which I think is making a judgement about what a sensible model should be. I don't mind flexsurv doing this, but I think it should be explicitly stated in the documentation.
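
The distinction drawn here can be checked directly in base R. The sketch below uses a made-up data frame `df` with a response `y` and two covariates `x` and `z` (all hypothetical names). It compares the terms produced by `~ .` with those produced by `~ .^2`, the usual formula idiom for main effects plus all pairwise interactions. It illustrates `terms()` only and assumes nothing about how flexsurv documents or implements this.

```r
# Toy data; names and values are illustrative assumptions.
df <- data.frame(
  y = c(1.2, 0.7, 3.1, 2.4),
  x = c(0, 1, 0, 1),
  z = c(2, 1, 4, 3)
)

# `.` alone expands to the remaining variables as main effects.
main_only <- attr(terms(y ~ ., data = df), "term.labels")

# Interactions have to be requested explicitly, for example with `.^2`.
pairwise <- attr(terms(y ~ .^2, data = df), "term.labels")

print(main_only)
print(pairwise)
```

So the additive reading is the default that `terms()` applies, and anyone who wants interactions has to ask for them in the formula.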