SteffenMoritz/ridge

predict.linearRidge does not work with factors

dfrankow opened this issue · 17 comments

Here is an example:

library(ridge)

foo <- mtcars
foo$cyl_factor <- as.factor(paste0("cyl", foo$cyl))
model1 <- lm(mpg ~ wt + cyl_factor, data = foo)
model2 <- linearRidge(mpg ~ wt + cyl_factor, data = foo, lambda = 0.0)

predict(model1)
# has output..
predict(model2)
Error in as.matrix(mm) %*% beta : 
  requires numeric/complex matrix/vector arguments

It's because cyl_factor breaks out into multiple columns, while mm has only one column for that factor. In predict.linearRidge, right before res <- drop(as.matrix(mm) %*% beta:

Browse[2]> names(mm)
[1] "1"          "wt"         "cyl_factor"
Browse[2]> names(beta)
[1] "(Intercept)"    "wt"             "cyl_factorcyl6" "cyl_factorcyl8"

Thanks for opening this issue @dfrankow 👍
It's been a while since I have been looking into the package.

It sounds like you have given this some thought - do you already have a reasonable fix in mind? - which ideally does not break something else ;) (can this even be fixed?)

Should you have a fix already, I'd very much appreciate it if you can open a pull request. I'd merge on Github and update the CRAN version afterwards. (of course you can add yourself to the contributors in the package description file for any updates you make)

I have no fix in mind.

I'll try to take a look at predict.lm or vcov.lm to see how it handles factors.

I believe I have a fix in my repo. I also added vcov.linearRidge and some testthat tests. However, I want to get it reviewed before making a pull request.

Filed cran/ridge#1 for this, but I want my colleague to review it first.

Ah this would be great, if your colleague also has a look over it!

Small hint, you must file the Pull request to https://github.com/SteffenMoritz/ridge/
(the cran/ridge is read only)

Then I have all the changes and can package it and submit it to CRAN.
(https://cran.r-project.org/submit.html)

Then CRAN updates the cran/ridge repo and more important all other download possibilities.

Additional note:
In the DESCRIPTION file not only change the version number, also add yourself to
Authors@R + Author to also appear as contributor on CRAN (https://cran.r-project.org/web/packages/ridge/index.html)

Like this (but with FIRST and LAST replaced by your name :) ):

Authors@R:
c(
person("Steffen", "Moritz", email="steffen.moritz10@gmail.com", role=c("aut", "cre"), comment = c(ORCID = "0000-0002-0085-1804")),person("Erika", "Cule", role=c("aut")), person("FIRST","LAST",role=c("ctb")
)

Author: Steffen Moritz [aut, cre] (https://orcid.org/0000-0002-0085-1804), Erika Cule [aut], FIRST LAST [ctb]

I filed #4 for this.

Looks good, thanks for your effort !
I took a quick look and merged this now.
I'll do some further tests tonight and then I'll update on CRAN :)

Sounds good.

If you have time, I'd suggest putting tests you do in the testthat tests to be run automatically.

Even after #4 is merged, I found another test that fails: using predict with newdata.

test_that("test linearRidge predict method for lambda = 0, with a factor and newdata", {
  foo <- mtcars
  foo$cyl_factor <- as.factor(paste0("cyl", foo$cyl))
  model1 <- lm(mpg ~ wt + cyl_factor, data = foo)
  model2 <- linearRidge(mpg ~ wt + cyl_factor, data = foo, lambda = 0.0)

  # predict agrees between model1 and model2 when lambda is 0, with factor
  newdata <- data.frame(wt=c(1.0), cyl_factor=c("cyl4"))
  preds1 <- predict(model1, newdata)
  preds2 <- predict(model2, newdata)
  expect_equal(preds1, preds2, tolerance = tol, label = "predictions")
})

I'm working on it.

Filed #6

Found another case that doesn't work: passing in a formula object.

test_that("test linearRidge with formula variable", {
  the_formula <- formula('mpg ~ wt + cyl')
  model1 <- lm(the_formula, data = mtcars)
  model2 <- linearRidge(the_formula, data = mtcars, lambda = 0)
  
  expect_equal(coef(model1), coef(model2), tolerance = tol, label = "coefficients")
  expect_equal(predict(model1), predict(model2), tolerance = tol, label = "predictions")
})

This doesn't have to do with factors, but it may have to do with my modifications. I'm not sure yet.

Never mind, false alarm. I had interfering objects in my environment.

I merged everything related to this issue now :)
Unfortunately I've had some issues with my Mac - had to reinstall xcode to be able to build the package.

So I've not been able to run tests yet (by the way adding these to testthat is a very good remark)

Good news: build is successful for me.
0 errors ✔ | 0 warnings ✔ | 1 note ✖

Upload to CRAN usually requires 0 notes.
But I think we should be able to fix this quite easy.

Here is the note I am getting (on Mac OS):
(but it is the same on travis-ci https://travis-ci.org/github/SteffenMoritz/ridge)


  • checking R code for possible problems ... NOTE
    linearRidge: no visible global function definition for ‘.getXlevels’
    vcov.ridgeLinear: no visible global function definition for ‘predict’
    Undefined global functions or variables:
    .getXlevels predict
    Consider adding
    importFrom("stats", ".getXlevels", "predict")
    to your NAMESPACE file.

Luckily the solution is already in the note :)
(we can just manually add this to the NAMESPACE file, since for this package the NAMESPACE file is not created by roxygen)

By the way, I submitted the new version to CRAN now.
Could be already online in a few hours - if everything goes alright.

Many thanks to you for this update of ridge!
( it was basically all made by you 👍 )

Thanks. Your responsiveness was very helpful.

Since this now works, I'll close this issue.

Also, this code runs:

library(testthat)
expect_equal(predict(model1), predict(model2))
expect_equal(coef(model1), coef(model2))

Dear maintainer,

thanks, package ridge_2.5.tar.gz is on its way to CRAN.

Best regards,
CRAN teams' auto-check service
Package check result: OK

No changes to worse in reverse depends.

Everything successful :)
Some of the new versions are already build:
https://cran.r-project.org/web/checks/check_results_ridge.html