nignatiadis/SmoothingSplines.jl

Interpretation of Lambda

Closed this issue · 3 comments

The predicted values for the example seem to be different from the predicted values for the same input to the smooth.spline function in R.
Here's how I've done the comparison:

using SmoothingSplines
using RDatasets

cars = dataset("datasets","cars")
X = map(Float64,convert(Array,cars[!, :Speed]))
Y = map(Float64,convert(Array,cars[!, :Dist]))

spl = fit(SmoothingSpline, X, Y, 250.0) # λ=250.0
Ypred = predict(spl) #

using RCall 
R"""
RSmoothingSpline <- function(x, y, lambda) 
{
    m      <- smooth.spline(x = x, y = y, lambda = lambda)
    return <- predict(m, x)$y
}
"""

YpredR = convert(Array{Float64}, rcall(:RSmoothingSpline,x = X, y = Y, lambda = 250))
# The answer is totally different from YPred. The answers are close if I pass lambda = 1/50 into the RSmoothingSpline

It seems the Lambda of the R function isn't the same as that of the Julia function. Is there a way to convert from one to another?

I am building the cross-validation feature for this package for the automatic selection of Lambda. I want to confirm the results with R using RCall. That's why it's important that for a given Lamda, the results are the same.

Some lambda discussion happened here #2, maybe of use?

I think the difference is that in the R smooth.spline function the predictor is first transformed to be in the unit interval. Thus, to get same results you could first standardize the predictor:

X = (X .- minimum(X))/(maximum(X)-minimum(X))

If you then repeat your code from above with this new X, then the same choice of lambda (say, 1/50) should yield the same answer using both this package and R's smooth.spline.

Alternatively, if you want to avoid the above transformation, the formula matching the lambdas to each other would be:

λR = λJ / ( maximum(X) - minimum(X) )^3

So e.g., in the example from the Readme, λ equal to 250 / 21^3 ≈ 0.027 for the R function should give the same result.

Thank you for the answer. I'll close the issue.