This script (very much under construction) implements Kernel Regularized Least Squares (paper here) in the Julia language. While much more is on the way, it is currently mostly a transliteration of the R package (found here). It is also currently about 10 times faster!
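For background, KRLS builds a Gaussian kernel matrix over the observations and solves a ridge-style linear system for the choice coefficients; in the KRLS paper the bandwidth is tied to the number of columns of the (standardized) data and lambda is chosen by leave-one-out cross-validation. The sketch below illustrates that core computation only; it is not this package's internal code, and the name krls_sketch (and passing sigma2 explicitly) is just for illustration:

# Illustration only; not this package's internal code.
# K[i, j] is a Gaussian kernel between observations i and j, and the
# choice coefficients c solve the ridge-style system (K + lambda*I) c = y.
function krls_sketch(X, y, lambda, sigma2)
    n = size(X, 1)
    K = [exp(-sum((X[i, :] - X[j, :]).^2) / sigma2) for i in 1:n, j in 1:n]
    Kreg = copy(K)
    for i in 1:n
        Kreg[i, i] += lambda      # regularize the diagonal
    end
    c = Kreg \ y                  # choice coefficients
    return K, c, K * c            # kernel matrix, coefficients, fitted values
end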
Required
Xinit
- your data matrix; observations are the first dimension, features are the second dimension
yinit
- your response vector
Optional
lambda
- default is to fit using leave-one-out cross-validation, but the user can specify any number greater than 0
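A quick call sketch (the keyword form for supplying lambda is an assumption about the signature and may differ):

X = randn(100, 3)
y = X * [1.0, -0.5, 2.0] + randn(100)
k1 = krls(X, y)                 # lambda chosen by leave-one-out cross-validation
k2 = krls(X, y, lambda = 0.5)   # user-supplied lambda (assumed keyword syntax)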
The method outputs a single KRLS object, from which the following can be retrieved:
K
- the kernel matrix
coeffs
- the choice coefficients
Looe
- the final leave-one-out error
fitted
- fitted y values
X
- original X matrix
y
- original y vector
sigma
- the Gaussian kernel bandwidth used to construct K
lambda
- the regularization parameter used in the fit
R2
- R-squared
derivatives
- the pointwise marginal effects
avgderivatives
- the average pointwise marginal effects
var_avgderivatives
- variance of the average pointwise marginal effects
vcov_c
- variance covariance matrix of the choice coefficients
vcov_fitted
- variance covariance matrix of the fitted values
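After a fit (such as the one in the example further down), these can be read directly off the returned object:

k = krls(X, y)        # X and y as in the example below
k.R2                  # in-sample R-squared
k.avgderivatives      # average pointwise marginal effects, one per column of X
k.Looe                # final leave-one-out error
k.vcov_fitted         # variance covariance matrix of the fitted values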
A fitted KRLS object can also be used to predict outcomes for new data, given a new data matrix with the same number of columns. The prediction function takes:
k
- a KRLS object
newmatint
- a new data matrix with the same number of columns
Returns a 3-tuple with the following objects (this will be changed soon):
yfitted
- predicted y values
sefit
- standard errors of the predicted values
vcovfit
- variance covariance matrix of the predicted values
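A usage sketch, assuming the prediction function is exported as predict (the actual name may differ) and that k is a fitted KRLS object like the one in the example below:

Xnew = randn(10, 4)                        # same number of columns as the training matrix
yfitted, sefit, vcovfit = predict(k, Xnew) # hypothetical call; returns the 3-tuple above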
Example

X = randn(1000, 3)                                           # 1,000 observations, 3 continuous features
X = hcat(X, vcat(repmat([1], 500, 1), repmat([0], 500, 1)))  # append a binary column
y = X * [1, 2, 3, -2] + randn(1000)                          # linear outcome with noise
k = krls(X, y)
KRLS results
------------------------------------
Average Marginal Effects:
1x4 Array{Float64,2}:
1.0059 1.9553 2.8991 -1.9689
Quartiles of Marginal Effects
Var 1: [0.828,1.0257,1.1908]
Var 2: [1.7346,2.0057,2.2355]
Var 3: [2.6737,2.91,3.1563]
Var 4: [-2.207,-2.0203,-1.7358]
k.coeffs
1000-element Array{Float64,1}:
-2.10438
2.16206
1.32769
2.9391
-2.04586
-3.5121
-2.07879
3.32383
2.01171
1.10001
-2.06835
-4.77452
0.00034328
⋮
-2.5586
-1.11627
-1.32929
1.04863
-1.33211
2.51199
1.45449
2.41898
-1.9107
1.75867
-2.39654
-0.205217
hcat(y, k.fitted)
1000x2 Array{Float64,2}:
-6.28847 -5.4086
-2.36659 -3.27057
0.206222 -0.348901
-1.76807 -2.99694
-0.361033 0.494366
-4.45088 -2.98243
-1.57046 -0.701293
1.97831 0.588583
3.69702 2.8559
-6.09572 -6.55565
-3.53623 -2.67143
0.304973 2.30125
-4.1711 -4.17125
⋮
5.04699 6.11677
5.15348 5.62021
-5.28853 -4.73274
6.66882 6.23037
1.31033 1.8673
-2.07486 -3.12515
0.478735 -0.129404
3.381 2.3696
-0.208332 0.590553
6.1462 5.41089
-3.31874 -2.31672
-4.0134 -3.9276
Later I will include more formal speed tests comparing this with KRLS in R and Stata. For now, this (mostly) transliteration of the R package results in speed improvements of 10-20 times.
To do
- Input validation, more user feedback
- Column names, possibly use DataFrames
- Methods for interpretation
- Cleaning up code
- Speed improvements