beta coefficients explanation
Hi
I'm just wondering what the following output variables are:
- psuper_obj$glmnet_best$beta? If I check this for the example data, data(acinar_hvg_sce), there are more rows than genes
- psuper_obj$beta_dt: all non-zero entries here are "time-varying" genes?
many thanks!
best
Mahmoud
Hey Mahmoud
That's a fair question :)
In psuper_obj$beta_dt, you are right that the non-zero entries are time-varying genes (these are genes that are useful for placing the cells in the label order).
The discrepancy in the number of rows between glmnet_best$beta and beta_dt comes from the cunning way that glmnetcr (the approach which inspired psupertime) fits the ordinal regression model. Standard logistic or linear regression has a beta for every predictor, plus an intercept. Ordinal regression with k distinct labels can be viewed as fitting k-1 simultaneous regressions, which share betas but have different intercepts. So the extra rows in glmnet_best$beta are these k-1 different intercepts.
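As an illustration of "shared betas, different intercepts", one common way to write an ordinal model with k ordered labels is the cumulative logit (proportional odds) form sketched here. The notation is assumed for this sketch and does not come from the package: x is a cell's gene expression vector, beta is the single shared coefficient vector, and theta_j are the cutpoints. glmnetcr itself fits a continuation ratio model, which conditions the probabilities differently but has the same structure of one beta vector plus k-1 cutpoints, and sign conventions vary between packages.

```latex
\operatorname{logit} P(y \le j \mid x) = \theta_j - x^{\top}\beta,
\qquad j = 1, \dots, k-1
```

Each of the k-1 equations has its own intercept theta_j, while beta is the same in all of them. For instance, cutpoints named cp1 to cp7 would correspond to k = 8 labels.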
You can check this:
## the rownames are genes, then cutpoints are cp1, cp2, etc
tail(rownames(psuper_obj$glmnet_best$beta), 10)
# > [1] "ZNF791" "ZNF823" "ZW10" "cp1" "cp2" "cp3" "cp4" "cp5"
# > [9] "cp6" "cp7"
## number of genes + (k-1) should equal the number of rows in glmnet_best$beta
with(psuper_obj, nrow(beta_dt) + length(unique(y)) - 1 == nrow(glmnet_best$beta) )
# > TRUE
(glmnet_best is the output from running glmnet, and is a slightly complicated object. For deeper questions on that it's probably best to look at the documentation for glmnet.)
Hope that makes things clearer!
Cheers
Will
Hi Will
Thank you so much for the explanation and for pointing out the glmnet packages! I will check out the documentation for those packages.
best wishes
Mahmoud