beta coefficients explanation

Question

Closed this issue 5 years ago · 2 comments

mahmoudibrahim commented 5 years ago

Hi

I'm just wondering what the following output variables are:

psuper_obj$glmnet_best$beta: if I check this for the example data, data(acinar_hvg_sce), there are more rows than genes. Why is that?
psuper_obj$beta_dt: are all non-zero entries here "time-varying" genes?

many thanks!
best
Mahmoud

Answer 1 · 2019-11-20T10:42:14.000Z

Hey Mahmoud

That's a fair question :)

In psuper_obj$beta_dt you are right that the non-zero entries are time-varying genes (these are genes that are useful for placing the cells in the label order).

The discrepancy in the number of rows between glmnet_best$beta and beta_dt comes from the cunning way that glmnetcr (the approach which inspired psupertime) fits the ordinal regression model. Standard logistic or linear regression has a beta for every predictor, plus an intercept. Ordinal regression with k ordered labels can be viewed as fitting k-1 simultaneous regressions, which share betas but have different intercepts (the cutpoints between consecutive labels). So the extra rows in glmnet_best$beta are these k-1 different intercepts.
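
To make the "k-1 regressions with shared betas" idea concrete, here is a minimal sketch in base R on simulated data. It is not psupertime's or glmnetcr's actual code: the simulated data, the names (gene1, cp1, ...), the choice of binary target (label > j), and the use of unpenalised glm in place of glmnet are all assumptions made for illustration. The only point is that stacking one copy of the data per cutpoint gives a coefficient vector with one beta per predictor plus k-1 intercept-like terms.

```r
set.seed(1)
n <- 200  # cells (hypothetical)
p <- 3    # genes (hypothetical)
k <- 4    # ordered labels

# simulated expression matrix and ordered labels 1..k from a latent score
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("gene", 1:p)))
latent <- x[, 1] - 0.5 * x[, 2] + rnorm(n)
y <- cut(latent, breaks = quantile(latent, 0:k / k),
         include.lowest = TRUE, labels = FALSE)

# stack one copy of the data per cutpoint j = 1..k-1;
# each copy gets its own indicator column cpj, which acts as its intercept
stacked <- do.call(rbind, lapply(1:(k - 1), function(j) {
  cp <- matrix(0, n, k - 1, dimnames = list(NULL, paste0("cp", 1:(k - 1))))
  cp[, j] <- 1
  data.frame(target = as.integer(y > j), x, cp)
}))

# one logistic regression on the stacked data: shared gene betas,
# separate cutpoint terms, no global intercept
fit <- glm(target ~ . - 1, data = stacked, family = binomial())
names(coef(fit))

# p shared betas + (k-1) cutpoints
length(coef(fit)) == p + (k - 1)
```

The stacked rows are not independent, so this is only meant to show where the extra coefficients come from, not to give valid standard errors.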

You can check this:

## the rownames are the genes first, then the cutpoints cp1, cp2, etc
tail(rownames(psuper_obj$glmnet_best$beta), 10)
# > [1] "ZNF791" "ZNF823" "ZW10"   "cp1"    "cp2"    "cp3"    "cp4"    "cp5"
# > [9] "cp6"    "cp7"

## number of genes + (k-1) should equal the number of rows in glmnet_best$beta
with(psuper_obj, nrow(beta_dt) + length(unique(y)) - 1 == nrow(glmnet_best$beta))
# > TRUE
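
If you want the gene rows and the cutpoint rows separately, one way is to split on the rownames. The sketch below uses a small toy matrix as a stand-in for glmnet_best$beta, so the values and column names are made up, and it assumes that no gene symbol has the form "cp" followed by digits. The real object is a sparse matrix from glmnet, where the same row indexing should apply, but that is not checked here.

```r
# toy stand-in for psuper_obj$glmnet_best$beta (hypothetical values):
# rows are genes followed by cutpoints, columns are points on the lambda path
beta <- matrix(
  0, nrow = 6, ncol = 2,
  dimnames = list(c("ZNF791", "ZNF823", "ZW10", "cp1", "cp2", "cp3"),
                  c("s0", "s1"))
)

# cutpoint rows are named cp1, cp2, ...; everything else is a gene
is_cp     <- grepl("^cp[0-9]+$", rownames(beta))
gene_rows <- beta[!is_cp, , drop = FALSE]
cut_rows  <- beta[is_cp, , drop = FALSE]

rownames(gene_rows)
rownames(cut_rows)
```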

(glmnet_best is the output from running glmnet, and is a slightly complicated object. For deeper questions on that it's probably best to look at the documentation for glmnet.)

Hope that makes things clearer!
Cheers
Will

Answer 2 · 2019-11-20T13:11:53.000Z

Hi Will

Thank you so much for the explanation and for pointing me to the glmnet packages! I will check out the documentation for those packages.

best wishes
Mahmoud