wmacnair/psupertime

beta coefficients explanation


Hi

I'm just wondering what the following output variables are:

  • psuper_obj$glmnet_best$beta: if I check this for the example data (data(acinar_hvg_sce)), there are more rows than genes. Why?

  • psuper_obj$beta_dt: are all the non-zero entries here "time-varying" genes?

Many thanks!
Best,
Mahmoud

Hey Mahmoud

That's a fair question :)

You're right that the non-zero entries in psuper_obj$beta_dt are the time-varying genes (these are the genes that are useful for placing the cells in the label order).

The discrepancy in the number of rows between glmnet_best$beta and beta_dt comes from the cunning way that glmnetcr (the approach which inspired psupertime) fits the ordinal regression model. Standard logistic or linear regression has one beta for every predictor, plus an intercept. Ordinal regression with k labels can be viewed as fitting k-1 simultaneous regressions, which share the same betas but have different intercepts (cutpoints). These cutpoints are passed to glmnet as extra columns of the design matrix, which is why they show up in glmnet_best$beta alongside the genes. So the extra rows in glmnet_best$beta are these k-1 intercepts.
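To make the "shared betas, separate intercepts" idea concrete, here is a minimal base-R sketch on simulated data. It assumes a forward continuation-ratio model, logit P(Y = j | Y >= j, x) = theta_j + x'beta, which is one of the model variants glmnetcr offers; psupertime's own expansion may differ in its details. The helper `expand_cr` and the toy data are hypothetical, and the sketch uses an unpenalised `glm.fit` rather than glmnet, so no genes are shrunk to zero. The only point is the shape of the coefficient vector: p gene coefficients followed by k-1 cutpoint columns.

```r
set.seed(1)

# Simulated toy data (assumption): n cells, p "genes", k ordered labels
n <- 200; p <- 3; k <- 4
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("gene", 1:p)))
y <- sample(1:k, n, replace = TRUE)

# Hypothetical helper: forward continuation-ratio expansion.
# Cell i reaches cutpoints 1..min(y[i], k - 1) and gets one row per cutpoint;
# the binary response is 1 if the cell stops at that cutpoint (y[i] == j).
expand_cr <- function(X, y, k) {
  m   <- pmin(y, k - 1)                       # rows contributed by each cell
  idx <- rep(seq_along(y), m)                 # source cell of each expanded row
  j   <- sequence(m)                          # cutpoint tested by each expanded row
  cp  <- outer(j, seq_len(k - 1), "==") * 1   # one indicator column per cutpoint
  colnames(cp) <- paste0("cp", seq_len(k - 1))
  list(x = cbind(X[idx, , drop = FALSE], cp), z = as.integer(y[idx] == j))
}

d <- expand_cr(X, y, k)

# Each gene column gets one coefficient shared across all cutpoints; the cp
# columns act as the k - 1 separate intercepts, so no global intercept is added.
fit  <- glm.fit(x = d$x, y = d$z, family = binomial())
beta <- fit$coefficients

length(beta) == p + (k - 1)
# > [1] TRUE
tail(names(beta), k - 1)
# > [1] "cp1" "cp2" "cp3"
```

Because exactly one cp column equals 1 in every expanded row, the cp columns together absorb the intercept, which is why the fit uses the design matrix as-is without adding a separate global intercept column.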

You can check this:

```r
## the rownames are the genes, followed by the cutpoints cp1, cp2, etc.
tail(rownames(psuper_obj$glmnet_best$beta), 10)
# > [1] "ZNF791" "ZNF823" "ZW10"   "cp1"    "cp2"    "cp3"    "cp4"    "cp5"
# > [9] "cp6"    "cp7"

## number of genes + (k - 1) should equal the number of rows in glmnet_best$beta
with(psuper_obj, nrow(beta_dt) + length(unique(y)) - 1 == nrow(glmnet_best$beta))
# > TRUE
```

(glmnet_best is the output from running glmnet, and is a slightly complicated object. For deeper questions about it, it's probably best to look at the documentation for glmnet.)

Hope that makes things clearer!
Cheers
Will

Hi Will

Thank you so much for the explanation, and for pointing me to the glmnet packages! I will check out their documentation.

Best wishes,
Mahmoud