kgori/sigfit

Change signatures matrix orientation in fit_nmf

Closed this issue · 2 comments

To make the models more consistent, I'm changing the orientation of the signatures matrix in sigfit_fit_nmf.stan from CxS to SxC, as in all the other models. This means changing how the probabilities are calculated, from the current:

matrix[C, S] signatures;  // in "data"
simplex[S] exposures[G];  // in "parameters"
vector<lower=0, upper=1>[C] probs[G];  // in "transformed parameters"
for (i in 1:G) {
    probs[i] = scale_to_sum_1(signatures * exposures[i]);
}

To something like:

matrix[S, C] signatures;  // in "data"
simplex[S] exposures[G];  // in "parameters"
vector<lower=0, upper=1>[C] probs[G];  // in "transformed parameters"
for (i in 1:G) {
    probs[i] = scale_to_sum_1(exposures[i] * signatures);  // need to find out how to do this product in Stan
}
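
One way to read the product in the proposed version: with the signatures stored as SxC, a row of exposures (length S) multiplied by the matrix should give the same C values as the old CxS matrix multiplied by the exposure column. The pure-Python sketch below checks that equivalence on made-up numbers. The helper names (`transpose`, `mat_vec`, `vec_mat`) and the toy values are illustrative assumptions, not part of sigfit. In Stan itself, one option along the same lines would be to multiply the transposed matrix by the simplex, e.g. `signatures' * exposures[i]`, which yields a vector of length C.

```python
# Toy check: (CxS matrix) * (S-vector) gives the same values as
# (S-row-vector) * (SxC matrix).
# All names and numbers here are illustrative, not taken from sigfit.

def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]

def mat_vec(m, v):
    """Matrix (rows x len(v)) times column vector -> list of length rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def vec_mat(v, m):
    """Row vector (len(v)) times matrix (len(v) x cols) -> list of length cols."""
    return [sum(v[k] * m[k][j] for k in range(len(v)))
            for j in range(len(m[0]))]

# S = 2 signatures over C = 3 categories, stored as SxC (each row sums to 1).
signatures_sc = [[0.5, 0.25, 0.25],
                 [0.125, 0.125, 0.75]]
signatures_cs = transpose(signatures_sc)  # old CxS orientation

exposure = [0.5, 0.5]  # one sample's exposures (a simplex of length S)

old_probs = mat_vec(signatures_cs, exposure)  # like signatures * exposures[i]
new_probs = vec_mat(exposure, signatures_sc)  # like exposures[i] * signatures
```

Both orientations describe the same mixture of signatures; only the storage layout and the side on which the exposures are multiplied change.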

This way, the input and output matrices will always follow the form:

signatures: SxC
exposures: GxS
counts: GxC
opportunities: GxC

I've implemented this as follows, based on sigfit_fitex_nmf.stan:

data {
    ...
    matrix[S, C] signatures;  // matrix of signatures (rows) to be fitted
    int counts[G, C];         // data = counts per category (columns) per genome sample (rows)
    ...
}
parameters {
    simplex[S] exposures[G];
}
transformed parameters {
    matrix[G, S] exposures_mat;
    matrix<lower=0, upper=1>[G, C] probs;
    for (i in 1:G) {
        for (j in 1:S) {
            exposures_mat[i, j] = exposures[i, j];
        }
    }
    probs = exposures_mat * signatures;
}

The signatures input for fitting is always normalised to sum to 1, because the remove_zeros_() function does this.
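
A consequence that may be worth checking: if every signature (each row of the SxC matrix) sums to 1 and every row of exposures is a simplex, then each row of probs = exposures_mat * signatures already sums to 1, which is presumably why the implementation above no longer calls scale_to_sum_1. The pure-Python sketch below illustrates this on toy numbers. The `matmul` helper and all values are assumptions made for the example, and it assumes the normalisation is per signature.

```python
# Toy check that probs = exposures (GxS) * signatures (SxC) has rows summing
# to 1 when each signature row and each exposures row sums to 1.
# Names and numbers are illustrative, not taken from sigfit.

def matmul(a, b):
    """Multiply list-of-rows matrices a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

signatures = [[0.5, 0.25, 0.25],      # S = 2 signatures, C = 3 categories
              [0.125, 0.125, 0.75]]
exposures = [[0.5, 0.5],              # G = 2 samples, each row a simplex
             [0.25, 0.75]]

probs = matmul(exposures, signatures)    # GxC matrix of category probabilities
row_sums = [sum(row) for row in probs]   # one sum per sample
```

The argument is just a swap of the summation order: summing probs[g, c] over c gives the sum over s of exposures[g, s] times the row sum of signature s, and with unit row sums that is the sum of the exposures for sample g, which is 1.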

kgori commented