Change signatures matrix orientation in fit_nmf
Closed this issue · 2 comments
baezortega commented
To make the models more consistent, I'm changing the orientation of the signatures matrix in sigfit_fit_nmf.stan to SxC (signatures by categories), as in all the other models, instead of CxS. This means changing how the probabilities are calculated, from the current:
matrix[C, S] signatures; // in "data"
simplex[S] exposures[G]; // in "parameters"
vector<lower=0, upper=1>[C] probs[G]; // in "transformed parameters"
for (i in 1:G) {
  probs[i] = scale_to_sum_1(signatures * exposures[i]);
}
To something like:
matrix[S, C] signatures; // in "data"
simplex[S] exposures[G]; // in "parameters"
vector<lower=0, upper=1>[C] probs[G]; // in "transformed parameters"
for (i in 1:G) {
  // exposures[i] is a (column) vector[S], so transpose signatures to get a vector[C]
  probs[i] = scale_to_sum_1(signatures' * exposures[i]);
}
This way, the input and output matrices will always have these dimensions (S signatures, C categories, G genome samples):
signatures: SxC
exposures: GxS
counts: GxC
opportunities: GxC
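A minimal pure-Python check of the reorientation (Python rather than Stan, purely for illustration; the toy values and the helpers `transpose` and `matmul` are hypothetical). It computes each genome's probabilities both ways: with the old CxS signatures times `exposures[i]`, and with the new GxS exposures times SxC signatures.

```python
# Toy sizes: S signatures, C categories, G genome samples (values are made up).
S, C, G = 2, 3, 2

# New orientation: signatures is S x C, one signature per row, each row summing to 1.
signatures_SC = [
    [0.5, 0.3, 0.2],
    [0.1, 0.1, 0.8],
]

# Exposures are G x S: one simplex over the S signatures per genome sample.
exposures_GS = [
    [0.25, 0.75],
    [0.6, 0.4],
]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply a (n x k) by b (k x m), both stored as lists of rows."""
    k = len(b)
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(len(b[0]))]
            for i in range(len(a))]


# Old orientation: signatures is C x S, and each genome's probabilities are the
# C-vector signatures * exposures[i], computed one genome at a time.
signatures_CS = transpose(signatures_SC)
old_probs = [
    [sum(signatures_CS[c][s] * exposures_GS[i][s] for s in range(S)) for c in range(C)]
    for i in range(G)
]

# New orientation: one (G x S) * (S x C) product yields the whole G x C matrix.
new_probs = matmul(exposures_GS, signatures_SC)

for row in new_probs:
    print([round(p, 4) for p in row])
```

The two results agree element by element. Only the orientation of the stored signatures differs, and the new form produces the GxC matrix in a single product.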
baezortega commented
I've implemented this as follows, based on sigfit_fitex_nmf.stan:
data {
  ...
  matrix[S, C] signatures; // matrix of signatures (rows) to be fitted
  int counts[G, C];        // data = counts per category (columns) per genome sample (rows)
  ...
}
parameters {
  simplex[S] exposures[G];
}
transformed parameters {
  matrix[G, S] exposures_mat;
  matrix<lower=0, upper=1>[G, C] probs;
  for (i in 1:G) {
    for (j in 1:S) {
      exposures_mat[i, j] = exposures[i, j];
    }
  }
  probs = exposures_mat * signatures;
}
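The input signatures for fitting will always be normalised so that each signature (row) sums to 1, as this is done in the remove_zeros_() function. Since each row of exposures_mat is also a simplex, each row of probs already sums to 1, so no scale_to_sum_1() call is needed.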
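A small pure-Python sketch of why the rescaling can be dropped, using made-up values. If every signature row sums to 1 and every exposure row is a simplex, each row of `probs` sums to 1 and every entry lies in [0, 1], matching the bounds declared on `probs`. Here `normalise_rows` is a hypothetical stand-in for the normalisation step, not sigfit's remove_zeros_() itself.

```python
def normalise_rows(m):
    """Scale each row of m to sum to 1 (hypothetical stand-in for the
    normalisation the thread attributes to remove_zeros_())."""
    out = []
    for row in m:
        total = sum(row)
        out.append([x / total for x in row])
    return out


def matmul(a, b):
    """Multiply a (n x k) by b (k x m), both stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Made-up raw signature weights, S x C (S = 2 signatures, C = 3 categories).
raw_signatures = [[5.0, 3.0, 2.0], [1.0, 1.0, 8.0]]
signatures = normalise_rows(raw_signatures)

# Made-up exposures, G x S: each row is a simplex (G = 3 genome samples).
exposures_mat = [[0.25, 0.75], [0.6, 0.4], [1.0, 0.0]]

# probs = exposures_mat * signatures, a G x C matrix.
probs = matmul(exposures_mat, signatures)

# Each row sums to 1 (up to floating-point rounding) without further rescaling.
row_sums = [sum(row) for row in probs]
print(row_sums)
```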
kgori commented
This generated a compiler error in the multinomial likelihood: multinomial() takes a vector, not a row_vector, and a row of the probs matrix, probs[i], is a row_vector in Stan. I've fixed this on master.
(Posted by email in reply to Adrian Baez-Ortega's comment of 5 Aug 2017, 11:35.)
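In Stan, one way to make a matrix row acceptable to multinomial() is to transpose it, for example `counts[i] ~ multinomial(probs[i]');`. This is an assumption, not necessarily the fix committed to master. Python makes no row/column distinction, so the pure-Python sketch below only illustrates the per-genome likelihood being evaluated row by row. `multinomial_lpmf` is a hypothetical helper, not a Stan or sigfit function, and the data are made up.

```python
import math


def multinomial_lpmf(counts, probs):
    """Log-probability of the count vector `counts` under a multinomial
    distribution with category probabilities `probs` (same length)."""
    n = sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)
    # Categories with zero counts contribute 0 * log(p) = 0, so skip them
    # (this also avoids log(0) when p == 0).
    return log_coef + sum(k * math.log(p) for k, p in zip(counts, probs) if k > 0)


# Made-up G x C counts and G x C probs (each probs row sums to 1).
counts = [[2, 1, 7], [3, 2, 5]]
probs = [[0.2, 0.15, 0.65], [0.34, 0.22, 0.44]]

# The likelihood is evaluated one genome sample (row) at a time; in Stan each
# row, probs[i], would need transposing to a vector before calling multinomial.
total = sum(multinomial_lpmf(counts[i], probs[i]) for i in range(len(counts)))
print(total)
```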