simulate Q matrix with repeated runs from dirichlet distribution
Opened this issue · 2 comments
see http://www.cs.cmu.edu/~epxing/Class/10701-08s/recitation/dirichlet.pdf and
https://en.wikipedia.org/wiki/Dirichlet_distribution
k <- 20
r <- 20
n_samples <- 100
# generate Q matrix by sampling from diricihlet
# keep alpha hyper param constant (same mean and variance for each K)
Q_1 <- MCMCpack::rdirichlet(n_samples, rep(1, k))
# permute labels by shuffling and adding gaussian noise noise over number of runs
permute_and_add_noise <- function(Q_mat, r) {
logit <- function(x) { log(x / (1-x)) }
perturb <- function(Q_mat) {
# add gaussian noise to each row of matrix
q_random <- apply(Q_mat, 1,
function(y) 1 / (1 + exp(-rnorm(length(y),
mean = logit(y),
sd = 0.01))))
# normalise rows
q_normalised <- q_random / rowSums(q_random)
# shuffle labels
q_normalised[, sample(seq_len(ncol(q_normalised)))]
}
replicate(r - 1, perturb(Q_mat))
}
permute_and_add_noise(Q_1, r)
Will have to fix labelling at some point but it is a start.
@gtonkinhill i think this might be reasonable for testing different label switching algs. should look into Stephens, 2000 as well.
As discussed the other day - these are the underlying problems here:
- Introducing noise to the Q-estimates (which the code in previous commit does do.)
- Introducing noise in the label switches
- Changing Dirichlet params to introduce multiple modes.
The suggestion by Melanie to solve 2. is to pass a confusion matrix as input for a sample being mislabelled. i.e. say we have a Q-matrix with correct labels then in each replicate we add noise to the Q and swap labels with some probability given by the confusion matrix.
Alternatively
The label.switching package has a function called permute.mcmc
that I think does what we want.