sa-lee/starmie

simulate Q matrix with repeated runs from dirichlet distribution

Opened this issue · 2 comments

see http://www.cs.cmu.edu/~epxing/Class/10701-08s/recitation/dirichlet.pdf and
https://en.wikipedia.org/wiki/Dirichlet_distribution

k <- 20
r <- 20
n_samples <- 100
# generate Q matrix by sampling from diricihlet
# keep alpha hyper param constant (same mean and variance for each K)
Q_1 <- MCMCpack::rdirichlet(n_samples, rep(1, k))
# permute labels by shuffling and adding gaussian noise noise over number of runs
permute_and_add_noise <- function(Q_mat, r) {
  logit <- function(x) { log(x / (1-x)) }
  perturb <- function(Q_mat) { 
    # add gaussian noise to each row of matrix
    q_random <- apply(Q_mat, 1, 
                      function(y) 1 / (1 + exp(-rnorm(length(y), 
                                                      mean = logit(y), 
                                                      sd = 0.01))))
    # normalise rows 
    q_normalised <- q_random / rowSums(q_random)
    # shuffle labels 
    q_normalised[, sample(seq_len(ncol(q_normalised)))]
  }

  replicate(r - 1, perturb(Q_mat))
}

permute_and_add_noise(Q_1, r)

Will have to fix labelling at some point but it is a start.

@gtonkinhill i think this might be reasonable for testing different label switching algs. should look into Stephens, 2000 as well.

As discussed the other day - these are the underlying problems here:

  1. Introducing noise to the Q-estimates (which the code in previous commit does do.)
  2. Introducing noise in the label switches
  3. Changing Dirichlet params to introduce multiple modes.

The suggestion by Melanie to solve 2. is to pass a confusion matrix as input for a sample being mislabelled. i.e. say we have a Q-matrix with correct labels then in each replicate we add noise to the Q and swap labels with some probability given by the confusion matrix.

Alternatively
The label.switching package has a function called permute.mcmc that I think does what we want.