sa-lee/starmie

fixed k-means for looking at best permutation

Closed this issue · 5 comments

Possibly better than CLUMPP, which has a problem with multimodality.

Robbie has an idea for a regression-based approach. I think we should generate some test Q matrices to compare the approaches.

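library(glmnet)

# Simulate a reference Q matrix: n individuals (rows) by k clusters (columns)
set.seed(10)
n <- 200
k <- 20
runs <- 20
x <- matrix(abs(rnorm(n * k, 20, 10)), n, k)

# Give each individual one dominant cluster, block by block
for (i in seq(0, n - k, by = k)) {
  diag(x[(i + 1):(i + k), ]) <- 150
}
x <- x / rowSums(x)  # rows sum to 1

# Make cluster 5 a near copy of cluster 1 so the two are hard to tell apart
noise1 <- 0.01
x[, 5] <- x[, 1] + abs(rnorm(n, 0, noise1))
x <- x / rowSums(x)

colnames(x) <- paste("V", 1:k, sep = "")
mat <- x

# Simulate the remaining runs: permute the columns, add noise, renormalise
set.seed(10)
noise <- 0.01
for (i in 2:runs) {
  a <- x[, order(sample(1:k, k))] + matrix(rnorm(n * k, 0, noise), n, k)
  a <- pmax(a, 0)  # clip negative memberships to zero
  a <- a / rowSums(a)
  colnames(a) <- paste(colnames(a), i, sep = "_")
  mat <- cbind(mat, a)
}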

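# Regress cluster 1 of the reference run on the clusters from all other runs
# (columns k + 1 onwards, so the reference run's own clusters are excluded)
fit <- cv.glmnet(mat[, (k + 1):ncol(mat)], mat[, 1], alpha = 0.9)

# Clusters with a positive coefficient are candidate matches for V1
coefs <- data.frame(var = rownames(coef(fit)), beta = as.numeric(coef(fit)))
coefs[coefs$beta > 0, ]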

Start from the cluster that is regressed best, then work progressively down to the clusters that are hardest to define. If multiple clusters from the same run are assigned, choose the one with the highest coefficient and set all the others to zero, then repeat iteratively.
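
To make the tie-breaking step concrete, here is a minimal sketch in base R. It assumes the coefficients sit in a data frame with columns `var` and `beta`, as in the snippet above, and that the run is encoded as the suffix after the underscore (so `V5_2` would be cluster 5 from run 2). The helper name `pick_best_per_run` and the toy values are hypothetical and only for illustration.

```r
# Hypothetical helper: for every run, keep only the cluster with the largest
# positive coefficient. `coefs` has columns `var` (e.g. "V3_2") and `beta`.
pick_best_per_run <- function(coefs) {
  # Drop the intercept and any non-positive coefficients
  keep <- coefs$var != "(Intercept)" & coefs$beta > 0
  coefs <- coefs[keep, , drop = FALSE]
  # The run label is whatever follows the last underscore, e.g. "2" in "V3_2"
  coefs$run <- sub("^.*_", "", coefs$var)
  if (nrow(coefs) == 0) {
    return(coefs)
  }
  # Within each run, retain the row with the highest coefficient
  best <- lapply(split(coefs, coefs$run), function(d) d[which.max(d$beta), ])
  out <- do.call(rbind, best)
  rownames(out) <- NULL
  out
}

# Toy coefficients (made-up values, not output from the simulation)
toy <- data.frame(
  var  = c("(Intercept)", "V1_2", "V5_2", "V7_3", "V2_3"),
  beta = c(0.01, 0.60, 0.35, 0.90, 0),
  stringsAsFactors = FALSE
)
matches <- pick_best_per_run(toy)
print(matches)
```

In the toy input, run 2 has two candidates (`V1_2` and `V5_2`), so only the one with the larger coefficient survives. The "repeat iteratively" part would then move on to the next reference cluster; that loop is not shown here.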

Not sure I really get this. What do you mean by "regressed best"?

@gtonkinhill Closing this for now since we've decided to go with the correlation matrix method.