fixed k-means for looking at best permutation
Closed this issue · 5 comments
sa-lee commented
Possibly better than CLUMPP, which has problems with multimodality.
gtonkinhill commented
Robbie has an idea for a regression-based approach. I think we should generate some test Q matrices to compare the approaches.
gtonkinhill commented
```r
library(glmnet)

# Simulate `runs` replicate Q matrices (200 individuals, k clusters),
# each a column-permuted, noisy copy of the first.
set.seed(10)
k <- 20
runs <- 20

# Base matrix: background noise plus a strong block-diagonal signal.
x <- matrix(abs(rnorm(200 * k, 20, 10)), 200, k)
for (i in seq(0, 190, k)) {
  diag(x[(i + 1):(i + k), ]) <- 150
}
x <- x / rowSums(x)

# Make cluster 5 nearly identical to cluster 1 (a hard, correlated case).
noise1 <- 0.01
x[, 5] <- x[, 1] + abs(rnorm(200, 0, noise1))
x <- x / rowSums(x)
colnames(x) <- paste("V", 1:k, sep = "")

# Build the remaining runs: permute columns, add noise, clip at zero,
# and renormalise each row to sum to one.
mat <- x
set.seed(10)
noise <- 0.01
for (i in 2:runs) {
  a <- x[, order(sample(1:k, k))] + matrix(rnorm(200 * k, 0, noise), 200, k)
  a <- pmax(a, 0)
  a <- a / rowSums(a)
  colnames(a) <- paste(colnames(a), i, sep = "_")
  mat <- cbind(mat, a)
}

# Regress run 1's first cluster on columns 11:400; the positive elastic-net
# coefficients should flag its matching column in each other run.
fit <- cv.glmnet(mat[, 11:400], mat[, 1], alpha = 0.9)
coefs <- data.frame(var = rownames(coef(fit)), beta = as.numeric(coef(fit)))
coefs[coefs$beta > 0, ]
```
gtonkinhill commented
Start from the variable that is regressed best and work progressively down to the hardest-to-define clusters. If multiple clusters from the same run are assigned, choose the one with the highest parameter and weight all others to zero, then repeat iteratively.
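If I follow, the greedy assignment could be sketched roughly like this (my reading, not a settled implementation): for each reference cluster, regress it on each remaining run's unassigned columns and keep only the column with the largest coefficient, removing it from the candidate pool. For simplicity the sketch walks reference clusters in order rather than from best- to worst-regressed; `match_clusters` and the bookkeeping names are mine, and `mat`, `k`, `runs` are as in the simulation above.

```r
library(glmnet)

# Greedy, regression-based matching (sketch). `mat` stacks `runs` Q matrices
# of k columns each, side by side, with run 1 as the reference.
match_clusters <- function(mat, k, runs) {
  run_of <- rep(seq_len(runs), each = k)       # run owning each column of mat
  available <- run_of > 1                      # reference columns are never predictors
  assigned <- matrix(NA_integer_, nrow = k, ncol = runs)
  assigned[, 1] <- seq_len(k)                  # run 1 maps to itself
  for (j in seq_len(k)) {                      # reference cluster j = column j of run 1
    for (r in 2:runs) {
      cols <- which(available & run_of == r)   # this run's unassigned columns
      if (length(cols) == 1) {
        best <- cols                           # only one candidate left
      } else {
        fit <- cv.glmnet(mat[, cols], mat[, j], alpha = 0.9)
        beta <- as.numeric(coef(fit, s = "lambda.min"))[-1]  # drop intercept
        best <- cols[which.max(beta)]          # highest coefficient wins the run
      }
      assigned[j, r] <- (best - 1) %% k + 1    # column index within run r
      available[best] <- FALSE                 # "weight all others to zero"
    }
  }
  assigned                                     # assigned[j, r]: run r's column for cluster j
}
```

The result is a k-by-runs table of column indices, so `mat` can be realigned run by run before averaging.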
sa-lee commented
Not sure I really get this. What do you mean by "regressed best"?
sa-lee commented
@gtonkinhill closing this for now, since we've decided to go with the correlation-matrix method.