fixed k-means for looking at best permutation
Closed this issue · 5 comments
sa-lee commented
Possibly better than CLUMPP, which has problems with multimodality.
gtonkinhill commented
Robbie has an idea for a regression-based approach. I think we should generate some test Q matrices to compare the approaches.
gtonkinhill commented
```r
library(glmnet)

# Simulate `runs` replicate Q matrices (200 individuals, k clusters),
# each a column-permuted, noisy copy of the first.
set.seed(10)
k <- 20
runs <- 20

# Base matrix: background noise plus a strong block-diagonal signal.
x <- matrix(abs(rnorm(200 * k, 20, 10)), 200, k)
for (i in seq(0, 190, k)) {
  diag(x[(i + 1):(i + k), ]) <- 150
}
x <- x / rowSums(x)

# Make cluster 5 nearly identical to cluster 1 (a hard, correlated case).
noise1 <- 0.01
x[, 5] <- x[, 1] + abs(rnorm(200, 0, noise1))
x <- x / rowSums(x)
colnames(x) <- paste("V", 1:k, sep = "")

# Build the remaining runs: permute columns, add noise, clip at zero,
# and renormalise each row to sum to one.
mat <- x
set.seed(10)
noise <- 0.01
for (i in 2:runs) {
  a <- x[, order(sample(1:k, k))] + matrix(rnorm(200 * k, 0, noise), 200, k)
  a <- pmax(a, 0)
  a <- a / rowSums(a)
  colnames(a) <- paste(colnames(a), i, sep = "_")
  mat <- cbind(mat, a)
}

# Regress run 1's first cluster on columns 11:400; the positive elastic-net
# coefficients should flag its matching column in each other run.
fit <- cv.glmnet(mat[, 11:400], mat[, 1], alpha = 0.9)
coefs <- data.frame(var = rownames(coef(fit)), beta = as.numeric(coef(fit)))
coefs[coefs$beta > 0, ]
```
gtonkinhill commented
Start from the variable that is regressed best and work progressively down to the hardest-to-define clusters. If multiple clusters from the same run are assigned, choose the one with the highest parameter and weight all others to zero, then repeat iteratively.
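If I follow, the greedy assignment could be sketched roughly like this (my reading, not a settled implementation): for each reference cluster, regress it on each remaining run's unassigned columns and keep only the column with the largest coefficient, removing it from the candidate pool. For simplicity the sketch walks reference clusters in order rather than from best- to worst-regressed; `match_clusters` and the bookkeeping names are mine, and `mat`, `k`, `runs` are as in the simulation above.

```r
library(glmnet)

# Greedy, regression-based matching (sketch). `mat` stacks `runs` Q matrices
# of k columns each, side by side, with run 1 as the reference.
match_clusters <- function(mat, k, runs) {
  run_of <- rep(seq_len(runs), each = k)       # run owning each column of mat
  available <- run_of > 1                      # reference columns are never predictors
  assigned <- matrix(NA_integer_, nrow = k, ncol = runs)
  assigned[, 1] <- seq_len(k)                  # run 1 maps to itself
  for (j in seq_len(k)) {                      # reference cluster j = column j of run 1
    for (r in 2:runs) {
      cols <- which(available & run_of == r)   # this run's unassigned columns
      if (length(cols) == 1) {
        best <- cols                           # only one candidate left
      } else {
        fit <- cv.glmnet(mat[, cols], mat[, j], alpha = 0.9)
        beta <- as.numeric(coef(fit, s = "lambda.min"))[-1]  # drop intercept
        best <- cols[which.max(beta)]          # highest coefficient wins the run
      }
      assigned[j, r] <- (best - 1) %% k + 1    # column index within run r
      available[best] <- FALSE                 # "weight all others to zero"
    }
  }
  assigned                                     # assigned[j, r]: run r's column for cluster j
}
```

The result is a k-by-runs table of column indices, so `mat` can be realigned run by run before averaging.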
sa-lee commented
Not sure I really get this. What do you mean by "regressed best"?
sa-lee commented
@gtonkinhill closing this for now, since we've decided to go with the correlation-matrix method.