The result is NA

Question

Knight1995 opened this issue 10 months ago · 12 comments

Knight1995 commented 10 months ago

Thanks for the great work! I am trying to do some simple calculations on an FBM object and want one result for each of the 3774 rows, but the result is NA. Could you please tell me what the problem is? Thanks!

Answer 1 · 2024-03-05T06:18:23.000Z

You should start from the warning messages. X.sub is probably shorter than K1[, 2].
Also, ind spans the column indices by default in big_apply(), but here you're using it for the rows.
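To illustrate the point about ind, here is a minimal sketch on a toy FBM (the data and the names col_sums and row_sums are made up for illustration, not taken from the issue). It assumes bigstatsr is installed. By default, ind holds a block of column indices; to work row-wise, ind = rows_along(X) has to be passed explicitly, and the function should return one value per row in the block.

```r
library(bigstatsr)

# Toy FBM with 10 rows and 4 columns (illustrative data only).
X <- FBM(10, 4, init = 1:40)

# Default behaviour: ind is a block of COLUMN indices.
col_sums <- big_apply(X, a.FUN = function(X, ind) {
  colSums(X[, ind, drop = FALSE])
}, a.combine = 'c')

# Row-wise: pass ind = rows_along(X) and use ind to index rows.
# ind is a vector of several row indices per call, not a single index,
# so the function returns one value per row of the block.
row_sums <- big_apply(X, a.FUN = function(X, ind) {
  rowSums(X[ind, , drop = FALSE])
}, ind = rows_along(X), a.combine = 'c', block.size = 4)

length(col_sums)  # one value per column
length(row_sums)  # one value per row
```

Using drop = FALSE keeps the block a matrix even if a block happens to contain a single row or column.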

Answer 2 · 2024-03-05T06:26:34.000Z

Thanks for your quick reply! Actually, I have edited my code to calculate each row's result, but the result is still NA.
colmeans <- big_apply(X1, ind = rows_along(X), function(X, ind) {
  X.sub <- X[ind, 1]

  K1 <- map_dfr(unique(X[, 1]), function(i) {
    S1 <- mean(Y[which(X[, 1] == i), 1])
    data.frame(Value = S1, clu = i)
  })

  a <- K1[which(K1[, 2] == X.sub), 1]
  b <- min(K1[which(K1[, 2] != X.sub), 1])
  si <- (b - a) / max(b, a)
  return(si)
}, a.combine = 'c')

Answer 3 · 2024-03-05T06:51:51.000Z

Yes, cf. my first comment.

Answer 4 · 2024-03-05T09:21:38.000Z

Sorry to bother you again. When I test a single number (ind = 1), my code works. But when I put the code into big_apply, the results are NA. What is the problem? Does the R algorithm not work in big_apply? Thanks.

Answer 5 · 2024-03-05T09:56:15.000Z

No, your code doesn't work when using ind <- 1.
It is just that X.sub is of length 1 and gets automatically recycled to match the size of K1[, 2].
Which is probably not what you want.
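The recycling behaviour can be reproduced in base R. The sketch below uses a made-up table shaped like K1 and made-up cluster labels (x_one, x_many and the values are hypothetical). It shows one way NA can arise: when the comparison vector is longer than K1[, 2], which() can return positions beyond the rows of K1, and indexing there gives NA. A per-row lookup with match() is shown as one possible alternative, not necessarily what the original code intended.

```r
# Hypothetical per-cluster table, shaped like K1 in the thread.
K1 <- data.frame(Value = c(0.2, 0.5, 0.9), clu = c(1, 2, 3))

# Length-1 case: the single label is recycled against all of K1[, 2].
x_one <- 2
a_one <- K1[which(K1[, 2] == x_one), 1]

# Longer vector: K1[, 2] is recycled to length 5 and compared position by
# position. R also warns that the lengths are not multiples of each other
# (suppressed here to keep the output short).
x_many <- c(2, 2, 1, 1, 3)
idx <- suppressWarnings(which(K1[, 2] == x_many))
a_many <- K1[idx, 1]   # positions past nrow(K1) yield NA

# One possible per-row lookup: the cluster value for every label.
a_lookup <- K1$Value[match(x_many, K1$clu)]

print(a_one)
print(idx)
print(a_many)
print(a_lookup)
```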

Answer 6 · 2024-03-05T09:58:49.000Z

You need to think about what you are trying to achieve here.
If I had to guess, I would say that you need to subset K1[ind, 2].

Answer 7 · 2024-03-05T13:33:42.000Z

Thanks for your reply. To track down the problem, I tried a simple test, as follows. I think it may be that I didn't pass one of the two variables, Y, so there is no result. But after I rewrote the code following your multivariate format (https://privefl.github.io/bigstatsr/articles/big-apply.html), there is still no output, which is very weird. Could you give me some suggestions? Thanks.

Answer 8 · 2024-03-05T13:45:06.000Z

Properly passing other variables as arguments of the function is necessary only when using parallelism.
Doing mean(Y[-ind, ]) is very odd (especially the minus). What are you trying to achieve here (in simple English)?
What do you have for summary(Y)?

Answer 9 · 2024-03-05T13:50:52.000Z

'ind' is the row number. mean(Y[-ind, ]) means that this row is removed from the matrix and the mean of the remaining matrix is calculated.
'summary(Y)' shows the following.

Answer 10 · 2024-03-05T13:55:38.000Z

I didn't get that Y was also an FBM. Then summary(Y[]).
You understand that ind is usually a vector of multiple indices, not just one, right?
And you want the full mean() of the matrix? Not something like the rowMeans()?
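The difference between these options can be seen on a small in-memory matrix (a made-up Y and block ind, for illustration only). With a block of several indices, mean(Y[-ind, ]) drops all rows of the block at once and returns a single number, whereas rowMeans() returns one value per row. If the goal is "the mean of the matrix with row i removed" for each row i, one possible way is to derive it from the total sum, as sketched here.

```r
# Small made-up matrix: 4 rows, 3 columns.
Y <- matrix(1:12, nrow = 4)
ind <- c(1, 2)   # a block of two row indices, as big_apply() would pass

# Drops rows 1 AND 2 together, then averages what is left: a single number.
block_mean <- mean(Y[-ind, ])

# One mean per row of the block.
row_means <- rowMeans(Y[ind, , drop = FALSE])

# Leave-one-out mean per row: mean of all entries except those of row i.
loo_means <- (sum(Y) - rowSums(Y[ind, , drop = FALSE])) /
  (length(Y) - ncol(Y))

print(block_mean)
print(row_means)
print(loo_means)
```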

Answer 11 · 2024-03-05T14:14:02.000Z

Yes, I think I understand what you mean. I tested the simple example above to learn, step by step, how to rewrite the a.FUN in big_apply. My original R code is below. Because the matrix is too big and the code runs too slowly, I want to implement this function using big_apply. cluster_info and dist are the original matrices. Their row names and number of rows are the same.

K3<-future_map_dfr(seq(ncol(cluster_info)),function(Y){
  K2<-map_dfr(seq(nrow(cluster_info)),function(index){
    x <-cluster_info[,Y]
    dist2 <- as.data.frame(cbind(x,dist))[-index,]
    
    K1<-map_dfr(unique(x),function(i){
      d<-mean(dist2[which(dist2$x==i),index+1])  
      #d<-sum(dist2[ which(dist2$x==i),index+1])/length( which(dist2$x==i))
      
      data.frame(Value=d,clu=i) 
    })
    
    si <- (min(K1[K1$clu!=x[index],]$Value)-K1[K1$clu==x[index],]$Value)/max(min(K1[K1$clu!=x[index],]$Value),K1[K1$clu==x[index],]$Value)
    if(is.na(si)){
      data.frame(cluster=x[index],sil_width=0) 
    }else{
      data.frame(cluster=x[index],sil_width=si) 
    }
    
  })
  data.frame(Resolution=colnames(cluster_info)[Y],silhouette_score=mean(K2$sil_width))
})
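Judging by the names sil_width and silhouette_score, the code above appears to compute a silhouette width per row. Under that assumption, the inner loops could be replaced by a vectorized base R computation for one clustering at a time. The function name silhouette_widths and the toy data below are hypothetical. The sketch assumes D is a square, symmetric distance matrix that fits in memory and x is a vector of cluster labels with one entry per row of D.

```r
# Sketch: silhouette width per point from a distance matrix D and labels x.
silhouette_widths <- function(D, x) {
  x <- as.character(x)
  n <- ncol(D)
  # sums[g, j] = sum of distances from the points of cluster g to point j.
  sums <- rowsum(D, group = x)
  counts <- as.numeric(table(x)[rownames(sums)])
  cnt <- matrix(counts, nrow = nrow(sums), ncol = n)
  # (row, column) position of each point's own cluster.
  own <- cbind(match(x, rownames(sums)), seq_len(n))
  # Exclude the point itself from its own cluster's mean.
  sums[own] <- sums[own] - diag(D)
  cnt[own] <- cnt[own] - 1
  means <- sums / cnt
  a <- means[own]                        # mean distance within own cluster
  means[own] <- Inf
  b <- unname(apply(means, 2, min))      # smallest mean distance to another cluster
  si <- (b - a) / pmax(a, b)
  si[is.na(si)] <- 0                     # same fallback as in the code above
  si
}

# Toy example: four points on a line, two clusters.
D <- as.matrix(dist(c(0, 1, 10, 11)))
x <- c("a", "a", "b", "b")
si <- silhouette_widths(D, x)
print(si)
print(mean(si))   # would correspond to silhouette_score for this clustering
```

For several clusterings, the function could be applied to each column of cluster_info in turn.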

Answer 12 · 2024-04-12T11:40:28.000Z

I don't get what you're trying to achieve here; sorry I cannot help.