hmatsu1226/HyPhyTree

parallel version


The iteration through leaves:

        Z2inter <- matrix(0, Nall-Nleaves, M)
        for(n in 1:(Nall-Nleaves)){
            tmp <- optim(rep(0,M), objfn_H2, gr=NULL, method="BFGS",
                         Z2=Z2, Dall=Dall, Nleaves=Nleaves, n=n)
            Z2inter[n,] <- tmp$par
        }

        Z2inter_dist <- matrix(0, Nall, Nall)
        for(i in 1:Nall){
            for(j in 1:Nall){
                if(i > Nleaves){
                    tmpz1 <- Z2inter[i-Nleaves,]
                }else{
                    tmpz1 <- Z2[i,]
                }

                if(j > Nleaves){
                    tmpz2 <- Z2inter[j-Nleaves,]
                }else{
                    tmpz2 <- Z2[j,]
                }

                Z2inter_dist[i,j] <- Poincare_dist(tmpz1, tmpz2)
            }
        }

... is quite slow. Do you have a parallelized version that you could share?
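Separately from the optimizations, the pairwise distance loop above can be shortened by stacking the leaf and internal coordinates into one matrix and filling only the upper triangle, since a distance is symmetric. The following is a minimal sketch, not the repository's code: the `Poincare_dist` stand-in assumes the standard unit-ball distance with curvature -1, and the toy coordinates are made up for illustration.

```r
# Stand-in for the repository's Poincare_dist
# (assumption: standard Poincare-ball distance, curvature -1).
Poincare_dist <- function(u, v) {
  acosh(1 + 2 * sum((u - v)^2) / ((1 - sum(u^2)) * (1 - sum(v^2))))
}

# Toy coordinates inside the unit ball (illustrative values only).
set.seed(1)
Nleaves <- 4; Nall <- 7; M <- 2
Z2      <- matrix(runif(Nleaves * M, -0.5, 0.5), Nleaves, M)
Z2inter <- matrix(runif((Nall - Nleaves) * M, -0.5, 0.5), Nall - Nleaves, M)

# Rows 1..Nleaves are leaves, the remaining rows are internal nodes,
# so the i > Nleaves / j > Nleaves branching is no longer needed.
Zall <- rbind(Z2, Z2inter)

# Fill the upper triangle and mirror it; the diagonal stays 0.
Z2inter_dist <- matrix(0, Nall, Nall)
for (i in seq_len(Nall - 1)) {
  for (j in (i + 1):Nall) {
    d <- Poincare_dist(Zall[i, ], Zall[j, ])
    Z2inter_dist[i, j] <- d
    Z2inter_dist[j, i] <- d
  }
}
```

This roughly halves the number of `Poincare_dist` calls; it does not touch the `optim` loop, which is likely the dominant cost.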

As you pointed out, the computation is slow, and for large phylogenetic trees it is not practical. I believe this is because each internal node is optimized against all leaf nodes, so parallelization alone will not improve things much. Instead, I expect that an approximate calculation, such as comparing against only a subset of the leaves, would greatly reduce the computational cost.

However, I have not yet come up with a specific application that requires higher speed, so I have only done a preliminary study and have not started improving the software. If you have any ideas for applications, please let me know.

The optimizations are independent, correct? If so, then some speedup can be gained via parallelization.

Sorry, I misread your question; I see now that you meant running the optim calls in parallel. Yes, each internal node can be optimized independently. Unfortunately, I have not gotten around to implementing a parallelized version.
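Since each internal node is optimized independently, the `for` loop over `n` can in principle be replaced by `parallel::mclapply`. The sketch below is an illustration under assumptions, not the repository's implementation: `objfn_H2` and `Poincare_dist` are hypothetical stand-ins (the real objective may differ), the toy data are made up, and `mclapply` only forks on Unix-like systems, so it falls back to one core on Windows.

```r
library(parallel)

# Stand-in distance (assumption: standard Poincare-ball distance, curvature -1).
Poincare_dist <- function(u, v) {
  acosh(1 + 2 * sum((u - v)^2) / ((1 - sum(u^2)) * (1 - sum(v^2))))
}

# Hypothetical stand-in for objfn_H2: squared mismatch between the distances
# from candidate point z to every leaf and the target distances in Dall.
objfn_H2 <- function(z, Z2, Dall, Nleaves, n) {
  if (sum(z^2) >= 1) return(1e10)  # penalty: keep z inside the unit ball
  d <- apply(Z2, 1, Poincare_dist, v = z)
  sum((d - Dall[Nleaves + n, seq_len(Nleaves)])^2)
}

# Toy data: target distances generated from known points (illustrative only).
set.seed(1)
Nleaves <- 4; Nall <- 7; M <- 2
Ztrue <- matrix(runif(Nall * M, -0.5, 0.5), Nall, M)
Z2    <- Ztrue[seq_len(Nleaves), , drop = FALSE]
Dall  <- matrix(0, Nall, Nall)
for (i in seq_len(Nall)) for (j in seq_len(Nall)) {
  Dall[i, j] <- Poincare_dist(Ztrue[i, ], Ztrue[j, ])
}

# Forking is unavailable on Windows, so use a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# One optim call per internal node; mclapply returns results in input order.
res <- mclapply(seq_len(Nall - Nleaves), function(n) {
  optim(rep(0, M), objfn_H2, gr = NULL, method = "BFGS",
        Z2 = Z2, Dall = Dall, Nleaves = Nleaves, n = n)$par
}, mc.cores = n_cores)

# Stack the per-node parameter vectors into the same shape as the original loop.
Z2inter <- do.call(rbind, res)
```

The speedup is bounded by the number of cores, so this complements rather than replaces the approximation idea mentioned earlier in the thread.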

For now, I'm just parallelizing for multiple M values.

It appears that hydraPlus uses BLAS, which can default to all threads on the machine, so parallel runs can increase the CPU load beyond the resources available. Setting BLAS threads via RhpcBLASctl::blas_set_num_threads seems to fix that issue.
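A minimal sketch of that setup, assuming the runs over several M values are launched with `parallel::mclapply`: BLAS is limited to one thread before the workers start, so the total thread count stays close to the number of workers. `fit_embedding` is a hypothetical placeholder for the real per-M fit (it is not `hydraPlus`), the input matrix is random toy data, and the `RhpcBLASctl` call is skipped if the package is not installed.

```r
library(parallel)

# Limit BLAS to a single thread before launching workers, if available.
if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
  RhpcBLASctl::blas_set_num_threads(1)
}

# Hypothetical placeholder for one fit at embedding dimension M.
# It only runs some BLAS/LAPACK-backed linear algebra on D.
fit_embedding <- function(M, D) {
  e <- eigen(D, symmetric = TRUE)
  list(M = M, top = e$values[seq_len(M)])
}

# Toy distance matrix (illustrative only).
set.seed(1)
D <- as.matrix(dist(matrix(rnorm(40), 10, 4)))

# Forking is unavailable on Windows, so use a single core there.
n_cores  <- if (.Platform$OS.type == "windows") 1L else 2L
M_values <- 2:4

# One worker task per M value; results come back in the order of M_values.
fits <- mclapply(M_values, fit_embedding, D = D, mc.cores = n_cores)
```

Whether one BLAS thread per worker is the best split depends on the machine; the point is only that workers times BLAS threads should not exceed the available cores.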

Thanks for the advice, I didn't know that.