parallel version

Question

Opened this issue 3 years ago · 5 comments

The iteration through leaves:

        Z2inter <- matrix(0, Nall-Nleaves, M)
        for(n in 1:(Nall-Nleaves)){
            tmp <- optim(rep(0,M), objfn_H2, gr=NULL, method="BFGS",
                         Z2=Z2, Dall=Dall, Nleaves=Nleaves, n=n)
            Z2inter[n,] <- tmp$par
        }

        Z2inter_dist <- matrix(0, Nall, Nall)
        for(i in 1:Nall){
            for(j in 1:Nall){
                if(i > Nleaves){
                    tmpz1 <- Z2inter[i-Nleaves,]
                }else{
                    tmpz1 <- Z2[i,]
                }

                if(j > Nleaves){
                    tmpz2 <- Z2inter[j-Nleaves,]
                }else{
                    tmpz2 <- Z2[j,]
                }

                Z2inter_dist[i,j] <- Poincare_dist(tmpz1, tmpz2)
            }
        }

... is quite slow. Do you have a parallelized version that you could share?

Answer 1 · 2021-09-05T15:17:36.000Z

As you pointed out, the computation is slow, and for large phylogenetic trees the current implementation is not practical. I believe this is mainly because we are optimizing against all leaf nodes (among other things), so parallelization alone will not improve things much. Rather, I believe that some kind of approximation, such as comparing against only a subset of the leaves, would greatly reduce the computational cost.

However, I have not yet come up with a specific application that requires higher speed, so I have only done a very basic study and have not started to improve the software. If you have any ideas for applications, please let us know.

Answer 2 · 2021-09-05T16:37:02.000Z

The optimizations are independent, correct? If so, then some speedup can be gained via parallelization.

Answer 3 · 2021-09-06T04:30:03.000Z

Sorry, I misread your question; I now see that you meant running the optim calls in parallel. Yes, each internal node can be optimized independently. Unfortunately, I haven't gotten around to it, so there is no parallelized version yet.
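
Since the per-node optimizations are independent, the first loop could in principle be handed to parallel::mclapply. The following is only a minimal sketch under stated assumptions, not code from the package: toy_objfn, targets, and Ninternal are hypothetical stand-ins for objfn_H2, its inputs (Z2, Dall, Nleaves), and Nall-Nleaves, chosen so that the example runs on its own.

```r
library(parallel)

# Hypothetical stand-in objective: the real loop would call objfn_H2 with
# Z2, Dall, Nleaves and n. Here each node n simply has its own quadratic bowl.
toy_objfn <- function(z, targets, n) {
  sum((z - targets[n, ])^2)
}

M <- 2            # embedding dimension
Ninternal <- 4    # stands in for Nall - Nleaves
set.seed(1)
targets <- matrix(rnorm(Ninternal * M), Ninternal, M)

# mclapply relies on forking, which is unavailable on Windows,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# One independent optim() call per internal node, run across workers.
fits <- mclapply(seq_len(Ninternal), function(n) {
  optim(rep(0, M), toy_objfn, gr = NULL, method = "BFGS",
        targets = targets, n = n)$par
}, mc.cores = n_cores)

# Stack the per-node solutions into the same shape as Z2inter.
Z2inter <- do.call(rbind, fits)
```

With the real objective, the anonymous function would call objfn_H2 with the original arguments instead. How much this helps depends on how expensive each optim call is relative to the cost of forking and collecting results.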

Answer 4 · 2021-09-06T06:59:42.000Z

For now, I'm just parallelizing for multiple M values.

It appears that hydraPlus uses BLAS, which can default to all threads on the machine, so parallel runs can increase the CPU load beyond the resources available. Setting BLAS threads via RhpcBLASctl::blas_set_num_threads seems to fix that issue.
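
Roughly, the setup looks like the sketch below. It assumes the RhpcBLASctl package is installed (the call is skipped otherwise), and fit_for_M is a hypothetical placeholder for a single embedding run at dimension M, not an actual hydraPlus function.

```r
library(parallel)

# Pin BLAS to a single thread before starting parallel workers, so that
# the total CPU load is roughly one core per worker.
# Assumption: RhpcBLASctl is installed; otherwise this step is skipped.
if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
  RhpcBLASctl::blas_set_num_threads(1)
}

# Hypothetical placeholder for one run at dimension M; the real call into
# the embedding code would go here. crossprod() is BLAS-backed.
fit_for_M <- function(M) {
  sum(crossprod(matrix(1, 3, M)))
}

M_values <- c(2, 3, 4)

# Forking is unavailable on Windows, so use a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# One worker task per M value.
results <- mclapply(M_values, fit_for_M, mc.cores = n_cores)
```

Without the thread limit, each worker may start as many BLAS threads as the machine has cores, which is what oversubscribes the CPU.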

Answer 5 · 2021-09-06T07:06:48.000Z

Thanks for the advice, I didn't know that.