_solve_adjoint_derivative_dense much slower than np.linalg.solve

Question

zcyang opened this issue 4 years ago · 2 comments

zcyang commented 4 years ago

I installed diffcp with OpenMP flags:

MARCH_NATIVE=1 OPENMP_FLAG="-fopenmp" pip install diffcp

_solve_adjoint_derivative_dense is at least 5 times slower than np.linalg.solve.
An Eigen solve should not be much slower than np.linalg.solve.

I'm reporting this here in case the performance can be improved.

Answer 1 · 2020-08-28T00:22:54.000Z

We force Eigen to be single-threaded so that we can multi-thread diffcp itself. On the other hand, I'm pretty sure that np.linalg.solve is multi-threaded. That might explain the 5x difference, which is probably around the number of cores you have.
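
One way to check this explanation is to time np.linalg.solve with its BLAS/LAPACK backend restricted to a single thread and compare that number with the diffcp timing. The sketch below is only an illustration: the helper name time_dense_solve, the matrix size, and the random test system are assumptions rather than anything from diffcp, and which environment variable takes effect depends on the BLAS build that NumPy is linked against.

```python
import os

# Assumption: these variables must be set before NumPy is first imported;
# otherwise the BLAS/LAPACK backend may already have sized its thread pool.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

import time

import numpy as np


def time_dense_solve(n=500, repeats=5, seed=0):
    """Return (best wall-clock seconds, residual norm) for np.linalg.solve.

    Hypothetical helper for illustration; not part of diffcp.
    """
    rng = np.random.default_rng(seed)
    # Shift the diagonal so the random system is well conditioned.
    A = rng.standard_normal((n, n)) + n * np.eye(n)
    b = rng.standard_normal(n)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        x = np.linalg.solve(A, b)
        best = min(best, time.perf_counter() - start)
    return best, float(np.linalg.norm(A @ x - b))


if __name__ == "__main__":
    elapsed, residual = time_dense_solve()
    print(f"np.linalg.solve, threads limited to 1: {elapsed:.4f} s "
          f"(residual {residual:.2e})")
```

Running the script once as written and once with the environment-variable loop removed should show how much of the gap comes from threading alone, assuming the matrix size is close to the one in the original measurement.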

Answer 2 · 2020-09-07T04:24:22.000Z

It seems diffcp is using many cores in the backward pass even when batch_size = 1?