_solve_adjoint_derivative_dense much slower than np.linalg.solve
zcyang opened this issue · 2 comments
zcyang commented
diffcp is installed with openmp flags:
MARCH_NATIVE=1 OPENMP_FLAG="-fopenmp" pip install diffcp
It's at least 5 times slower than np.linalg.solve.
Eigen solve should not be much slower than np.linalg.solve.
Reporting it here in case the code performance can be improved.
sbarratt commented
We force Eigen to be single-threaded, so that we can multi-thread diffcp
itself. On the other hand, I'm pretty sure that np.linalg.solve is
multi-threaded, so that might explain the 5x difference (which is probably
around the number of cores you have).
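One way to check this explanation is to time np.linalg.solve with its BLAS backend limited to a single thread and compare that against the default. The sketch below is not part of diffcp; the helper name time_solve is made up for illustration, and it assumes the NumPy build honours the usual OMP_NUM_THREADS / OPENBLAS_NUM_THREADS / MKL_NUM_THREADS environment variables (other backends, such as Accelerate on macOS, may use different ones). The solve runs in a child process because these variables generally have to be set before NumPy is imported.

```python
import os
import subprocess
import sys

# Script executed in a child process. The thread-count variables are read
# when the BLAS backend loads, so they must be in place before the import.
_CHILD = """
import sys, time
import numpy as np
n = int(sys.argv[1])
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n)) + n * np.eye(n)  # keep A well-conditioned
b = rng.standard_normal(n)
np.linalg.solve(A, b)  # warm-up call
t0 = time.perf_counter()
np.linalg.solve(A, b)
print(time.perf_counter() - t0)
"""


def time_solve(n, num_threads=None):
    """Return the seconds taken by one n-by-n np.linalg.solve call.

    num_threads=None leaves the backend at its default thread count.
    (Hypothetical helper, for illustration only.)
    """
    env = dict(os.environ)
    if num_threads is not None:
        # Assumption: the backend respects at least one of these variables.
        for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
            env[var] = str(num_threads)
    out = subprocess.run(
        [sys.executable, "-c", _CHILD, str(n)],
        env=env, capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    n = 2000
    single = time_solve(n, num_threads=1)
    default = time_solve(n)
    print(f"cores: {os.cpu_count()}")
    print(f"1 thread: {single:.3f}s, default: {default:.3f}s, "
          f"ratio: {single / default:.1f}x")
```

If the single-thread timing is close to what the Eigen solve takes, the gap is most likely down to threading rather than to the solver itself.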
zcyang commented
It seems diffcp is using many cores in the backward pass when batch_size = 1?