Implement `opt_einsum` for generated eT H Z code
ngraymon opened this issue · 5 comments
Running hexahelicene without the optimized `einsum`s is prohibitively slow.
These equations are not yet implemented for the eT H Z ansatz or for full CC.
See at
that the code assumes the user provides the optimized paths, which is no longer the case in the latest t-amplitude branch. We should generate the paths in the same way as they are generated for `code_dt_equations.py`.
Additionally, in the functions that are called, the `opt_path` parameter doesn't actually get used.
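A minimal sketch of why an unused `opt_path` is easy to miss: the result is numerically the same whether or not the path is forwarded, so only the run time changes. The function names, subscripts, and tensor shapes below are hypothetical, and NumPy's `np.einsum` stands in for the `opt_einsum` call only to keep the example small.

```python
import numpy as np

SUBSCRIPTS = 'acm,cdn,dbmn->ab'  # hypothetical contraction, for illustration only


def contract_ignoring_path(h, t, z, opt_path):
    # Bug pattern: opt_path is accepted but never forwarded to einsum.
    return np.einsum(SUBSCRIPTS, h, t, z)


def contract_using_path(h, t, z, opt_path):
    # Fix: pass the precomputed path through the `optimize` argument.
    return np.einsum(SUBSCRIPTS, h, t, z, optimize=opt_path)


rng = np.random.default_rng(0)
A, N = 3, 4  # hypothetical numbers of surfaces and modes
h = rng.standard_normal((A, A, N))
t = rng.standard_normal((A, A, N))
z = rng.standard_normal((A, A, N, N))

# Compute the contraction path once; it can be reused on every later call.
opt_path, _ = np.einsum_path(SUBSCRIPTS, h, t, z, optimize='optimal')

without_path = contract_ignoring_path(h, t, z, opt_path)
with_path = contract_using_path(h, t, z, opt_path)

# Both agree, which is why the unused parameter raises no error.
print(np.allclose(without_path, with_path))
```

Because both versions return the same values, a timing comparison (rather than a correctness test) is what reveals whether the path is really being used.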
A good example is this test file, where we can see the `opt_einsum` path list.
And in the `write_optimized_vecc_paths_function_high_order_out.py` file we can see how the `oe.contract_expression`s are supposed to be used.
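As a rough illustration of that usage pattern, the sketch below builds a list of `oe.contract_expression` objects once from shapes alone and then consumes them in order inside the function that evaluates the terms. This is not the generated code itself: the function names, subscripts, and tensor shapes are assumptions made up for the example.

```python
import numpy as np
import opt_einsum as oe


def build_expressions(A, N):
    """Build each contraction expression once, from the operand shapes only."""
    return [
        oe.contract_expression('ac,cbm->abm', (A, A), (A, A, N)),
        oe.contract_expression('acn,cd,dbmn->abm',
                               (A, A, N), (A, A), (A, A, N, N)),
    ]


def compute_residual(h0, h1, t0, z1, z2, expressions):
    """Evaluate the terms, taking the prebuilt expressions in order."""
    exprs = iter(expressions)
    R = next(exprs)(h0, z1)       # first term
    R += next(exprs)(h1, t0, z2)  # second term
    return R


A, N = 3, 4  # hypothetical sizes
rng = np.random.default_rng(1)
h0 = rng.standard_normal((A, A))
h1 = rng.standard_normal((A, A, N))
t0 = rng.standard_normal((A, A))
z1 = rng.standard_normal((A, A, N))
z2 = rng.standard_normal((A, A, N, N))

expressions = build_expressions(A, N)  # done once, outside the integrator
R = compute_residual(h0, h1, t0, z1, z2, expressions)

# Reference result from plain einsum calls, for comparison.
R_ref = (np.einsum('ac,cbm->abm', h0, z1)
         + np.einsum('acn,cd,dbmn->abm', h1, t0, z2))
print(np.allclose(R, R_ref))
```

The order of the expressions in the list has to match the order in which the terms are evaluated, so in this pattern both would need to be generated from the same source.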
As a primer, and to refresh myself on the simpler CC style, I fixed the `dt_amplitudes` `opt_einsum` path generator; see #75
Testing on the first implementation of `opt_einsum` calls with optimized paths:
Testing `eT_zhz_eqs_H_2_P_4_T_1_exp_4_Z_3.py` with linear Cytosine with Z3 T1 terms in the integrator
50_cytosine_withopt_z3.txt
50_cytosine_noopt_z3.txt
We see a total time of 20.479s for einsum calls with optimized paths versus 86.886s with no optimized paths.
This is for 50 integration steps, where both approaches were able to propagate to 6.883405e-01 fs.
The optimized took 0.641s per `rk_step` versus 2.974s for un-optimized.
Assuming a roughly constant rate of 50 steps per 0.6883405 fs, it takes ~73 batches of 50 steps (3650 steps) to get to 50 fs.
So the optimized code should save 170s or so, almost 3 minutes
Should try hexahelicene next
Hexahelicene:
only 25 integration steps because this one takes so long
25_hexzhelicene_withopt_z3.txt
25_hexzhelicene_npopt_z3.txt
We see a total time of 344.653s for einsum calls with optimized paths versus 4230.073s with no optimized paths.
Both approaches were able to propagate to 2.549166e-01 fs.
The optimized took 11.385s per `rk_step` versus 155.004s for un-optimized.
Assuming a roughly constant rate of 25 steps per 2.549166e-01 fs, it takes ~196 batches of 25 steps (4903 steps) to get to 50 fs.
So the optimized code should take about 2243s, approximately 37~38 minutes.
The un-optimized would take about 30380s, approximately 506 minutes or 8 1/2 hours.
This should save about 8 hours compared with the unoptimized code.
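The batch counts and the ratio of einsum times quoted for the two runs can be re-derived with a few lines of arithmetic. The sketch below uses only the numbers reported above and assumes a constant propagation rate; the variable names are made up for the example, and it does not attempt to project total wall time.

```python
TARGET_FS = 50.0  # propagation time we want to reach

# Figures copied from the two profiling runs reported above.
runs = {
    'cytosine': dict(steps=50, reached_fs=6.883405e-01,
                     opt_s=20.479, noopt_s=86.886),
    'hexahelicene': dict(steps=25, reached_fs=2.549166e-01,
                         opt_s=344.653, noopt_s=4230.073),
}

summary = {}
for name, r in runs.items():
    # Number of profiled batches needed to reach the target time.
    batches = TARGET_FS / r['reached_fs']
    summary[name] = dict(
        batches=batches,
        total_steps=batches * r['steps'],
        # Ratio of total einsum time without and with optimized paths.
        einsum_speedup=r['noopt_s'] / r['opt_s'],
    )
    print(f"{name}: {batches:.1f} batches, "
          f"{summary[name]['total_steps']:.0f} steps, "
          f"einsum speedup {summary[name]['einsum_speedup']:.1f}x")
```

On these numbers the gain from optimized paths is noticeably larger for hexahelicene than for cytosine.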
Changes pushed to `t-amplitudes` and Hexahelicene spectra are being generated. Changes seem to be working, but there are more improvements that can be made, so it might be best to wait before opening a pull request.