ngraymon/termfactory

Implement `opt_einsum` for generated eT H Z code

ngraymon opened this issue · 5 comments

Running hexahelicene without optimized einsum calls is prohibitively slow.

The optimized equations are not yet implemented for the eT H Z ansatz or for full CC.

See the line

optimized_connected_paths, optimized_linked_paths, optimized_unlinked_paths = opt_paths

where the code assumes the user provides the optimized paths, which is no longer the case in the latest t-amplitude branch. We should generate them in the same way as they are generated for code_dt_equations.py.

Additionally, in the functions that are called, the `opt_paths` parameter is never actually used:

def add_m0_n0_fully_connected_terms_optimized(R, ansatz, truncation, h_args, t_args, opt_paths):

A good example is this test file, where we can see the opt_einsum path list.

In the write_optimized_vecc_paths_function_high_order_out.py file we can see how the `oe.contract_expression` objects are supposed to be used.
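To make the intended pattern concrete, here is a minimal sketch of pre-compiling contraction expressions and then consuming them inside an `_optimized` function. The function names, tensor names, index strings, and sizes are all hypothetical and are not taken from the generated termfactory code; only `oe.contract_expression` and `np.einsum` are real library calls.

```python
import numpy as np
import opt_einsum as oe

A, N = 3, 4  # hypothetical sizes: A surfaces, N modes


def compute_optimized_paths(A, N):
    """Pre-compile one contraction expression per einsum term.

    Only the operand shapes are needed here, not the arrays themselves.
    """
    return [
        oe.contract_expression('acij,cbi,j->ab', (A, A, N, N), (A, A, N), (N,)),
        oe.contract_expression('aci,cbi->ab', (A, A, N), (A, A, N)),
    ]


def add_terms_optimized(R, h_abij, h_abi, t_abi, z_i, opt_paths):
    """Accumulate both terms into R, consuming opt_paths in generation order."""
    optimized_einsum = iter(opt_paths)
    R += next(optimized_einsum)(h_abij, t_abi, z_i)
    R += next(optimized_einsum)(h_abi, t_abi)
    return R


# Illustrative random tensors standing in for the real h, t and z arguments.
rng = np.random.default_rng(0)
h_abij = rng.random((A, A, N, N))
h_abi = rng.random((A, A, N))
t_abi = rng.random((A, A, N))
z_i = rng.random(N)

opt_paths = compute_optimized_paths(A, N)
R_opt = add_terms_optimized(np.zeros((A, A)), h_abij, h_abi, t_abi, z_i, opt_paths)

# Reference: the same two terms with plain np.einsum and no precomputed path.
R_ref = (np.einsum('acij,cbi,j->ab', h_abij, t_abi, z_i)
         + np.einsum('aci,cbi->ab', h_abi, t_abi))
assert np.allclose(R_opt, R_ref)
```

The expressions have to be consumed in the same order they were generated, which is why the path list and the function body would need to come from the same generator.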

As a primer, and to refresh myself on the simpler CC style, I fixed the dt_amplitudes opt_einsum path generator; see #75

Okay, so now I should be better prepared to handle this issue after fixing #76

Testing the first implementation of opt_einsum calls with optimized paths:

Testing eT_zhz_eqs_H_2_P_4_T_1_exp_4_Z_3.py with linear Cytosine with Z3 T1 terms in the integrator:
50_cytosine_withopt_z3.txt
50_cytosine_noopt_z3.txt

We see a total time of 20.479 s for einsum calls with optimized paths versus 86.886 s with no optimized paths.
This is for 50 integration steps, where both approaches were able to propagate to 6.883405e-01 fs.

The optimized code took 0.641 s per rk_step versus 2.974 s for the un-optimized code.
Assuming a roughly constant rate of 50 steps per 0.6883405 fs, it takes ~73 batches of 50 steps (3650 steps) to get to 50 fs.
So over 3650 steps the optimized code should save about 3650 × (2.974 − 0.641) s ≈ 8500 s, roughly 2.4 hours.
Should try hexahelicene next

Hexahelicene:
Only 25 integration steps, because this one takes so long:
25_hexzhelicene_withopt_z3.txt
25_hexzhelicene_npopt_z3.txt

We see a total time of 344.653 s for einsum calls with optimized paths versus 4230.073 s with no optimized paths.
Both approaches were able to propagate to 2.549166e-01 fs.

The optimized code took 11.385 s per rk_step versus 155.004 s for the un-optimized code.
Assuming a roughly constant rate of 25 steps per 2.549166e-01 fs, it takes ~196 batches of 25 steps (4903 steps) to get to 50 fs.

So over ~4903 steps the optimized code should take about 4903 × 11.385 s ≈ 55,800 s, approximately 15.5 hours.
The un-optimized code would take about 4903 × 155.004 s ≈ 760,000 s, approximately 211 hours, or almost 9 days.
This should save about 196 hours (roughly 8 days) compared to the un-optimized code.
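The extrapolation to 50 fs can be reproduced with a few lines of Python. This is only a sketch: it assumes the measured time per rk_step stays constant over the whole propagation, and it multiplies that time by the total number of steps (not the number of batches). The step counts and timings are the ones quoted above; the dictionary layout and function name are just for illustration.

```python
# Per-rk_step timings (seconds) and total step counts quoted in this thread.
runs = {
    "cytosine": {"steps": 3650, "opt": 0.641, "noopt": 2.974},
    "hexahelicene": {"steps": 4903, "opt": 11.385, "noopt": 155.004},
}


def estimate(steps, opt, noopt):
    """Return (optimized, un-optimized, saved) wall time in seconds.

    Assumes a constant cost per rk_step for the whole propagation.
    """
    t_opt = steps * opt
    t_noopt = steps * noopt
    return t_opt, t_noopt, t_noopt - t_opt


for name, run in runs.items():
    t_opt, t_noopt, saved = estimate(**run)
    print(f"{name}: optimized {t_opt / 3600:.1f} h, "
          f"un-optimized {t_noopt / 3600:.1f} h, saved {saved / 3600:.1f} h")
```

Since rejected steps and the other per-step overhead are not modelled, these numbers are rough estimates rather than predictions.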

Changes are pushed to t-amplitudes and the Hexahelicene spectra are being generated. The changes seem to be working, but there are more improvements that can be made, so it might be best to wait before opening a pull request.