JeschkeLab/DeerLab

Severe overhead in the import of DeerLab

luisfabib opened this issue · 1 comment

During the call to import deerlab as dl, Python loads the different functions of DeerLab into memory (fast), but also constructs and initializes all built-in models in the dd_models and bg_models files (slow). Their initialization is costly, making the import of DeerLab significantly slower than other packages.

The overhead, while not extreme, can be substantial for simple programs. For example, the following script for quickly analyzing a routine 4-pulse DEER experiment:

import numpy as np
import matplotlib.pyplot as plt
import deerlab as dl

# File location
path = r'D:\lufa\projects\DeerLab\DeerLab\examples\data\\'
file = 'example_4pdeer_1.DTA'
tau1 = 0.3      # First inter-pulse delay, μs
tau2 = 4.0      # Second inter-pulse delay, μs
deadtime = 0.1  # Acquisition deadtime, μs
t,Vexp = dl.deerload(path + file)

Vexp = dl.correctphase(Vexp)  # Phase correction
Vexp = Vexp/np.max(Vexp)      # Normalization
t = t + deadtime              # Account for the deadtime

r = np.arange(2.5,5,0.01) # nm
Vmodel = dl.dipolarmodel(t,r, experiment = dl.ex_4pdeer(tau1,tau2, pathways=[1]))

results = dl.fit(Vmodel,Vexp)
print(results)

when profiled, reveals that over a quarter of the total runtime (about 1.35 s out of 5.54 s) is spent importing DeerLab (specifically, in instantiating the objects defined in dd_models and bg_models):
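Such a measurement can be reproduced without a full profiler by timing a cold import directly. The sketch below is illustrative, not part of DeerLab; it uses the stdlib `json` module as a stand-in, since forcibly re-importing compiled extension packages like NumPy is not always safe.

```python
import importlib
import sys
import time

def time_import(module_name):
    """Measure the wall-clock time of a cold import of `module_name`.

    Drops any cached copy (and submodules) from sys.modules first, so
    that module-level initialization is actually re-executed.
    """
    for name in list(sys.modules):
        if name == module_name or name.startswith(module_name + "."):
            del sys.modules[name]
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start

# Example: time a cold import of a standard-library module
print(f"json imported in {time_import('json'):.4f} s")
```

For a per-module breakdown of where import time goes, CPython's built-in `python -X importtime -c "import deerlab"` prints a cumulative timing tree without any extra tooling.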

While this overhead is negligible for long analyses that run several orders of magnitude longer, it can be cumbersome during development and for quick scripts.

HKaras commented

The vast majority of DeerLab's import overhead comes from importing SciPy, NumPy, and Matplotlib. This is inevitable.
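One common way to soften this (a minimal sketch, not DeerLab's actual layout) is to defer a heavy dependency into the function that needs it, so importing the package itself stays fast. Here the stdlib `statistics` module stands in for SciPy or Matplotlib; the trade-off is that the first call to the function pays the import cost, and PEP 8 generally favors top-level imports for readability.

```python
# Sketch: deferred (lazy) import of a heavy dependency.
# `summarize` is a hypothetical example function.

def summarize(data):
    # Imported on first call, not at module import time
    import statistics
    return statistics.mean(data), statistics.stdev(data)

print(summarize([1, 2, 3]))
```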

About 15% of the import time is due to how the docstrings are created. Since the docstrings for dd_models and bg_models are generated by functions, these have to run every time the package is imported. I would argue that this layout of docstrings is unnecessary and makes the code harder to read. The import overhead, however, isn't a massive difference and will not be felt by most users.