BlueBrain/eFEL

Possible speed up?

Closed this issue · 9 comments

By playing I noticed that efel's slowest part is not the feature computation but some initialisation.

I did the following experiment:
In examples/basic/basic_example1.py

 47     from time import time                                                                           
 48     for _ in range(1000):                                                                           
 49         t = time()                                                                                  
 50         traces_results = efel.getFeatureValues(traces,                                              
 51                                             ['AP_amplitude', 'voltage_base'])                       
 52         print('getFeatureValues time= ', time()-t)  

and in efel.api.py

349 def _initialise():                                                                                  
350     """Set cppcore initial values"""                                                                
351     from time import time                                                                           
352     t = time()                                                                                      
353     cppcore.Initialize(_settings.dependencyfile_path, "log")                                        
354     print("Initialise time=", time()-t)                                                             
355                                             

which results in:

Initialise time= 0.02448892593383789
getFeatureValues time=  0.02686905860900879
Initialise time= 0.024504899978637695
getFeatureValues time=  0.026882410049438477
Initialise time= 0.0245211124420166
getFeatureValues time=  0.026913166046142578

so the call to cppcore.Initialize takes around 90% of CPU time.

Maybe there is a way to optimise something?

I would say I'm not crazy surprised by this. Initialize might access more memory and it's also reading files etc. But it might be worthwhile looking at it with a profiler.
But why do you say it's 90%, it rather looks like the same amount of time than the feature extraction?

Because the getFeature timing is outside the other one, so it is the time of initialise + rest of the computation. If you divide these 2 numbers, you get 0.9

It actually may be worth repeating this experiment for a more time consuming feature. I have a feeling that this feature happens to be very fast.

okok, makes sense. Tyeah, my guess would be the reading/parsing dependency file is the biggest issue. In practice we don't do that anymore, but I guess the original idea was that one would give many traces and many features, so that it would matter less.
We could try to avoid reading/parsing the file, but it won't be trivial. Of course we would also have to be sure first that is really the cause. Not completely resetting efel after each feature extraction is also dangerous wrt polluting consecutive feature extractions.

The final solution to all these things would be to have pure python code, but we don't really have the bandwidth for that atm.

yes, I see it is not easy, but wanted to report anyway since I saw this twice. The other time was when I was extracting features for many recording (one in each compartment for francesca's project) and it was so slow that I recoded them in python ( we looked at only a few easy features). If you do 10k efel calls, this make efel very slow overall. But we rarely do that many calls in one shot.

It actually may be worth repeating this experiment for a more time consuming feature. I have a feeling that this feature happens to be very fast.

wouldn't it be 0.245s + time to compute the feature? fraction wise a bit lower, but if you do 10k evaluations, you wait 2450s to re-initialise the same thing

Yeah. I assume you didn't give many traces in one API call, right?

No, I used it via bluepyemodel/bluyepyop, where this is not implemented, but maybe I should have used the parallel stuff of efel. I didn't realise it had this feature at the time I think.

DrTaDa commented

If a part of the issue is during extraction step in BluePyEModel, then I think the issue comes from the implement I did for BluePyEfe where the calls to eFEL are per recording. it is convenient because then the efel_settings can be different for each recordings/features but it is slower and could be optimized some more.