Optimize feature computation

Question

bnaul opened this issue 8 years ago · 0 comments

Some thoughts on why things are slow at the moment:

Our entire pipeline currently assumes that all time series are unevenly spaced; as a result, internal computations are always performed on each time series separately. If we had a check for the evenly-spaced case, we could stack the series into a single array and use the (faster) vectorized numpy array routines.
- cf. np.max(X, axis=1) and [np.max(x_i) for x_i in X]
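
A rough sketch of what such a check and fast path could look like. The function names (`share_time_grid`, `max_per_series`) are made up for illustration and are not part of the current codebase, and I'm assuming each time series arrives as a separate array of times and values. The check here only requires that all series share the same time grid, which is what makes stacking into a 2d array possible; evenly-spaced series of equal length and start time are one case of that.

```python
import numpy as np


def share_time_grid(times):
    """Return True if every series was sampled at the same time points.

    Hypothetical helper: `times` is a list of 1d arrays, one per series.
    """
    first = np.asarray(times[0])
    return all(
        len(t) == len(first) and np.allclose(t, first) for t in times[1:]
    )


def max_per_series(times, values):
    """Compute the maximum of each series, vectorizing when possible."""
    if share_time_grid(times):
        # Fast path: stack into shape (n_series, n_times), reduce once.
        X = np.asarray(values, dtype=float)
        return np.max(X, axis=1)
    # Fallback: ragged input, so loop over the series one at a time.
    return np.array([np.max(v) for v in values])
```

The same dispatch would apply to any feature that numpy can reduce along an axis (mean, std, percentiles, and so on); features that depend on irregular sampling would always take the fallback branch.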
Our communication overhead through dask isn't horrible as far as I can tell, but it's a (relatively) bigger factor for 1) many time series, 2) shorter time series, or 3) simpler features.
How many features could be sped up in this way? My intuition is that a vectorized approach exists for most of the general features, some of the cadence features, and none of the Lomb-Scargle features.
Somewhat related to #227 in that we would want to handle 3d arrays in a special way.