cesium-ml/cesium

Optimize feature computation

bnaul opened this issue · 0 comments

bnaul commented

Some thoughts on why things are slow at the moment:

  • Our entire pipeline currently assumes that all time series are unevenly spaced; as a result, internal computations are always performed on each time series separately. If we had a check for the evenly-spaced case, we could use different (faster) numpy array routines.
    • cf. np.max(X, axis=1) and [np.max(x_i) for x_i in X]
  • Our communication overhead through dask isn't horrible as far as I can tell, but it's a (relatively) bigger factor for 1) many time series, 2) shorter time series, or 3) simpler features.
  • How many features could be sped up in this way? My intuition is that a vectorized approach exists for most of the general features, some of the cadence features, and none of the Lomb-Scargle features.
  • Somewhat related to #227 in that we would want to handle 3d arrays in a special way.
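
A rough sketch of the kind of check and fast path described above, using the maximum as a stand-in for a simple feature. The helper names `shares_time_grid` and `maximum_per_series` are hypothetical and not part of cesium, and the sketch assumes that the condition that matters for stacking is that every series is sampled on the same time grid (which is what would let the values form one 2d array):

```python
import numpy as np


def shares_time_grid(times):
    """Return True if every series has exactly the same time stamps.

    Hypothetical helper: np.array_equal also returns False when the
    lengths differ, so ragged input falls through to the slow path.
    """
    first = np.asarray(times[0])
    return all(np.array_equal(np.asarray(t), first) for t in times[1:])


def maximum_per_series(times, values):
    """Compute the maximum of each series, vectorizing when possible."""
    if shares_time_grid(times):
        # Fast path: one 2d array of shape (n_series, n_times), one numpy call.
        X = np.stack([np.asarray(v) for v in values])
        return np.max(X, axis=1)
    # Fallback: the current behaviour, one call per time series.
    return np.array([np.max(v) for v in values])


# Usage: 1000 series on a shared grid take the vectorized path.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)
times = [t] * 1000
values = list(rng.normal(size=(1000, 50)))
fast = maximum_per_series(times, values)
slow = np.array([np.max(v) for v in values])
```

Both paths return the same numbers; the difference is only in how many Python-level calls are made, which is where the per-series overhead would show up for short series and simple features.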