project-codeflare/codeflare

Lineage And


Overview

AND node semantics compute a full cross product. In grid search CV, an AND node such as feature union requires the features being joined to come from a given input object. For example, consider two-fold cross validation on the following pipeline: (PCA (n_components = 5, 10) || Nystrom || Select k-best) && Feature Union. With two-fold CV, we get four objects from the PCA node (2 x 2) and two objects each from Nystrom and Select k-best. A regular AND node will compute the full 4 x 2 x 2 cross product (16 combinations). A Lineage And will compute only 4 combinations: (pca_5, Nystrom, Select k-best) on each of the two input objects and (pca_10, Nystrom, Select k-best) on the same two input objects.

Solution (Lineage And): select only those items in the AND node cross product that share the same input object lineage.
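The selection rule can be sketched in a few lines of Python. This is only an illustration of the idea, not CodeFlare's API: `Obj`, `regular_and`, and `lineage_and` are hypothetical names, and it assumes each object carries a `lineage` tag identifying the input object (here, the CV fold) it was derived from.

```python
from itertools import product
from typing import NamedTuple


class Obj(NamedTuple):
    """Hypothetical stand-in for an object flowing along a pipeline edge."""
    node: str     # name of the upstream node that produced the object
    lineage: str  # assumed tag for the input object (e.g. CV fold) it derives from


def regular_and(*branches):
    """Regular AND: full cross product across all incoming branches."""
    return list(product(*branches))


def lineage_and(*branches):
    """Lineage And: keep only combinations whose members share one lineage."""
    return [
        combo
        for combo in product(*branches)
        if len({obj.lineage for obj in combo}) == 1
    ]


# The example from the overview: two-fold CV, PCA with two settings.
folds = ["fold_0", "fold_1"]
pca = [Obj(f"pca_{n}", fold) for n in (5, 10) for fold in folds]  # 4 objects
nystrom = [Obj("nystrom", fold) for fold in folds]                # 2 objects
kbest = [Obj("select_k_best", fold) for fold in folds]            # 2 objects

full = regular_and(pca, nystrom, kbest)
matched = lineage_and(pca, nystrom, kbest)
print(len(full), len(matched))  # 16 4
```

Filtering the full cross product is the simplest way to express the rule; a real implementation would more likely group incoming objects by lineage first, so that the 16 combinations are never materialized.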

Acceptance Criteria

  • Implement Lineage And
  • Test Lineage And on a feature union pipeline

Questions

Assumptions

Reference