project-codeflare/codeflare

Data splitter

raghukiran1224 opened this issue · 1 comment

Overview

As a CFP user, I would like to split a dataset (e.g., a NumPy array or a pandas DataFrame) into smaller objects that can then be fed into other nodes or pipelines. This is especially useful when we have compute-intensive tasks and would like to parallelize them easily.
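
To make the request concrete, here is a minimal sketch of the splitting step on its own, outside of any Node construct. The name `split_dataset` and its `num_splits` parameter are hypothetical and not part of the CodeFlare API; the sketch assumes contiguous, near-equal chunks are what is wanted, which is only one possible answer to the semantics question below.

```python
def split_dataset(data, num_splits):
    """Split `data` into `num_splits` contiguous, near-equal chunks.

    Hypothetical helper for illustration only. Works on anything that
    supports len() and slicing (lists, NumPy arrays); pandas objects are
    sliced by position through their `.iloc` accessor.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")

    n = len(data)
    # pandas DataFrames/Series slice rows by position via .iloc;
    # NumPy arrays and lists can be sliced directly.
    rows = data.iloc if hasattr(data, "iloc") else data

    # The first `extra` chunks get one additional row each.
    base, extra = divmod(n, num_splits)
    chunks = []
    start = 0
    for i in range(num_splits):
        stop = start + base + (1 if i < extra else 0)
        chunks.append(rows[start:stop])
        start = stop
    return chunks


parts = split_dataset(list(range(10)), 3)
print([len(p) for p in parts])  # chunk sizes: [4, 3, 3]
```

Each chunk could then be handed to a separate downstream task. If there are fewer rows than `num_splits`, the trailing chunks are empty; whether a real splitter node should emit or drop those is left open here.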

Acceptance Criteria

  • Design for the splitter, which should be simple and intuitive
  • Implementation as an extension to the Node construct
  • Tests

Questions

  • What type of semantics does the splitter node define?

Assumptions

Reference

The basic utility has been added; turning it into an actual node needs more work.