MemVerge/splash

Structure enhancement of the shuffle reader

jealous opened this issue

The shuffle reader code is not well structured, and its classes are hard to cover with tests. Responsibilities between the classes are not clearly separated. Restructure the code to meet the following requirements:

  • Move the error-handling and dump logic to the correct place, and cover the dump logic with unit tests.
  • Remove the mutable state from SplashShuffleFetchIterator and make it a case class.
  • Extract a SplashShuffleFetcher class responsible for:
    • Tracking resource usage and cleaning up resources for the current partition.
    • Handling errors and recording trace information for the current partition.
    • Transforming the data of the current partition.
  • Extract the iterator wrappers in SplashShuffleReader into separate functions:
    • getAggregatedIterator adds the combiner logic to the iterator.
    • getSortedIterator adds the sorter logic to the iterator.
  • Move all sorter-related metrics tracking logic into the sorter itself.
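A minimal Scala sketch of the proposed split, assuming plain Scala collections in place of Spark's types so it compiles standalone. Only the names SplashShuffleFetcher, getAggregatedIterator and getSortedIterator come from this issue; `track`, `withErrorContext`, `ReaderFunctions` and all signatures are hypothetical and are not Splash's actual API. The wrappers also buffer everything in memory, whereas a real shuffle reader would need spill-capable structures.

```scala
import scala.collection.mutable

// Hypothetical sketch: names follow the issue, but every signature is an assumption,
// not Splash's real API. Plain Scala types stand in for Spark's.

// Owns the resources opened while reading one partition and releases them exactly once.
final class SplashShuffleFetcher(val partitionId: Int) {
  private val opened = mutable.ArrayBuffer.empty[AutoCloseable]
  private var closed = false

  // Registers a resource (e.g. an input stream) so close() can release it later.
  def track[T <: AutoCloseable](resource: T): T = {
    opened += resource
    resource
  }

  // Re-throws any failure with trace information identifying the current partition.
  def withErrorContext[A](body: => A): A =
    try body
    catch {
      case e: Exception =>
        throw new RuntimeException(s"failed to read shuffle partition $partitionId", e)
    }

  // Releases tracked resources in reverse order of registration; later calls are no-ops.
  def close(): Unit =
    if (!closed) {
      closed = true
      opened.reverseIterator.foreach(_.close())
    }
}

// Hypothetical holder for the iterator wrappers extracted from the reader.
object ReaderFunctions {
  // Combiner logic: folds all values of a key into one combined value (in memory).
  def getAggregatedIterator[K, V, C](
      records: Iterator[(K, V)],
      createCombiner: V => C,
      mergeValue: (C, V) => C): Iterator[(K, C)] = {
    val combined = mutable.LinkedHashMap.empty[K, C]
    records.foreach { case (k, v) =>
      combined.get(k) match {
        case Some(c) => combined.update(k, mergeValue(c, v))
        case None    => combined.update(k, createCombiner(v))
      }
    }
    combined.iterator
  }

  // Sorter logic: orders records by key (in memory).
  def getSortedIterator[K, V](records: Iterator[(K, V)])(implicit ord: Ordering[K]): Iterator[(K, V)] =
    records.toVector.sortBy(_._1).iterator
}
```

Keeping the two wrappers as free functions lets a unit test pass them a plain `Iterator` without constructing a reader, and keeping cleanup and error context in the fetcher gives each partition one place where failures are reported and resources released.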