Structure enhancement of the shuffle reader
jealous opened this issue · 0 comments
jealous commented
The shuffle reader code is not well structured, and it is hard to insert tests for its classes. The separation of responsibilities is unclear. Restructure the code to meet the following requirements:
- Insert the error handling and dump logic in the correct place and test the dump logic in the unit test.
- Remove the state in `SplashShuffleFetchIterator` and make it a case class.
- Extract the `SplashShuffleFetcher` class, which is responsible for:
  - Tracking the resource usage and resource cleanup of the current partition.
  - Error handling and trace information for the current partition.
  - Data transformation related to the current partition.
- Extract the iterator wrappers in `SplashShuffleReader` into separate functions:
  - `getAggregatedIterator` adds the combiner logic to the iterator.
  - `getSortedIterator` adds the sorter logic to the iterator.
- Move all sorter-related metrics tracking logic into the sorter itself.
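
To make the intended split of the iterator wrappers concrete, here is a minimal, Spark-free sketch. Only the names `getAggregatedIterator` and `getSortedIterator` come from this issue. The signatures, the `ReaderSketch` object, and the in-memory combine and sort are assumptions for illustration; the real functions would presumably delegate to Spark's aggregator and external sorter.

```scala
import scala.collection.mutable

// Hypothetical sketch, not the project's code: each wrapper is a pure
// function from iterator to iterator, so it can be unit tested alone.
object ReaderSketch {

  // Adds the combiner logic: merges all values that share a key.
  // createCombiner and mergeValue stand in for an aggregator (assumption).
  def getAggregatedIterator[K, V, C](
      records: Iterator[(K, V)],
      createCombiner: V => C,
      mergeValue: (C, V) => C): Iterator[(K, C)] = {
    val combined = mutable.LinkedHashMap.empty[K, C]
    records.foreach { case (k, v) =>
      val next = combined.get(k) match {
        case Some(c) => mergeValue(c, v)
        case None    => createCombiner(v)
      }
      combined.update(k, next)
    }
    combined.iterator
  }

  // Adds the sorter logic: sorts by key when an ordering is defined,
  // otherwise passes the records through untouched.
  // Sorting in memory is a simplification of an external sorter.
  def getSortedIterator[K, C](
      records: Iterator[(K, C)],
      keyOrdering: Option[Ordering[K]]): Iterator[(K, C)] =
    keyOrdering match {
      case Some(ord) => records.toVector.sortBy(_._1)(ord).iterator
      case None      => records
    }

  def main(args: Array[String]): Unit = {
    val fetched = Iterator("b" -> 1, "a" -> 2, "b" -> 3)
    val aggregated =
      getAggregatedIterator[String, Int, Int](fetched, (v: Int) => v, _ + _)
    val sorted =
      getSortedIterator[String, Int](aggregated, Some(Ordering.String))
    println(sorted.toList) // prints List((a,2), (b,4))
  }
}
```

Because each step takes and returns a plain `Iterator`, the reader can compose them in order, and a test can feed either one a small in-memory iterator without setting up a full shuffle.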