Reduce the number of calls to ShuffleFile.getSize
jealous opened this issue · 0 comments
`ShuffleFile.getSize` is a metadata call to the file system. It can take several microseconds, or even milliseconds on a shared file system, which adds overhead to the shuffle procedure.
When possible, we should use the recorded committed size or the written byte count already held in memory instead of calling `ShuffleFile.getSize` to retrieve this information.
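
A rough sketch of the idea follows. It is not the project's actual code: `TrackedShuffleWriter`, `getCommittedSize`, and `getSizeFromFileSystem` are hypothetical names, and `java.nio.file.Files.size` stands in for whatever metadata call `ShuffleFile.getSize` makes. The writer counts bytes as it writes them, so callers can read the size from memory after a commit.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a shuffle file writer; names are assumptions.
final class TrackedShuffleWriter implements AutoCloseable {
    private final Path path;
    private final OutputStream out;
    private long writtenBytes;   // bytes handed to the stream so far
    private long committedSize;  // bytes recorded at the last commit
    private int metadataCalls;   // how often the file system was queried

    TrackedShuffleWriter(Path path) throws IOException {
        this.path = path;
        this.out = Files.newOutputStream(path);
    }

    void write(byte[] data) throws IOException {
        out.write(data);
        writtenBytes += data.length;
    }

    // Flush and record the size in memory so later reads need no metadata call.
    void commit() throws IOException {
        out.flush();
        committedSize = writtenBytes;
    }

    // Cheap: answered from memory.
    long getCommittedSize() {
        return committedSize;
    }

    // Expensive path, kept only for cases where no recorded size exists.
    long getSizeFromFileSystem() throws IOException {
        metadataCalls++;
        return Files.size(path);
    }

    int getMetadataCalls() {
        return metadataCalls;
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

Callers that previously asked the file system for the size after each commit would read `getCommittedSize()` instead, and fall back to the metadata call only for files they did not write themselves.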