megvii-research/megfile

Performance degradation when reading tar files with tarfile

Closed · 1 comment

When tarfile reads a file on OSS that was opened with smart_open, performance is poor if buffered is False.

When reading a tar file, tarfile calls read many times. Because of compatibility with older versions of numpy when reading ndarray files, users need to pass buffered=True manually when working with tarfile.
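
To illustrate why buffering matters here, below is a minimal sketch that uses only the standard library and does not touch megfile or OSS. `CountingRaw` is a hypothetical stand-in for an unbuffered remote reader, and `io.BufferedReader` stands in for whatever buffered=True does internally, which is an assumption on my part. It counts how many read requests reach the underlying object when tarfile reads the same archive with and without a buffer.

```python
import io
import tarfile


class CountingRaw(io.RawIOBase):
    """Hypothetical unbuffered reader that counts read requests.

    On a remote store such as OSS, each of these requests could cost
    a network round trip.
    """

    def __init__(self, data: bytes):
        super().__init__()
        self._inner = io.BytesIO(data)
        self.read_calls = 0

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        return self._inner.seek(offset, whence)

    def tell(self) -> int:
        return self._inner.tell()

    def readinto(self, b) -> int:
        # RawIOBase.read() is implemented on top of readinto(),
        # so every read on this object is counted here.
        self.read_calls += 1
        return self._inner.readinto(b)


def build_tar(n_members: int = 20, size: int = 100) -> bytes:
    """Build a small uncompressed tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for i in range(n_members):
            payload = bytes([i]) * size
            info = tarfile.TarInfo(name=f"member_{i}.bin")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_all(fileobj) -> dict:
    """Read every regular file in the archive through tarfile."""
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar if m.isfile()}


def compare_read_calls() -> tuple:
    """Return (unbuffered_calls, buffered_calls) for the same archive."""
    data = build_tar()

    raw_unbuffered = CountingRaw(data)
    read_all(raw_unbuffered)

    raw_buffered = CountingRaw(data)
    read_all(io.BufferedReader(raw_buffered, buffer_size=1 << 20))

    return raw_unbuffered.read_calls, raw_buffered.read_calls


if __name__ == "__main__":
    unbuffered, buffered = compare_read_calls()
    print(f"read requests without buffer: {unbuffered}")
    print(f"read requests with buffer:    {buffered}")
```

Without a buffer, every header and payload read issued by tarfile reaches the underlying object as its own request. With a buffer, most of them are served from memory, which is the effect buffered=True is meant to provide when the file lives on OSS.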