pytorch/data

S3FileLoaderIterDataPipe buffer_size

commonism opened this issue · 0 comments

📚 The doc issue

The default S3 buffer size is 128 MB, i.e. 128 * (1024**2) bytes:

static const size_t S3DefaultBufferSize = 128 * 1024 * 1024; // 128 MB

The example for S3FileLoaderIterDataPipe, however, uses a buffer_size of 256:

dp_s3_files = sharded_s3_urls.load_files_by_s3(buffer_size=256)

A 256-byte buffer degrades performance. The example also invites the assumption that buffer_size is given in megabytes, since 256 would then be double the 128 MB default.

Suggest a potential alternative/fix

Document that buffer_size is in bytes, and have the example use 256 * (1024**2) as the value.
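
A minimal sketch of the arithmetic behind this suggestion, assuming buffer_size is a plain byte count as the C++ default implies. The names MIB, S3_DEFAULT_BUFFER_SIZE and mib_to_bytes are hypothetical and only used for illustration; they are not part of torchdata.

```python
# Bytes per mebibyte; buffer_size is assumed to be a byte count.
MIB = 1024 ** 2

# Mirrors the S3DefaultBufferSize constant quoted in this issue (128 MB).
S3_DEFAULT_BUFFER_SIZE = 128 * MIB


def mib_to_bytes(mib: int) -> int:
    """Convert a size in MiB to bytes (hypothetical helper)."""
    return mib * MIB


# Value used by the current documentation example: 256 bytes.
example_buffer_size = 256

# Value suggested in this issue: 256 MB expressed in bytes.
intended_buffer_size = mib_to_bytes(256)

# How many times smaller than the default the documented value is.
shrink_factor = S3_DEFAULT_BUFFER_SIZE // example_buffer_size

print(f"default:  {S3_DEFAULT_BUFFER_SIZE} bytes")
print(f"example:  {example_buffer_size} bytes ({shrink_factor}x smaller than default)")
print(f"intended: {intended_buffer_size} bytes")
```

With this, the documented call would become sharded_s3_urls.load_files_by_s3(buffer_size=intended_buffer_size), which doubles the default rather than shrinking it.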