awslabs/s3-connector-for-pytorch

High memory usage with S3MapDataset

IsaevIlya opened this issue · 1 comment

s3torchconnector version

s3torchconnector-1.1.0

s3torchconnectorclient version

s3torchconnectorclient-1.1.0

AWS Region

eu-west-2

Describe the running environment

Run the e2e test test_multiproccess_dataloading.py:test_s3mapdataset_multiprocess from the project against a dataset consisting of 2000 files of 1.3 MB each. The number of workers is set to 8 and the number of epochs to 4. The test reproduces a training workflow that uses S3MapDataset as the dataset.
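For reference, a minimal sketch of that kind of setup, assuming a hypothetical dataset URI and a simple read-the-whole-object transform; this is an approximation, not the e2e test's actual code:

```python
# Sketch of a multi-worker DataLoader over S3MapDataset (approximation of the test setup).
from torch.utils.data import DataLoader
from s3torchconnector import S3MapDataset

DATASET_URI = "s3://example-bucket/example-prefix/"  # hypothetical location of the 2000 x 1.3 MB objects
REGION = "eu-west-2"

# Each item is an S3 object; read() loads the full object body into memory.
dataset = S3MapDataset.from_prefix(
    DATASET_URI,
    region=REGION,
    transform=lambda obj: obj.read(),
)

loader = DataLoader(dataset, batch_size=1, num_workers=8)

for epoch in range(4):
    for batch in loader:
        pass  # training step would go here; memory usage is sampled externally
```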

What happened?

During each epoch, memory consumption grows roughly in line with the total size of the dataset, increasing by about 2.5 GB. At the end of each epoch, memory consumption drops back to the previous level and then starts growing again.
The expectation is that the increase in memory consumption stays well below the total size of the dataset.

[image: memory consumption over the four training epochs]

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

I conducted a memory benchmark test using version 1.1.1. The memory usage pattern has improved, but there are still differences in memory consumption depending on the method used to create child processes.
[image: memory usage with version 1.1.1, compared across child-process creation methods]
To confirm that these differences between child-process creation methods would not cause long-term problems when training models, I ran memory benchmarks for all three methods over 500 epochs. The results show that, although memory usage differs between methods, the peak memory consumption remains stable over time for each one. This suggests there are no memory leaks associated with any of the child-process creation approaches.
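For context, the child-process creation method can be selected through the DataLoader's multiprocessing_context argument. Below is a rough sketch of how such a comparison might be driven, reusing the hypothetical dataset from the sketch above; the loop bodies and measurement approach are assumptions, not the benchmark's actual code:

```python
# Sketch: iterate the same dataset with each start method and track memory externally.
from torch.utils.data import DataLoader

for method in ("fork", "spawn", "forkserver"):
    loader = DataLoader(
        dataset,                      # S3MapDataset from the earlier sketch (assumption)
        batch_size=1,
        num_workers=8,
        multiprocessing_context=method,
    )
    for epoch in range(500):
        for batch in loader:
            pass  # record peak RSS per epoch with an external memory profiler
```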

[image: peak memory consumption over 500 epochs for each child-process creation method]