hkchengrex/shared-memory-tensor-dataset

Python, Apache-2.0

Shared Memory Tensor Dataset with torchrun

Overview

This repository provides an example of reading from a single shared memory tensor from multiple processes (e.g., with DDP).
Useful for loading a large tensor (e.g., the entire dataset) into CPU memory to speed up I/O without incurring N× memory usage, where N is the number of GPUs/processes.
We use the standard torch.utils.data.DataLoader, which might make it easier for you to use this in your own code.
Works with torchrun
Does not depend on detectron2
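
To make the idea concrete, here is a minimal sketch of one way to back a dataset with a single shared tensor. It is not necessarily how the scripts in this repository do it: it assumes a file-backed mapping created with torch.from_file(..., shared=True), and the names open_shared_tensor and SharedTensorDataset are hypothetical. The demo runs in one process, with a "writer" handle standing in for local rank 0 and a "reader" handle standing in for any other rank on the same machine.

```python
import os
import tempfile

import torch
from torch.utils.data import DataLoader, Dataset


def open_shared_tensor(path, shape, dtype=torch.float32):
    """Map `path` into this process as a tensor of the given shape (hypothetical helper).

    shared=True requests a shared mapping, so every process that opens the
    same file sees the same pages instead of holding a private copy.
    """
    numel = 1
    for dim in shape:
        numel *= dim
    flat = torch.from_file(path, shared=True, size=numel, dtype=dtype)
    return flat.view(shape)


class SharedTensorDataset(Dataset):
    """Hypothetical dataset that serves rows of one shared CPU tensor."""

    def __init__(self, data):
        self.data = data

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, index):
        return self.data[index]


# Single-process demo: two independent mappings of the same file.
path = os.path.join(tempfile.mkdtemp(), "dataset.bin")
shape = (8, 4)

# "Writer" (stands in for local rank 0): create the file and fill it once.
writer = open_shared_tensor(path, shape)
writer.copy_(torch.arange(32, dtype=torch.float32).view(shape))

# "Reader" (stands in for another rank): map the same file, no second copy.
reader = open_shared_tensor(path, shape)
loader = DataLoader(SharedTensorDataset(reader), batch_size=4, shuffle=False)
first_batch = next(iter(loader))
```

Under torchrun, one would presumably let only the process with LOCAL_RANK equal to 0 fill the file, have the other ranks wait (for example at torch.distributed.barrier()) before mapping it, and place the file on a RAM-backed filesystem such as /dev/shm. Those choices are assumptions of this sketch, not a description of the repository's scripts.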

Limitation

We have not tested this in a multi-node setting. It probably will not work there, since shared memory is local to a single machine.

Usage

(N is the number of GPUs/processes)

Run torchrun --standalone --nproc_per_node=N main-multigpu-naive.py
Look at the memory usage.
Run torchrun --standalone --nproc_per_node=N main-multigpu-shared.py
Look at the memory usage again.
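
If you want to check memory usage from inside a process rather than with an external tool, the following is a small sketch using psutil (already listed under Dependencies). The helper name memory_report is hypothetical and the scripts in this repository may report memory differently. Note that RSS counts shared pages in full for every process, so summing RSS across ranks overstates the total; PSS (available on Linux) splits shared pages among the processes that map them, and USS counts only pages unique to the process.

```python
import os

import psutil


def memory_report(pid=None):
    """Return RSS/USS (and PSS where available) in MiB for one process."""
    proc = psutil.Process(pid if pid is not None else os.getpid())
    info = proc.memory_full_info()
    mib = 1024 ** 2
    report = {
        "pid": proc.pid,
        "rss_mib": info.rss / mib,  # resident pages, shared pages counted in full
        "uss_mib": info.uss / mib,  # pages unique to this process
    }
    # PSS is only reported on Linux, so read it defensively.
    pss = getattr(info, "pss", None)
    if pss is not None:
        report["pss_mib"] = pss / mib  # shared pages split across processes
    return report


report = memory_report()
print(report)
```

Calling memory_report() in each rank for both scripts and comparing the PSS or USS columns should make the difference between the naive and shared versions easier to see than RSS alone.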

Dependencies

Python >= 3.7
Linux
PyTorch >= 1.10
pip install psutil tabulate tensordict

Acknowledgement

Inspired by and modified from https://github.com/ppwwyyxx/RAM-multiprocess-dataloader

See also: