Lightning-AI/pytorch-lightning

PyTorchProfiler: not showing CPU memory used even with `profile_memory=True`

Jack12xl opened this issue · 0 comments

Bug description

I am trying to use PyTorchProfiler (https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.profilers.PyTorchProfiler.html) to track down some CPU out-of-memory (OOM) issues.
I set it up as follows:

profiler = PyTorchProfiler(
        dirpath=log_dir,  # Directory to save logs
        filename="memory_profile",  # Name of the file to save results
        sort_by_key="self_cpu_memory_usage",  # Sort by CPU memory usage
        export_to_chrome=True,  # Export as JSON for Chrome
        row_limit=16,
        activities=[torch.profiler.ProfilerActivity.CPU],
        profile_memory=True,  # Record CPU memory usage
        with_stack=True,
        record_shapes=True,
    )

trainer = pl.Trainer(..., profiler=profiler, ...)
trainer.fit()

I expected results similar to the native PyTorch profiler (https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html#using-profiler-to-analyze-memory-consumption), i.e. a summary table with `CPU Mem` / `Self CPU Mem` columns. Instead, the output still only shows CPU/GPU time, like here:
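For comparison, this is the behavior I expected, reproduced here with the native profiler alone. A minimal standalone sketch (the workload, function name, and `row_limit` are arbitrary placeholders):

```python
import torch
from torch.profiler import ProfilerActivity, profile


def profile_cpu_memory(row_limit: int = 16) -> str:
    """Run a tiny CPU workload under the native profiler and return its summary table."""
    with profile(
        activities=[ProfilerActivity.CPU],
        profile_memory=True,  # record tensor allocations; adds the memory columns
        record_shapes=True,
    ) as prof:
        x = torch.randn(256, 256)
        (x @ x).relu()
    # With profile_memory=True the table gains "CPU Mem" / "Self CPU Mem"
    # columns, and "self_cpu_memory_usage" becomes a valid sort key.
    return prof.key_averages().table(
        sort_by="self_cpu_memory_usage", row_limit=row_limit
    )


print(profile_cpu_memory())
```

Run standalone, this prints a table whose rows are sorted by self CPU memory usage; that is the output I expected to see in the Lightning log file as well.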

I don't know whether this is a bug. I thought PyTorchProfiler was a wrapper around the native PyTorch profiler, so I expected the same behavior when setting `profile_memory=True`.

Thanks! Please correct me if I am wrong!

What version are you seeing the problem on?

v2.4

How to reproduce the bug

profiler = PyTorchProfiler(
    dirpath=log_dir,  # Directory to save logs
    filename="memory_profile",  # Name of the file to save results
    sort_by_key="self_cpu_memory_usage",  # Sort by CPU memory usage
    export_to_chrome=True,  # Export as JSON for Chrome
    row_limit=16,
    activities=[torch.profiler.ProfilerActivity.CPU],
    profile_memory=True,  # Record CPU memory usage
    with_stack=True,
    record_shapes=True,
)

trainer = pl.Trainer(..., profiler=profiler, ...)
trainer.fit()

Error messages and logs

No errors are raised; the profiler runs and writes its output, but the summary table lacks the memory columns.

Environment

Current environment
* CUDA:
	- GPU:
		- NVIDIA 30xx GPU
	- available:         True
	- version:           12.1
* Lightning:
	- lightning:         2.4.0
	- lightning-utilities: 0.11.7
	- pytorch-lightning: 2.4.0
	- torch:             2.3.1
	- torchaudio:        2.3.1
	- torchdata:         0.8.0
	- torchmetrics:      1.4.1
	- torchvision:       0.18.1

Python: 3.12.4

More info

Not directly related to this issue, but is there some way to get the `export_memory_timeline` (https://pytorch.org/docs/main/profiler.html#torch.profiler._KinetoProfile.export_memory_timeline) behavior with Lightning's PyTorchProfiler?

Thanks!