PyTorchProfiler: not showing CPU memory usage even with `profile_memory=True`
Jack12xl opened this issue · 0 comments
Bug description
I'm trying to use PyTorchProfiler
(https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.profilers.PyTorchProfiler.html) to track down some CPU out-of-memory (OOM) issues.
I set it up like this:

import torch
import lightning.pytorch as pl
from lightning.pytorch.profilers import PyTorchProfiler

profiler = PyTorchProfiler(
    dirpath=log_dir,  # directory to save logs
    filename="memory_profile",  # name of the results file
    sort_by_key="self_cpu_memory_usage",  # sort summary by CPU memory usage
    export_to_chrome=True,  # export a Chrome trace (JSON)
    row_limit=16,
    activities=[torch.profiler.ProfilerActivity.CPU],
    profile_memory=True,  # record CPU memory usage
    with_stack=True,
    record_shapes=True,
)
trainer = pl.Trainer(..., profiler=profiler, ...)
trainer.fit()
I expected results similar to the native PyTorch profiler's memory report (https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html#using-profiler-to-analyze-memory-consumption), but the summary still only shows CPU/GPU time, with no memory columns.

I don't know if this is a bug. I thought PyTorchProfiler was a wrapper around the native PyTorch profiler, so I expected the same behavior when setting profile_memory=True.
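For reference, this is roughly what the linked recipe does to get per-operator memory columns with the native profiler (adapted from that page; resnet18 and the random input are just stand-ins):

import torch
import torchvision.models as models
from torch.profiler import profile, ProfilerActivity

model = models.resnet18()
inputs = torch.randn(5, 3, 224, 224)

with profile(activities=[ProfilerActivity.CPU], profile_memory=True, record_shapes=True) as prof:
    model(inputs)

# The printed table includes "CPU Mem" and "Self CPU Mem" columns.
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=10))

This is the kind of output I was hoping to see in the Lightning summary as well.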
Thanks! Please correct me if I am wrong!
What version are you seeing the problem on?
v2.4
How to reproduce the bug
import torch
import lightning.pytorch as pl
from lightning.pytorch.profilers import PyTorchProfiler

profiler = PyTorchProfiler(
    dirpath=log_dir,  # directory to save logs
    filename="memory_profile",  # name of the results file
    sort_by_key="self_cpu_memory_usage",  # sort summary by CPU memory usage
    export_to_chrome=True,  # export a Chrome trace (JSON)
    row_limit=16,
    activities=[torch.profiler.ProfilerActivity.CPU],
    profile_memory=True,  # record CPU memory usage
    with_stack=True,
    record_shapes=True,
)
trainer = pl.Trainer(..., profiler=profiler, ...)
trainer.fit()
Error messages and logs
Environment
* CUDA:
- GPU:
- NVIDIA 30xx GPU
- available: True
- version: 12.1
* Lightning:
- lightning: 2.4.0
- lightning-utilities: 0.11.7
- pytorch-lightning: 2.4.0
- torch: 2.3.1
- torchaudio: 2.3.1
- torchdata: 0.8.0
- torchmetrics: 1.4.1
- torchvision: 0.18.1
* Python: 3.12.4
More info
This is not directly related to the issue above, but is there a way to get the export_memory_timeline behavior
(https://pytorch.org/docs/main/profiler.html#torch.profiler._KinetoProfile.export_memory_timeline) with Lightning's PyTorchProfiler?
Thanks!
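One possible workaround, a sketch only and not verified: bypass PyTorchProfiler and drive torch.profiler directly from a Lightning Callback, so export_memory_timeline can be called once profiling has stopped. MemoryTimelineCallback and the output path are names I made up, and this assumes export_memory_timeline accepts device="cpu" in this torch version (the docs mostly demonstrate it with CUDA):

import torch
import lightning.pytorch as pl


class MemoryTimelineCallback(pl.Callback):
    """Hypothetical callback: profiles the whole fit() run with
    torch.profiler and exports a memory timeline when fit ends."""

    def on_fit_start(self, trainer, pl_module):
        self._prof = torch.profiler.profile(
            activities=[torch.profiler.ProfilerActivity.CPU],
            profile_memory=True,  # required by export_memory_timeline
            record_shapes=True,   # required by export_memory_timeline
            with_stack=True,      # required by export_memory_timeline
        )
        self._prof.__enter__()

    def on_fit_end(self, trainer, pl_module):
        # Stop profiling first so the results are finalized.
        self._prof.__exit__(None, None, None)
        # Writes an HTML plot of allocations over time for the given device.
        self._prof.export_memory_timeline("memory_timeline.html", device="cpu")


trainer = pl.Trainer(..., callbacks=[MemoryTimelineCallback()], ...)

Note this profiles the entire fit() run with no schedule, so the trace itself can add significant memory overhead on long runs.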