CUPTI Counter Analysis empty
jeromeku opened this issue ยท 1 comments
jeromeku commented
๐ Describe the bug
Running the CUPTI demo results in an empty trace.
Running the example script with KINETO_LOG_LEVEL=1
:
INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:225] CUDA versions. CUPTI: 18; Runtime: 12010; Driver: 12020
Log file: /tmp/libkineto_activities_3065.json
Trace start time: 2024-05-28 21:01:42 Trace duration: 500ms
Warmup duration: 0s
Max GPU buffer size: 128MB
Enabled activities: cuda_profiler_range
Cupti Profiler metrics : kineto__tensor_core_insts, dram__bytes_read.sum, dram__bytes_write.sum
Cupti Profiler measure per kernel : 0
Cupti Profiler max ranges : 10
INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:834] Enabling GPU tracing
INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:873] Tracing starting in 1s
INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:878] Tracing will end in 2s
STAGE:2024-05-28 21:01:40 3065:3065 ActivityProfilerController.cpp:314] Completed Stage: Warm Up
STAGE:2024-05-28 21:01:41 3065:3065 ActivityProfilerController.cpp:320] Completed Stage: Collection
INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:232] Processing 1 CPU buffers
WARNING:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:312] CPU trace is empty!
INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:294] Record counts: Out-of-range = 0, Blocklisted runtime = 0, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0
INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:1060] Traces Recorded:
INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:1063] PyTorch Profiler: 1 iterations
STAGE:2024-05-28 21:01:41 3065:3065 ActivityProfilerController.cpp:324] Completed Stage: Post Processing
INFO:2024-05-28 21:01:41 3065:3065 output_json.cpp:90] Tracing to temporary file /tmp/tmpe3a6bubf.json
INFO:2024-05-28 21:01:41 3065:3065 output_json.cpp:438] Chrome Trace written to /tmp/tmpe3a6bubf.json
INFO:2024-05-28 21:01:41 3065:3065 output_json.cpp:486] Renamed the trace file to /tmp/tmpe3a6bubf.json
Trace file:
{
"schemaVersion": 1,
"deviceProperties": [
{
"id": 0, "name": "NVIDIA GeForce RTX 3090", "totalGlobalMem": 25438126080,
"computeMajor": 8, "computeMinor": 6,
"maxThreadsPerBlock": 1024, "maxThreadsPerMultiprocessor": 1536,
"regsPerBlock": 65536, "regsPerMultiprocessor": 65536, "warpSize": 32,
"sharedMemPerBlock": 49152, "sharedMemPerMultiprocessor": 102400,
"numSms": 82, "sharedMemPerBlockOptin": 101376
}
],
"record_shapes": 1,
"traceEvents": [
{
"name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 3065, "tid": 0,
"args": {
"name": "python"
}
},
{
"name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 3065, "tid": 0,
"args": {
"labels": "CPU"
}
},
{
"name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 3065, "tid": 0,
"args": {
"sort_index": 3065
}
},
{
"name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 0, "tid": 0,
"args": {
"name": "python"
}
},
{
"name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 0, "tid": 0,
"args": {
"labels": "GPU 0"
}
},
{
"name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 0, "tid": 0,
"args": {
"sort_index": 16777216
}
},
{
"name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 1, "tid": 0,
"args": {
"name": "python"
}
},
{
"name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 1, "tid": 0,
"args": {
"labels": "GPU 1"
}
},
{
"name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 1, "tid": 0,
"args": {
"sort_index": 16777217
}
},
{
"name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 2, "tid": 0,
"args": {
"name": "python"
}
},
{
"name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 2, "tid": 0,
"args": {
"labels": "GPU 2"
}
},
{
"name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 2, "tid": 0,
"args": {
"sort_index": 16777218
}
},
{
"name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 3, "tid": 0,
"args": {
"name": "python"
}
},
{
"name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 3, "tid": 0,
"args": {
"labels": "GPU 3"
}
},
{
"name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 3, "tid": 0,
"args": {
"sort_index": 16777219
}
},
{
"name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 4, "tid": 0,
"args": {
"name": "python"
}
},
{
"name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 4, "tid": 0,
"args": {
"labels": "GPU 4"
}
},
{
"name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 4, "tid": 0,
"args": {
"sort_index": 16777220
}
},
{
"name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 5, "tid": 0,
"args": {
"name": "python"
}
},
{
"name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 5, "tid": 0,
"args": {
"labels": "GPU 5"
}
},
{
"name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 5, "tid": 0,
"args": {
"sort_index": 16777221
}
},
{
"name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 6, "tid": 0,
"args": {
"name": "python"
}
},
{
"name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 6, "tid": 0,
"args": {
"labels": "GPU 6"
}
},
{
"name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 6, "tid": 0,
"args": {
"sort_index": 16777222
}
},
{
"name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 7, "tid": 0,
"args": {
"name": "python"
}
},
{
"name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 7, "tid": 0,
"args": {
"labels": "GPU 7"
}
},
{
"name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 7, "tid": 0,
"args": {
"sort_index": 16777223
}
},
{
"ph": "X", "cat": "Trace", "ts": 1716930100945087, "dur": 158814,
"pid": "Spans", "tid": "PyTorch Profiler",
"name": "PyTorch Profiler (0)",
"args": {
"Op count": 0
}
},
{
"name": "process_sort_index", "ph": "M", "ts": 1716930100945087,
"pid": "Spans", "tid": 0,
"args": {
"sort_index": 536870912
}
},
{
"name": "Iteration Start: PyTorch Profiler", "ph": "i", "s": "g",
"pid": "Traces", "tid": "Trace PyTorch Profiler", "ts": 1716930100945087
},
{
"name": "Record Window End", "ph": "i", "s": "g",
"pid": "", "tid": "", "ts": 1716930101104050
}
],
"INFO": ["INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:834] Enabling GPU tracing","INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:873] Tracing starting in 1s","INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:878] Tracing will end in 2s","INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:232] Processing 1 CPU buffers","INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:294] Record counts: Out-of-range = 0, Blocklisted runtime = 0, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0","INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:1060] Traces Recorded:","INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:1063] PyTorch Profiler: 1 iterations"],
"WARNING": ["WARNING:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:312] CPU trace is empty!"],
"traceName": "/tmp/tmpe3a6bubf.json"
}
Steps to reproduce
python cupti_profiler_demo.py
Expected behavior
Expected exported trace to have cuda_profiler_range
and counter data.
Environment
torch: 2.3.0
HTA: 0.2.0
python: 3.11.9
OS:
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal
Additional Info
briancoutinho commented
@jeromeku Thanks for filing. Actually, the CUPTI range profiler is not working in open source due to a build limitation. I had a PR that enable the profiler and unfortunately it got reverted (broke pytorch nightly).
pytorch/pytorch#125685
Tracked in issue here and it also has other advise on potential issues to collect range profiler
pytorch/pytorch#125685
You can try using my PR and building from source (a bit painful i know) hope this helps.