facebookresearch/HolisticTraceAnalysis

CUPTI Counter Analysis empty

jeromeku opened this issue ยท 1 comments

๐Ÿ› Describe the bug

Running the CUPTI demo results in an empty trace.

Running the example script with KINETO_LOG_LEVEL=1:

INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:225] CUDA versions. CUPTI: 18; Runtime: 12010; Driver: 12020
  Log file: /tmp/libkineto_activities_3065.json
  Trace start time: 2024-05-28 21:01:42  Trace duration: 500ms
  Warmup duration: 0s
  Max GPU buffer size: 128MB
  Enabled activities: cuda_profiler_range
Cupti Profiler metrics : kineto__tensor_core_insts, dram__bytes_read.sum, dram__bytes_write.sum
Cupti Profiler measure per kernel : 0
Cupti Profiler max ranges : 10
INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:834] Enabling GPU tracing
INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:873] Tracing starting in 1s
INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:878] Tracing will end in 2s
STAGE:2024-05-28 21:01:40 3065:3065 ActivityProfilerController.cpp:314] Completed Stage: Warm Up
STAGE:2024-05-28 21:01:41 3065:3065 ActivityProfilerController.cpp:320] Completed Stage: Collection
INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:232] Processing 1 CPU buffers
WARNING:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:312] CPU trace is empty!
INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:294] Record counts: Out-of-range = 0, Blocklisted runtime = 0, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0
INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:1060] Traces Recorded:
INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:1063] PyTorch Profiler: 1 iterations
STAGE:2024-05-28 21:01:41 3065:3065 ActivityProfilerController.cpp:324] Completed Stage: Post Processing
INFO:2024-05-28 21:01:41 3065:3065 output_json.cpp:90] Tracing to temporary file /tmp/tmpe3a6bubf.json
INFO:2024-05-28 21:01:41 3065:3065 output_json.cpp:438] Chrome Trace written to /tmp/tmpe3a6bubf.json
INFO:2024-05-28 21:01:41 3065:3065 output_json.cpp:486] Renamed the trace file to /tmp/tmpe3a6bubf.json

Trace file:


{
  "schemaVersion": 1,
  "deviceProperties": [
    {
      "id": 0, "name": "NVIDIA GeForce RTX 3090", "totalGlobalMem": 25438126080,
      "computeMajor": 8, "computeMinor": 6,
      "maxThreadsPerBlock": 1024, "maxThreadsPerMultiprocessor": 1536,
      "regsPerBlock": 65536, "regsPerMultiprocessor": 65536, "warpSize": 32,
      "sharedMemPerBlock": 49152, "sharedMemPerMultiprocessor": 102400,
      "numSms": 82, "sharedMemPerBlockOptin": 101376
    }
  ],
  "record_shapes": 1,
  "traceEvents": [
  {
    "name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 3065, "tid": 0,
    "args": {
      "name": "python"
    }
  },
  {
    "name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 3065, "tid": 0,
    "args": {
      "labels": "CPU"
    }
  },
  {
    "name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 3065, "tid": 0,
    "args": {
      "sort_index": 3065
    }
  },
  {
    "name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 0, "tid": 0,
    "args": {
      "name": "python"
    }
  },
  {
    "name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 0, "tid": 0,
    "args": {
      "labels": "GPU 0"
    }
  },
  {
    "name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 0, "tid": 0,
    "args": {
      "sort_index": 16777216
    }
  },
  {
    "name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 1, "tid": 0,
    "args": {
      "name": "python"
    }
  },
  {
    "name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 1, "tid": 0,
    "args": {
      "labels": "GPU 1"
    }
  },
  {
    "name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 1, "tid": 0,
    "args": {
      "sort_index": 16777217
    }
  },
  {
    "name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 2, "tid": 0,
    "args": {
      "name": "python"
    }
  },
  {
    "name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 2, "tid": 0,
    "args": {
      "labels": "GPU 2"
    }
  },
  {
    "name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 2, "tid": 0,
    "args": {
      "sort_index": 16777218
    }
  },
  {
    "name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 3, "tid": 0,
    "args": {
      "name": "python"
    }
  },
  {
    "name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 3, "tid": 0,
    "args": {
      "labels": "GPU 3"
    }
  },
  {
    "name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 3, "tid": 0,
    "args": {
      "sort_index": 16777219
    }
  },
  {
    "name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 4, "tid": 0,
    "args": {
      "name": "python"
    }
  },
  {
    "name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 4, "tid": 0,
    "args": {
      "labels": "GPU 4"
    }
  },
  {
    "name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 4, "tid": 0,
    "args": {
      "sort_index": 16777220
    }
  },
  {
    "name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 5, "tid": 0,
    "args": {
      "name": "python"
    }
  },
  {
    "name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 5, "tid": 0,
    "args": {
      "labels": "GPU 5"
    }
  },
  {
    "name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 5, "tid": 0,
    "args": {
      "sort_index": 16777221
    }
  },
  {
    "name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 6, "tid": 0,
    "args": {
      "name": "python"
    }
  },
  {
    "name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 6, "tid": 0,
    "args": {
      "labels": "GPU 6"
    }
  },
  {
    "name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 6, "tid": 0,
    "args": {
      "sort_index": 16777222
    }
  },
  {
    "name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 7, "tid": 0,
    "args": {
      "name": "python"
    }
  },
  {
    "name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 7, "tid": 0,
    "args": {
      "labels": "GPU 7"
    }
  },
  {
    "name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 7, "tid": 0,
    "args": {
      "sort_index": 16777223
    }
  },
  {
    "ph": "X", "cat": "Trace", "ts": 1716930100945087, "dur": 158814,
    "pid": "Spans", "tid": "PyTorch Profiler",
    "name": "PyTorch Profiler (0)",
    "args": {
      "Op count": 0
    }
  },
  {
    "name": "process_sort_index", "ph": "M", "ts": 1716930100945087,
    "pid": "Spans", "tid": 0,
    "args": {
      "sort_index": 536870912
    }
  },
  {
    "name": "Iteration Start: PyTorch Profiler", "ph": "i", "s": "g",
    "pid": "Traces", "tid": "Trace PyTorch Profiler", "ts": 1716930100945087
  },
  {
    "name": "Record Window End", "ph": "i", "s": "g",
    "pid": "", "tid": "", "ts": 1716930101104050
  }
  ],
  "INFO": ["INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:834] Enabling GPU tracing","INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:873] Tracing starting in 1s","INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:878] Tracing will end in 2s","INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:232] Processing 1 CPU buffers","INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:294] Record counts: Out-of-range = 0, Blocklisted runtime = 0, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0","INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:1060] Traces Recorded:","INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:1063] PyTorch Profiler: 1 iterations"],
  "WARNING": ["WARNING:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:312] CPU trace is empty!"],
  "traceName": "/tmp/tmpe3a6bubf.json"
}

Steps to reproduce

python cupti_profiler_demo.py

Expected behavior

Expected exported trace to have cuda_profiler_range and counter data.

Environment

torch: 2.3.0
HTA: 0.2.0
python: 3.11.9
OS:

Distributor ID: Ubuntu
Description:    Ubuntu 20.04.6 LTS
Release:        20.04
Codename:       focal

Additional Info

@briancoutinho

@jeromeku Thanks for filing. Actually, the CUPTI range profiler is not working in open source due to a build limitation. I had a PR that enable the profiler and unfortunately it got reverted (broke pytorch nightly).
pytorch/pytorch#125685

Tracked in issue here and it also has other advise on potential issues to collect range profiler
pytorch/pytorch#125685

You can try using my PR and building from source (a bit painful i know) hope this helps.