geoffxy/habitat

CUPTI_ERROR_INSUFFICIENT_PRIVILEGES in container

yzs981130 opened this issue · 1 comments

With the default configuration on my OS, following the current directions in the README can lead to a CUPTI_ERROR_INSUFFICIENT_PRIVILEGES error when using CUPTI inside the container.

The example log is attached below:

/home/ubuntu/home/habitat/cpp/src/cuda/cupti_tracer.cpp:120: error: function cuptiActivityRegisterCallbacks(cuptiBufferRequested, cuptiBufferCompleted) failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES.
Traceback (most recent call last):
  File "run_experiment.py", line 246, in <module>
    main()
  File "run_experiment.py", line 238, in main
    run_dcgan_experiments(context)
  File "run_experiment.py", line 155, in run_dcgan_experiments
    context,
  File "run_experiment.py", line 85, in run_experiment_config
    threshold = compute_threshold(runnable, context)
  File "run_experiment.py", line 66, in compute_threshold
    runnable()
  File "run_experiment.py", line 150, in runnable
    iteration(*inputs)
  File "/home/ubuntu/home/habitat/experiments/dcgan/entry_point.py", line 41, in iteration
    netD.zero_grad()
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1098, in zero_grad
    p.grad.detach_()
  File "/home/ubuntu/home/habitat/analyzer/habitat/tracking/operation.py", line 62, in hook
    kwargs,
  File "/home/ubuntu/home/habitat/analyzer/habitat/profiling/operation.py", line 45, in measure_operation
    record_kernels,
  File "/home/ubuntu/home/habitat/analyzer/habitat/profiling/operation.py", line 164, in _to_run_time_measurement
    if record_kernels else []
  File "/home/ubuntu/home/habitat/analyzer/habitat/profiling/kernel.py", line 34, in measure_kernels
    self._measure_kernels_raw(runnable, fname)
  File "/home/ubuntu/home/habitat/analyzer/habitat/profiling/kernel.py", line 48, in _measure_kernels_raw
    time_kernels = hc.profile(runnable)
RuntimeError: CUPTI_ERROR_INSUFFICIENT_PRIVILEGES

My solution:
Add the line options nvidia "NVreg_RestrictProfilingToAdminUsers=0" to /etc/modprobe.d/nvidia-kernel-common.conf, then reboot.
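
To confirm after the reboot that the option took effect, one possible check is to read the driver's parameter listing. The sketch below is not part of habitat. It assumes the NVIDIA Linux driver exposes its parameters at /proc/driver/nvidia/params and reports this setting under the key RmProfilingAdminOnly; the function name profiling_admin_only is made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Assumption: the NVIDIA Linux driver lists its parameters in this file.
PARAMS_PATH = Path("/proc/driver/nvidia/params")


def profiling_admin_only(params_text: str) -> Optional[bool]:
    """Parse "Key: value" lines and report the profiling restriction.

    Returns True if profiling is restricted to admin users, False if it is
    open to all users, and None if the (assumed) key is not present.
    """
    for line in params_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "RmProfilingAdminOnly":
            return value.strip() != "0"
    return None


if __name__ == "__main__":
    if not PARAMS_PATH.exists():
        print(f"{PARAMS_PATH} not found; is the NVIDIA driver loaded?")
    else:
        state = profiling_admin_only(PARAMS_PATH.read_text())
        if state is None:
            print("RmProfilingAdminOnly not listed; cannot tell.")
        elif state:
            print("Profiling is restricted to admin users; CUPTI may fail.")
        else:
            print("Profiling is open to all users.")
```

If the script still reports the restriction after the reboot, the modprobe option was probably not picked up when the module loaded.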

Ref:

Thanks! I'm glad you got it working. I've updated the README with links to NVIDIA's documentation and to this issue (cd3735b).