mila-iqia/milabench

Use NVML instead of parsing nvidia-smi output

gravitino opened this issue · 2 comments

The latest version of milabench crashes because the nvidia-smi output parsing in https://github.com/mila-iqia/milabench/blob/master/milabench/gpu.py fails. Please use pynvml (https://pypi.org/project/pynvml/) instead.

root@fdea0e909c74:/workspace/milabench# milabench run --base standard-cuda/ config/standard-cuda.yaml 
There was a problem with nvidia-smi:
================================================================================
b'ERROR: no input message specified\n'
================================================================================
There was a problem with nvidia-smi:
================================================================================
b'ERROR: no input message specified\n'
================================================================================
[BEGIN] Reports directory: /workspace/milabench/standard-cuda/runs/dijuzaza.2022-12-16_21:20:30.417110
There was a problem with nvidia-smi:
================================================================================
b'ERROR: no input message specified\n'
================================================================================
Traceback (most recent call last):
  File "/opt/anaconda/bin/milabench", line 8, in <module>
    sys.exit(main())
  File "/workspace/milabench/milabench/cli.py", line 23, in main
    run_cli(Main)
  File "/opt/anaconda/lib/python3.9/site-packages/coleo/cli.py", line 628, in run_cli
    return call(opts=opts, args=args)
  File "/opt/anaconda/lib/python3.9/site-packages/coleo/cli.py", line 587, in thunk
    result = fn(*args)
  File "/workspace/milabench/milabench/cli.py", line 192, in run
    mp.do_run(
  File "/workspace/milabench/milabench/multi.py", line 116, in do_run
    for run in method(cfg, **plan):
  File "/workspace/milabench/milabench/multi.py", line 34, in per_gpu
    gpus = get_gpu_info().values()
AttributeError: 'NoneType' object has no attribute 'values'
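
The final AttributeError in the traceback arises because `get_gpu_info()` returned `None` after the nvidia-smi call failed, and the caller then invoked `.values()` on it. Independent of which backend is used, a caller-side guard could turn this into a clearer error. The following is only a sketch: `require_gpu_info` is a hypothetical helper that is not part of milabench, and the error message is illustrative.

```python
from typing import Any, Callable, Dict, List, Optional


def require_gpu_info(
    get_gpu_info: Callable[[], Optional[Dict[Any, dict]]]
) -> List[dict]:
    """Hypothetical guard: return the per-GPU records or fail with a clear message.

    `get_gpu_info` stands in for the function of the same name seen in the
    traceback; here it is passed in so the sketch stays self-contained.
    """
    info = get_gpu_info()
    if info is None:
        # Fail early with an explicit message instead of an AttributeError
        # raised later on `None.values()`.
        raise RuntimeError("GPU information is unavailable (GPU query failed)")
    return list(info.values())
```

With a guard like this, the run would stop with a message that points at the GPU query rather than at a `NoneType` attribute lookup.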

Yeah, it appears xml2json failed for some reason. Using pynvml instead is a good idea; I'll take care of it. Thanks!
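
For reference, querying GPUs through pynvml might look roughly like the sketch below. This is not the change that was merged. The function name `get_gpu_info_nvml` and the shape of the returned dictionary are assumptions made for illustration. The sketch returns an empty dict (rather than `None`) when pynvml or the NVML library is unavailable, so callers can still call `.values()` on the result.

```python
def get_gpu_info_nvml():
    """Return {gpu_index: info_dict} using NVML; empty dict if NVML is unavailable.

    Hypothetical replacement for parsing nvidia-smi output; the returned
    field names are illustrative, not milabench's actual schema.
    """
    try:
        import pynvml  # third-party: pip install pynvml
    except ImportError:
        return {}

    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        # Driver or NVML shared library not present on this machine.
        return {}

    try:
        gpus = {}
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            name = pynvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):
                # Older pynvml releases return bytes instead of str.
                name = name.decode()
            memory = pynvml.nvmlDeviceGetMemoryInfo(handle)  # values in bytes
            gpus[index] = {
                "name": name,
                "memory_used_bytes": memory.used,
                "memory_total_bytes": memory.total,
            }
        return gpus
    finally:
        # Always release NVML once initialization has succeeded.
        pynvml.nvmlShutdown()
```

Calling `get_gpu_info_nvml().values()` then yields one record per visible GPU, or nothing at all on a machine without a usable NVIDIA driver.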

Done in #40