`nvml-wrapper` 0.7.0 enables loading NVML at runtime

Question

Cldfire opened this issue 4 years ago · 3 comments

Hi! Author of nvml-wrapper here.

I've just published nvml-wrapper 0.7.0, the headlining feature of which is the ability to load the NVML library at runtime (as opposed to linking to it at compile time). Updating to this version of nvml-wrapper will allow you to get rid of the nvidia feature in zenith and give you the ability to enable / disable GPU-related functionality at runtime based on the hardware in the system!

I cloned the repo locally and tried zenith out with the new nvml-wrapper release; everything worked with zero code changes, which was great to see. One thing I would highly recommend you do, though, is change your strategy for constructing the NVML struct. Currently, the code does the following every time it polls the GPU for information:

let nvml = NVML::init();
let n = match nvml {
    Ok(n) => n,
    Err(e) => {
        error!("Couldn't init NVML: {:?}", e);
        return;
    }
};

This isn't ideal today because it performs all of the work to initialize NVML every time data is collected, which (at least on my machine) skews GPU usage stats and increases power usage. With nvml-wrapper 0.7.0 and beyond this call is also responsible for finding and loading all NVML function symbols, so it'll become even more expensive.

Instead, I'd recommend initializing NVML once at startup, and using the resulting instance for the rest of the program's lifetime. This could be accomplished quite simply as follows using once_cell (error handling up to you):

use once_cell::sync::Lazy;
static NVML_INSTANCE: Lazy<NVML> = Lazy::new(|| NVML::init().unwrap());

Or, depending on how you want to architect it, you could initialize NVML elsewhere and pass it into the function calls where you need access to it.
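To make both options concrete, here is a minimal sketch that uses only the standard library (`std::sync::OnceLock`) rather than once_cell. `GpuLib`, `gpu_lib`, `poll_gpu` and `INIT_CALLS` are hypothetical names for illustration; `GpuLib` is a stand-in for the NVML handle, not the nvml-wrapper type. A failed initialization is stored as `None`, so it is attempted once and never retried while polling.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the expensive initialization actually runs.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the NVML handle; not the nvml-wrapper type.
struct GpuLib {
    device_count: u32,
}

impl GpuLib {
    /// Stand-in for the expensive init call (library loading, symbol lookup).
    fn init() -> Result<GpuLib, String> {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        Ok(GpuLib { device_count: 1 })
    }
}

/// Holds the outcome of the single initialization attempt.
static GPU_LIB: OnceLock<Option<GpuLib>> = OnceLock::new();

/// Returns the shared handle, initializing it on first use only.
/// A failed init is remembered as `None`, so it is not retried on every poll.
fn gpu_lib() -> Option<&'static GpuLib> {
    GPU_LIB
        .get_or_init(|| match GpuLib::init() {
            Ok(lib) => Some(lib),
            // Report the error once here (for example, to a log file).
            Err(_e) => None,
        })
        .as_ref()
}

/// Polling code borrows a handle instead of constructing one,
/// so the caller decides where the handle comes from.
fn poll_gpu(lib: &GpuLib) -> u32 {
    lib.device_count
}
```

With this shape, each polling pass would do something like `if let Some(lib) = gpu_lib() { poll_gpu(lib); }` and skip GPU collection entirely when no handle is available.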

If you run into any issues updating to 0.7.0 please let me know and I'd be happy to help! :)

Answer 1 · 2020-12-07T16:38:17.000Z

This sounds fantastic! Thank you!

When I was first working on this I was hoping to accomplish exactly this, awesome.

Answer 2 · 2020-12-14T22:52:28.000Z

Re-opening: although @alexmaco's changes address many of the issues here, I did want to be able to remove --features nvidia and have support detected at runtime.

Answer 3 · 2020-12-17T16:05:50.000Z

@bvaisvil To safely have nvidia support always-on, the error handling should also be adjusted a bit. I still have an nvidia gpu for which nvml prints errors on every loop. The detection and counting of gpus should either be done only at initialization, or its errors should only be logged to files instead of to stdout, where they interfere with the display.
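A rough sketch of what that could look like, assuming detection runs once at startup and failures go to a log file. All names here (`log_to_file`, `detect_gpus`, `GpuState`) are hypothetical, and the `probe` argument stands in for whatever the GPU library returns; none of this is zenith's or nvml-wrapper's actual API.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Appends one line to a log file instead of writing to the terminal.
fn log_to_file(path: &Path, message: &str) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "{message}")
}

/// Turns a probe result into a GPU count, logging a failure to `log_path`.
/// `probe` is a hypothetical stand-in for the GPU library's answer.
fn detect_gpus(probe: Result<u32, String>, log_path: &Path) -> u32 {
    match probe {
        Ok(count) => count,
        Err(e) => {
            // Logged once at startup; a failed log write is ignored on purpose.
            let _ = log_to_file(log_path, &format!("GPU detection failed: {e}"));
            0
        }
    }
}

/// Application state that caches the count found at startup.
struct GpuState {
    gpu_count: u32,
}

impl GpuState {
    /// Runs detection exactly once, when the state is created.
    fn new(probe: Result<u32, String>, log_path: &Path) -> Self {
        GpuState {
            gpu_count: detect_gpus(probe, log_path),
        }
    }

    /// Called on every refresh; does no detection and prints nothing.
    fn gpus_to_poll(&self) -> u32 {
        self.gpu_count
    }
}
```

The render loop then only reads `gpus_to_poll()`, so a machine with a misbehaving driver produces a single log line rather than an error on every iteration.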