`nvml-wrapper` 0.7.0 enables loading NVML at runtime
Cldfire opened this issue · 3 comments
Hi! Author of nvml-wrapper
here.
I've just published nvml-wrapper
0.7.0, the headlining feature of which is the ability to load the NVML library at runtime (as opposed to linking to it at compiletime). Updating to this version of nvml-wrapper
will allow you to get rid of the nvidia
feature in zenith
and give you the ability to enable / disable GPU-related functionality at runtime based on the hardware in the system!
I cloned the repo locally and tried zenith
out with the new nvml-wrapper
release; everything worked with zero code changes, which was great to see. One thing I would highly recommend you do, though, is change your strategy for constructing the NVML
struct. Currently, the code does the following every time it polls the GPU for information:
let nvml = NVML::init();
let n = match nvml {
Ok(n) => n,
Err(e) => {
error!("Couldn't init NVML: {:?}", e);
return;
}
};
This isn't ideal today because it performs all of the work to initialize NVML every time data is collected, which (at least on my machine) skews GPU usage stats and increases power usage. With nvml-wrapper
0.7.0 and beyond this call is also responsible for finding and loading all NVML function symbols, so it'll become even more expensive.
Instead, I'd recommend initializing NVML once at startup, and using the resulting instance for the rest of the program's lifetime. This could be accomplished quite simply as follows using once_cell
(error handling up to you):
static NVML_INSTANCE: Lazy<NVML> = Lazy::new(|| NVML::init().unwrap());
Or, depending on how you want to architect it, you could initialize NVML elsewhere and pass it into the function calls where you need access to it.
If you run into any issues updating to 0.7.0
please let me know and I'd be happy to help! :)
This sounds fantastic! Thank you!
When I was first working on this I was hoping to accomplish exactly this, awesome.
Re-opening as though @alexmaco changes address many of the issues here, I did want to be able to remove --features nvidia
and have supported detected at runtime.
@bvaisvil To safely have nvidia support always-on, the error handling should also be adjusted a bit. I still have an nvidia gpu for which nvml prints errors on every loop. That detection and counting of gpus should either be done only at initialization, or only logged to files, instead of the stdout where it interferes with the display.