/cudamon

GPU monitor for CUDA devices

Primary LanguageC

cudamon

cudamon is a GPU monitor for CUDA devices. It provides a simple server process (written in C), and a web-based front-end to keep track of that status of the devices. It is meant to provide a really simple way of keeping an eye on GPU devices while developing CUDA applications.

I personally find it more immediately clear to look at than nvidia-smi -l, but hey, that's me.

It works well enough, but the client is pretty bare. It displays some time series graphs and that's about it.

It is also read only. Use nvidia-smi if you want to modify the GPU.

NOTE: If you manage to write a CUDA application that crashes particularly spectacularly, you may need to stop all programs that are currently using CUDA in any way before being able to run another process, which would include cudamond. This is at least true using multiple GTX680s on recent drivers. I don't know how buggy other drivers are. Just a heads up.

Building

cudamon is currently very simple, not feature complete, and has no way of being customized outside of editing the source. This will hopefully change in the future, but for now, the build process is as simple as running make. There is currently no install support, mainly because it is not featureful enough to warrant an installation yet.

If you do decide to put cudamond somewhere, do keep in mind that the cudamond binary and the www/ directory need to be in the same directory. Or you could modify the source in src/server.c to change this location.

This will be configurable at some point in the future.

Device Support

This program relies on the NVML library to gather its information from the GPU devices. How feature-complete it is on any one device is entirely dependent on how well that device is supported by the library.

From NVIDIA's NVML documentation:

  • Full Support

    • NVIDIA Tesla Line: S2050, C2050, C2070, C2075, M2050, M2070, M2075, M2090, X2070, X2090, K10, K20, K20X
    • NVIDIA Quadro Line: 4000, 5000, 6000, 7000, M2070-Q, 600, 2000, 3000M and 410
    • NVIDIA GeForce Line: None
  • Limited Support

    • NVIDIA Tesla Line: S1070, C1060, M1060
    • NVIDIA Quadro Line: All other current and previous generation Quadro-branded parts
    • NVIDIA GeForce Line: All current and previous generation GeForce-branded parts

Dependencies

cudamon is meant to be as simple and quick to build and use as possible, and therefore is very light on run- and build-time dependencies. The only library actually required to run (outside of normal libc, pthreads, etc) is the CUDA device driver, which you should have anyway if you plan on using the device in any sense.

Other build time dependencies are included, and are the mongoose embedded HTTP server and NVIDIA's NVML library. See the appropriate README in lib/*/ for more information on these libraries.

The web-based client utilizes Shutterstock's Rickshaw graphing library, which itself is based on d3.js. jQuery and jQuery-ui are also pulled in by these libraries.

Development

Server

The server component of cudamon is very simple, a few hundred lines of C based off of a couple of solid libraries doing all the heavy lifting. I didn't implement all of NVML's functionality, so if you find some metric that you think should be tracked, feel free to add it or just let me know that it should be implemented. This is currently a one-person project done for my own benefit while working on CUDA programs, so I have no idea what other people would want or need.

Client

If you think the client is horribly ugly (I don't blame you) and want to focus on fixing that, try running ./faker.rb, which just serves up some random fake data from port 4567. This could be useful if you want to test locally and don't have a CUDA device installed, or have a device that doesn't implement many of the features. You should install Sinatra (gem install sinatra) if you want to use faker.rb.

My design abilities are pretty poor, so I am especially open to any improvements on the client side.

License

See lib/nvml/COPYING.txt for NVML's (non-F/LOSS) license.

Copyright (c) 2013 Erik Price

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.