FourierFlows/GeophysicalFlows.jl

Add remark on multithreading/multiple GPU limitations that FourierFlows.jl imposes

navidcy opened this issue · 4 comments

We should add a remark in the README and in the Docs on this.

[This was mentioned in @ranocha's review remarks.]

Regarding multiple GPUs: calling a Problem constructor with dev = GPU() probably forces CUDA.jl to use device 0...(?)

E.g., on a machine with 3 GPUs on the HPC, I got:

julia> prob = SingleLayerQG.Problem(GPU(); nx=n, ny=n+2, Lx=L, β=β, μ=μ, dt=dt, stepper=stepper)
Problem
  ├─────────── grid: grid (on GPU)
  ├───── parameters: params
  ├────── variables: vars
  ├─── state vector: sol
  ├─────── equation: eqn
  ├────────── clock: clock
  └──── timestepper: FilteredRK4TimeStepper

shell> nvidia-smi
Tue Mar  9 15:20:01 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:3D:00.0 Off |                    0 |
| N/A   35C    P0    57W / 300W |    410MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:3E:00.0 Off |                    0 |
| N/A   33C    P0    41W / 300W |      3MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:B2:00.0 Off |                    0 |
| N/A   35C    P0    42W / 300W |      3MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     12191      C   ...ta/v45/nc3020/julia/julia      407MiB |
+-----------------------------------------------------------------------------+

julia>

I think CUDA.jl picks device 0 by default. The best solution is probably to link to the CUDA.jl documentation for choosing a device. Users can also do fancier things, like running two problems side by side on different GPUs (the CUDA.jl docs provide some explanation for this).
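For reference, a minimal sketch of picking a device with CUDA.jl before constructing a problem (assumes a recent CUDA.jl; `device!` takes the 0-based index shown by nvidia-smi):

```julia
using CUDA

# List the devices CUDA.jl can see
for dev in CUDA.devices()
    println(dev)
end

# Switch subsequent GPU allocations to device 1 (0-based, as in nvidia-smi)
CUDA.device!(1)

# Any Problem constructed with GPU() after this point lives on device 1
```

Note that arrays already allocated on another device are not moved by `device!`; the switch only affects subsequent allocations, so it should happen before the Problem constructor is called.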

Could you point to this explanation and I'll add a note in our docs.

Here are some references:

Often the most straightforward approach to using multiple GPUs is to launch the same script several times with different CUDA_VISIBLE_DEVICES settings. This approach happens outside Julia, at the shell level.

$ CUDA_VISIBLE_DEVICES=0 julia --project cool_script.jl

This launches Julia with only one device visible (device "0" from the output of nvidia-smi). This environment variable is described in NVIDIA's CUDA documentation:

https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/
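As a sketch, the same pattern extends to running one independent Julia process per GPU (the script name here is hypothetical):

```shell
#!/bin/sh
# Launch one copy of the script per GPU. Each process sees exactly one
# device, which CUDA.jl then treats as its device 0.
CUDA_VISIBLE_DEVICES=0 julia --project cool_script.jl &
CUDA_VISIBLE_DEVICES=1 julia --project cool_script.jl &
CUDA_VISIBLE_DEVICES=2 julia --project cool_script.jl &
wait  # block until all three runs finish
```

Because the variable is set per process, the three runs do not interfere with each other's device selection.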