CUDA + multiprocessing issue
Describe the bug
An exception, cudaErrorInitializationError: initialization error, occurs within the multiprocessing pool when using GPU/CUDA on two or more files. This happens in the feature_finding step but could potentially affect any point in the workflow where CuPy is used.
To Reproduce
Environment: nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
Using cupy-cuda115==10.2.0.
Script, following the convention described by test_gpu_.py:
def main():
    global alphapept
    alphapept.performance.set_compilation_mode('cuda')
    alphapept.performance.set_worker_count(30)
    importlib.reload(alphapept.feature_finding)
    settings = load_settings('/home/ubuntu/apps/alphapept/test_settings.yaml')
    r = alphapept.interface.import_raw_data(settings)
    r = alphapept.interface.feature_finding(settings)
Here, test_settings.yaml contains all the defaults, with two or more files in experiment/file_paths.
Error
For three separate files
2022-03-08 18:57:55> No *.hdf file with features found for /mnt/EXP21155/EXP21155_2021ms0603X7_A.ms_data.hdf. Adding to feature finding list.
2022-03-08 18:57:55> Feature finding on /mnt/EXP21155/EXP21155_2021ms0603X7_A.raw
2022-03-08 18:57:55> Hill extraction with centroid_tol 8 and max_gap 2
2022-03-08 18:57:55> Feature finding of file /mnt/EXP21155/EXP21155_2021ms0603X7_A.raw failed. Exception cudaErrorInitializationError: initialization error
2022-03-08 18:57:55> Processing of /mnt/EXP21155/EXP21155_2021ms0603X7_A.raw for step find_features failed. Exception cudaErrorInitializationError: initialization error
2022-03-08 18:57:55> No *.hdf file with features found for /mnt/EXP21155/EXP21155_2021ms0609X26_A.ms_data.hdf. Adding to feature finding list.
2022-03-08 18:57:56> Feature finding on /mnt/EXP21155/EXP21155_2021ms0609X26_A.raw
2022-03-08 18:57:56> Hill extraction with centroid_tol 8 and max_gap 2
2022-03-08 18:57:56> Feature finding of file /mnt/EXP21155/EXP21155_2021ms0609X26_A.raw failed. Exception cudaErrorInitializationError: initialization error
2022-03-08 18:57:56> Processing of /mnt/EXP21155/EXP21155_2021ms0609X26_A.raw for step find_features failed. Exception cudaErrorInitializationError: initialization error
A Solution?
After some research, I was able to find the source of the problem. The combination of multiprocessing pools and CUDA is a little tricky. In short, we cannot use the CuPy API in the parent process before the worker processes are forked, because a CUDA context that is already initialized in the parent cannot be used in a forked child. I'm not exactly sure where this happens in the code, but I expect it's somewhere in the settings management. The solution I found was to call multiprocessing.set_start_method('spawn') ('forkserver' also works).
The speed and stability of the three options (fork, spawn, and forkserver) are up for debate, and I'm not sure if we will be able to obtain performance advantages using GPU if we cannot fork processes. I'm not an expert on multiprocessing, though.
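The start-method change described above can be sketched roughly as follows. This is only an illustration, not AlphaPept code: find_features_on_gpu and run_pool are hypothetical names, and the worker does a trivial sum in place of real feature finding. It assumes that importing CuPy inside the worker is enough to keep CUDA from being initialized in the parent process.

```python
import multiprocessing as mp


def find_features_on_gpu(file_path):
    """Hypothetical per-file worker standing in for the real feature-finding step."""
    # Import the GPU library inside the worker so that CUDA is first
    # initialized in the child process, never in the parent.
    try:
        import cupy as cp
        total = float(cp.arange(10).sum())
    except Exception:
        # CPU fallback so the sketch also runs on a machine without CuPy or a GPU.
        total = float(sum(range(10)))
    return file_path, total


def run_pool(file_paths, n_workers=2, start_method="spawn"):
    """Process files in a pool whose workers are started with `start_method`."""
    # A context object scopes the start method to this pool instead of
    # changing the process-wide default via multiprocessing.set_start_method.
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes=n_workers) as pool:
        return pool.map(find_features_on_gpu, file_paths)
```

With 'spawn' or 'forkserver', the call to run_pool has to sit under an `if __name__ == "__main__":` guard in the launching script, because the child processes re-import the main module. Each worker also pays the cost of a fresh interpreter and its own CUDA initialization, which is part of the performance question raised above.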
Would like to know if you can replicate this problem and suggest a fix. Thank you.
Hi hugokitano!
My system is Ubuntu 20.04. I have the same problem. Have you solved it?
Hi,
I have never tested analyzing multiple files on the GPU, so this could indeed be an issue, and it may well not work out of the box. Historically, the GPU part started as a way to improve performance on a single file. One workaround could be to launch multiple Docker instances, each on a single file, and then combine the results later in another instance.
However, if anyone has good ideas to get the multiprocessing to work or wants to tackle this, I am all ears.
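As a starting point for the one-file-per-instance idea, the settings could be split into one copy per input file, so that each instance only ever sees a single file. The sketch below is an assumption-laden illustration: split_settings_per_file is a hypothetical helper, and it assumes the loaded settings are a nested dict that mirrors the YAML layout, with the file list under experiment/file_paths as mentioned in the report.

```python
import copy


def split_settings_per_file(settings):
    """Return one settings dict per input file (hypothetical helper)."""
    per_file = []
    for path in settings["experiment"]["file_paths"]:
        # Deep copy so that the original settings and the other copies stay untouched.
        single = copy.deepcopy(settings)
        single["experiment"]["file_paths"] = [path]
        per_file.append(single)
    return per_file
```

Each resulting dict could then be written to its own YAML file and handed to a separate instance. How the per-file results would be combined afterwards is not covered here.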