[BUG] Crash in algos.ipynb when I try to run it on my CUDA device...
lovettchris opened this issue · 2 comments
lovettchris commented
Describe the bug
I don't know whether this is Windows-specific, but when I run PartialTrainingValAccuracy on my CUDA device, the parallel_partial_tr block crashes with:
error 18:24:45.920: Raw kernel process exited code: 3
error 18:24:45.922: Error in waiting for cell to complete Error: Canceled future for execute_request message before replies were done
at t.KernelShellFutureHandler.dispose (c:\Users\clovett\.vscode\extensions\ms-toolsai.jupyter-2023.1.2000312134\out\extension.node.js:2:33213)
at c:\Users\clovett\.vscode\extensions\ms-toolsai.jupyter-2023.1.2000312134\out\extension.node.js:2:52265
at Map.forEach (<anonymous>)
at y._clearKernelState (c:\Users\clovett\.vscode\extensions\ms-toolsai.jupyter-2023.1.2000312134\out\extension.node.js:2:52250)
at y.dispose (c:\Users\clovett\.vscode\extensions\ms-toolsai.jupyter-2023.1.2000312134\out\extension.node.js:2:45732)
at c:\Users\clovett\.vscode\extensions\ms-toolsai.jupyter-2023.1.2000312134\out\extension.node.js:17:139244
at Z (c:\Users\clovett\.vscode\extensions\ms-toolsai.jupyter-2023.1.2000312134\out\extension.node.js:2:1608939)
at Kp.dispose (c:\Users\clovett\.vscode\extensions\ms-toolsai.jupyter-2023.1.2000312134\out\extension.node.js:17:139221)
at qp.dispose (c:\Users\clovett\.vscode\extensions\ms-toolsai.jupyter-2023.1.2000312134\out\extension.node.js:17:146518)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
warn 18:24:45.923: Cell completed with errors {
message: 'Canceled future for execute_request message before replies were done'
I wonder whether the following snippet from your markdown is missing the device="cuda" parameter on the PartialTrainingValAccuracy constructor?
RayParallelObjective(
    PartialTrainingValAccuracy(training_epochs=1),
    num_gpus=0.5,  # 2 jobs per gpu available
    max_calls=1
)
Because this is what you have in the code a bit later on:
RayParallelEvaluator(
    PartialTrainingValAccuracy(training_epochs=1, device='cuda'),
    num_gpus=0.5,  # 2 jobs per gpu available
    max_calls=1
),
So you might want to mention here that this requires a machine with a GPU and a CUDA-enabled Python setup. I have one, so this worked on my machine, but a heads-up might be necessary for other readers. Also, is there a "first notebook" entry point to all these notebooks?
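For what it's worth, here is a rough sketch of the kind of guard I have in mind for the notebook. It assumes the notebook runs on PyTorch (which I have not confirmed), and `pick_device` is a hypothetical helper name of mine, not something from the library. It falls back to "cpu" when CUDA is not usable, so the cell does not fail for readers without a GPU.

```python
import importlib.util


def pick_device(preferred: str = "cuda") -> str:
    """Return `preferred` if it is usable, otherwise fall back to "cpu".

    Hypothetical helper, not part of the library discussed in this issue.
    Assumes PyTorch is the backend; if torch is not installed, returns "cpu".
    """
    if preferred != "cuda":
        return preferred
    # Avoid an ImportError on machines without PyTorch installed.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")

# The result could then be passed along, e.g. (not run here):
#   PartialTrainingValAccuracy(training_epochs=1, device=device)
```

If the fallback lands on "cpu", the `num_gpus=0.5` setting would presumably need adjusting as well, since it asks Ray for half a GPU per job.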