Training process freezes without using GPUs
TOP-RX opened this issue · 1 comments
Description
I simply tried to run the GIANT-XRT training code for ogbn-arxiv, but the process seems to freeze without allocating any GPUs for training.
How to Reproduce?
Steps to reproduce
data_dir=./proc_data_xrt/ogbn-arxiv
bash xrt_train.sh ${data_dir}
Error message or code output
The code gets stuck here, and no GPUs are used.
warnings.warn(
09/24/2023 01:46:52 - INFO - pecos.xmc.xtransformer.matcher - ***** Running training *****
09/24/2023 01:46:52 - INFO - pecos.xmc.xtransformer.matcher - Num examples = 169286
09/24/2023 01:46:52 - INFO - pecos.xmc.xtransformer.matcher - Num labels = 32
09/24/2023 01:46:52 - INFO - pecos.xmc.xtransformer.matcher - Num Epochs = 4
09/24/2023 01:46:52 - INFO - pecos.xmc.xtransformer.matcher - Learning Rate Schedule = linear
09/24/2023 01:46:52 - INFO - pecos.xmc.xtransformer.matcher - Batch size = 256
09/24/2023 01:46:52 - INFO - pecos.xmc.xtransformer.matcher - Gradient Accumulation steps = 1
09/24/2023 01:46:52 - INFO - pecos.xmc.xtransformer.matcher - Total optimization steps = 2500
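One thing worth ruling out (an assumption on my part, not confirmed by the log) is that the GPUs simply aren't visible to the training process. A minimal sketch to snapshot the standard CUDA/NVIDIA visibility variables; nothing here is specific to PECOS:

```python
import os

def gpu_env_snapshot():
    """Return the GPU-visibility environment variables (unset -> None).

    An empty CUDA_VISIBLE_DEVICES would explain training proceeding
    (or hanging) without any GPU being allocated.
    """
    return {var: os.environ.get(var)
            for var in ("CUDA_VISIBLE_DEVICES", "NVIDIA_VISIBLE_DEVICES")}

if __name__ == "__main__":
    for var, val in gpu_env_snapshot().items():
        print(f"{var}={val!r}")
```

If these look right, checking `torch.cuda.is_available()` in the same environment would be a natural next step, since the XTransformer matcher runs on PyTorch.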
Environment
- Operating system:
- Python version:
- PECOS version:
Have you solved this? If so, how? I'm hitting the same problem.