How to train with multiple GPUs
Jianghaiyang0729 opened this issue · 2 comments
Hi, author! I want to evaluate several baseline models on the large-scale datasets, such as GBA and GLA. However, when testing on an H100, some baselines run out of memory, for example some Transformer models (Pyraformer). How do I set up multi-GPU training? (On my server, each node has three H100s.) Do I need to add some settings in the baselines/Pyraformer/GBA.py file? (I created this file myself; GBA.py does not exist in your original repository.)
Suppose you have 3 GPUs numbered 0, 1, and 2. Set CFG.GPU_NUM to 3, and when running the script https://github.com/zezhishao/BasicTS/blob/master/experiments/train.py, pass --gpus "0,1,2".
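Concretely, the two settings above might look like this. This is a minimal sketch: `baselines/Pyraformer/GBA.py` is the asker's own config file, BasicTS's actual config object is an EasyDict (approximated here with `SimpleNamespace` so the snippet runs standalone), and the `-c` flag for passing the config path is an assumption about train.py's CLI.

```python
from types import SimpleNamespace

# --- in baselines/Pyraformer/GBA.py (user-added config file) ---
CFG = SimpleNamespace()
CFG.GPU_NUM = 3  # use all three H100s on the node

# --- launch command (run from the repository root; sketch) ---
#   python experiments/train.py -c baselines/Pyraformer/GBA.py --gpus "0,1,2"
#
# The --gpus argument lists the visible CUDA device ids; its length
# should match CFG.GPU_NUM.
gpus = "0,1,2"
device_ids = [int(g) for g in gpus.split(",")]
assert len(device_ids) == CFG.GPU_NUM
print(device_ids)
```

If a single baseline still exceeds memory on three GPUs, reducing the batch size in the same config file is the usual complementary fix.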
Dear Jianghaiyang0729, thank you for your attention. If BasicTS helped you, please cite this paper in your work. Best wishes:
[1] Shao Z, Wang F, Xu Y, et al. Exploring progress in multivariate time series forecasting: Comprehensive benchmarking and heterogeneity analysis[J]. arXiv preprint arXiv:2310.06119, 2023.
@misc{shao2023exploringprogressmultivariatetime,
title={Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis},
author={Zezhi Shao and Fei Wang and Yongjun Xu and Wei Wei and Chengqing Yu and Zhao Zhang and Di Yao and Guangyin Jin and Xin Cao and Gao Cong and Christian S. Jensen and Xueqi Cheng},
year={2023},
eprint={2310.06119},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2310.06119},
}