A boilerplate project for getting started with plain causal language model (CLM) training using Hugging Face Transformers more easily than the official examples.
- CPython 3.10+
- PyTorchλ CUDA νκ²½μ λ§κ² μ€μΉνκΈ° (1.12.1 μ΄μ)
pip install -r requirements.txt
pip install -U deepspeed # required for the DeepSpeed branch
./train.sh
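The contents of `train.sh` are not reproduced here; with DeepSpeed it would typically wrap a launcher invocation roughly like the sketch below. The script name, config path, and training flags are illustrative assumptions, not the repository's actual values — only the `deepspeed` launcher options (`--num_gpus`, `--deepspeed`) are standard.

```shell
# Hypothetical launch command -- the actual train.sh may differ.
# Runs the training script on all 4 local GPUs with a DeepSpeed config file.
deepspeed --num_gpus=4 train.py \
  --deepspeed ds_config.json \
  --per_device_train_batch_size 1
```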
- μμ νκ²½μ RAM 1TB, GPU A100 40GB x4μ₯ νκ²½μμ μ€ν
- CUDA 11.6/11.7
- PyTorch 2.0
- KoAlpaca Datasetμ νμ΅ λ°μ΄ν°λ‘ μ¬μ©ν¨
- DeepSpeed ZeRO3, Optimizerμ Parameter λͺ¨λ CPU Offload
- Seq len 1024
- Max batch size 1 (per GPU)
- GPUλΉ μ½ 27GB vram μ¬μ© (= V100 32Gμμλ μ¬μ© κ°λ₯ν κ²μΌλ‘ μμ)