Why use offload_param in CPU?
xesdiny opened this issue · 1 comment
xesdiny commented
I think sharded model loading already works for a model with 6.7B parameters, so why offload the parameters to the CPU?
"offload_param": {
    "device": "cpu",
    "pin_memory": true
},
AetherCortex commented
We want to train with a larger batch size.
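To make the trade-off concrete, here is a rough sketch of the arithmetic behind that answer. It assumes fp16 parameters sharded evenly by ZeRO stage 3; the GPU count and micro-batch size are hypothetical values chosen only for illustration and are not taken from this repository. The per-GPU figure is approximately the memory that moving the parameter shard to the CPU could free for activations, which is what allows a larger batch.

```python
import json


def fp16_param_gib_per_gpu(num_params: float, num_gpus: int) -> float:
    """GiB of fp16 parameters resident on each GPU when sharded evenly."""
    bytes_per_param = 2  # fp16 stores each parameter in 2 bytes
    return num_params * bytes_per_param / num_gpus / 2**30


# Assumption: 6.7B parameters spread over 8 GPUs (GPU count is hypothetical).
per_gpu = fp16_param_gib_per_gpu(6.7e9, 8)
print(f"fp16 parameter shard per GPU: {per_gpu:.2f} GiB")

# Sketch of a DeepSpeed config using the fragment from the question.
# offload_param requires ZeRO stage 3; the batch size is illustrative.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
}
print(json.dumps(ds_config, indent=2))
```

With fewer GPUs the shard per GPU grows, so the memory recovered by offloading grows with it, at the cost of extra CPU-GPU transfers each step.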