LLM-Distributed-Quantization

Accelerating multi-node Large Language Model training with per-layer selective quantization (FP32 -> FP16) of transformer layers.
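The repository's code isn't shown here, so the following is only a minimal sketch of the idea in the description, assuming a PyTorch implementation: cast most `nn.Linear` sublayers of a transformer to FP16 while keeping numerically sensitive layers (normalization, embeddings) in FP32. The function name `selectively_quantize` and the `keep_fp32` predicate are illustrative, not this repo's actual API.

```python
import torch.nn as nn

def selectively_quantize(model: nn.Module,
                         keep_fp32=(nn.LayerNorm, nn.Embedding)) -> nn.Module:
    """Illustrative per-layer selective quantization: cast Linear
    sublayers to FP16, keep precision-sensitive layers in FP32."""
    for module in model.modules():
        if isinstance(module, keep_fp32):
            continue  # numerically sensitive layers stay in FP32
        if isinstance(module, nn.Linear):
            module.half()  # weight (and bias, if any) -> torch.float16
    return model

# Quick check on a single transformer block (hypothetical usage):
block = nn.TransformerEncoderLayer(d_model=64, nhead=4)
selectively_quantize(block)
for name, param in block.named_parameters():
    # nn.Linear weights print as float16; LayerNorm weights as float32
    print(name, param.dtype)
```

Keeping normalization layers in FP32 is a common mixed-precision convention, since their reductions are prone to accumulating rounding error in half precision; in an actual multi-node setup this kind of cast would be applied before wrapping the model for distributed training.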

Primary language: Python · License: Apache-2.0
