KastanDay/LLM-Distributed-Quantization
Accelerating multi-node Large Language Model training with per-layer selective quantization (FP32 -> FP16) of the transformer architecture.
Python · Apache-2.0
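
Below is a minimal sketch of what per-layer selective quantization means in practice: casting the weights of chosen transformer blocks to FP16 while the rest of the model stays in FP32. The function name `selectively_half`, the toy layer stack, and the chosen indices are illustrative assumptions, not this repository's actual API.

```python
# Sketch only (assumed PyTorch-style model, not this repo's actual code):
# cast selected transformer blocks to FP16, keep the others in FP32.
import torch
import torch.nn as nn


def selectively_half(blocks: nn.ModuleList, fp16_indices: set) -> None:
    """Cast the blocks at `fp16_indices` to FP16; leave the others in FP32."""
    for idx, block in enumerate(blocks):
        if idx in fp16_indices:
            block.half()   # parameters and buffers of this block -> torch.float16
        else:
            block.float()  # everything else remains torch.float32


if __name__ == "__main__":
    # Toy stand-in for a decoder-only LLM: a stack of transformer layers.
    blocks = nn.ModuleList(
        nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
        for _ in range(4)
    )
    selectively_half(blocks, fp16_indices={1, 2})  # quantize only the middle layers

    for idx, block in enumerate(blocks):
        dtype = next(block.parameters()).dtype
        print(f"layer {idx}: {dtype}")  # layers 1 and 2 report torch.float16
```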