/vllm-compress-comm

vllm-compress-comm uses an inverse FFT and a new training strategy for a new kind of diffusion model to compress the tensors transferred among GPUs, accelerating multi-GPU inference.
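The repository page does not describe the method in detail, but the general idea of frequency-domain tensor compression can be sketched as follows. This is an illustrative assumption, not the project's actual implementation: the function names, the `keep_ratio` parameter, and the NumPy-based top-k coefficient selection are all hypothetical. A tensor is transformed with an FFT, only the largest-magnitude frequency coefficients are transmitted, and the receiver reconstructs an approximation with an inverse FFT.

```python
import numpy as np

def compress_tensor(x: np.ndarray, keep_ratio: float = 0.25):
    """Keep only the largest-magnitude frequency coefficients (hypothetical sketch)."""
    coeffs = np.fft.rfft(x)
    k = max(1, int(len(coeffs) * keep_ratio))
    # Indices of the k largest-magnitude coefficients.
    idx = np.argpartition(np.abs(coeffs), -k)[-k:]
    # Transmit (indices, values, original length) instead of the full tensor.
    return idx, coeffs[idx], len(x)

def decompress_tensor(idx: np.ndarray, vals: np.ndarray, n: int) -> np.ndarray:
    """Reconstruct an approximation of the tensor via the inverse FFT."""
    coeffs = np.zeros(n // 2 + 1, dtype=complex)
    coeffs[idx] = vals
    return np.fft.irfft(coeffs, n=n)

# Example: a smooth (band-limited) signal compresses well in the frequency domain.
x = np.sin(np.linspace(0, 4 * np.pi, 64, endpoint=False))
idx, vals, n = compress_tensor(x, keep_ratio=0.25)
x_hat = decompress_tensor(idx, vals, n)
print("max reconstruction error:", float(np.max(np.abs(x - x_hat))))
```

For smooth activations the kept coefficients capture most of the energy, so the payload shrinks roughly by `keep_ratio` at modest reconstruction error; how the actual project selects coefficients and trains the model to tolerate this lossy channel is not documented here.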

Primary language: Python · License: Apache-2.0