Kevin-shihello-world/vllm-compress-comm
vllm-compress-comm uses an inverse FFT and a new training strategy for a new kind of diffusion model to compress the tensors transported between GPUs, accelerating multi-GPU inference.
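As a rough illustration of the idea (this is a hypothetical sketch, not the repository's actual code), one simple frequency-domain compression scheme keeps only the largest-magnitude FFT coefficients of a tensor before transport and reconstructs an approximation with the inverse FFT on the receiving side. The function names and `keep_ratio` parameter below are assumptions for illustration:

```python
import numpy as np

def compress(tensor, keep_ratio=0.25):
    """Keep only the top FFT coefficients by magnitude (hypothetical sketch)."""
    coeffs = np.fft.fft(tensor.ravel())
    k = max(1, int(keep_ratio * coeffs.size))
    idx = np.argsort(np.abs(coeffs))[-k:]        # indices of largest-magnitude bins
    return idx, coeffs[idx], tensor.shape

def decompress(idx, vals, shape):
    """Rebuild an approximation of the tensor via the inverse FFT."""
    coeffs = np.zeros(int(np.prod(shape)), dtype=complex)
    coeffs[idx] = vals
    return np.fft.ifft(coeffs).real.reshape(shape)

# Example: compress and reconstruct a small tensor.
x = np.random.default_rng(0).standard_normal((4, 64))
idx, vals, shape = compress(x, keep_ratio=0.25)
x_hat = decompress(idx, vals, shape)
```

The repository's approach additionally involves a trained diffusion model, which this sketch does not attempt to reproduce; the snippet only shows the FFT/inverse-FFT round trip that the description refers to.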
Python · Apache-2.0