Pinned Repositories
footprint_research-Stature_estimation_using_CNN
Object_detection_drinks
Recognition-of-Yoga-Poses-through-an-Interactive-System-with-Kinect-device
MInference
[NeurIPS'24 Spotlight, ICLR'25, ICML'25] Speeds up long-context LLM inference by computing attention approximately with dynamic sparsity, which reduces pre-filling latency by up to 10x on an A100 while maintaining accuracy.
dash-infer
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including CUDA, x86 and ARMv9.
NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
KsanaLLM
Qwen2.5
Qwen2.5 is the large language model series developed by the Qwen team, Alibaba Cloud.