zhaocc1106's Stars
vesoft-inc/nebula
A distributed, fast open-source graph database featuring horizontal scalability and high availability
xtensor-stack/xtensor
C++ tensors with broadcasting and lazy computing
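A minimal sketch (not from the repo) of the broadcasting and lazy evaluation that xtensor advertises; it assumes xtensor's classic include paths are available on the include path:

```cpp
#include <iostream>
#include <xtensor/xarray.hpp>  // assumed classic include path
#include <xtensor/xio.hpp>     // stream output for xtensor expressions

int main() {
    // A 2x3 matrix and a length-3 vector: arithmetic broadcasts the vector
    // across rows, and the expression stays lazy until it is assigned.
    xt::xarray<double> a = {{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}};
    xt::xarray<double> b = {10.0, 20.0, 30.0};
    auto lazy_sum = a + b;                 // no computation yet
    xt::xarray<double> result = lazy_sum;  // evaluated here
    std::cout << result << std::endl;      // {{11, 22, 33}, {14, 25, 36}}
    return 0;
}
```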
ztxz16/fastllm
A pure C++ cross-platform LLM acceleration library with Python bindings; chatglm-6B-class models can reach 10,000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models and runs smoothly on mobile devices.
GitHubDaily/GitHubDaily
Consistently shares high-quality, interesting, and practical open-source tutorials, developer tools, programming websites, and tech news from GitHub. A list of cool, interesting GitHub projects.
ruanyf/weekly
Technology Enthusiast Weekly, published every Friday.
luhengshiwo/LLMForEverybody
Large language model (LLM) knowledge explained so anyone can understand it; a must-read before LLM interviews in spring/autumn campus recruiting, so you can hold your own with interviewers.
facebookresearch/generative-recommenders
Repository hosting code for "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).
NetEase-Media/ControlTalk
Official code for "Controllable Talking Face Generation by Implicit Facial Keypoints Editing"
zurutech/pillow-resize
A port of Pillow's resize method to C++ using OpenCV.
NVIDIA/CUDALibrarySamples
CUDA Library Samples
THUDM/GLM-4-Voice
GLM-4-Voice | An end-to-end Chinese and English speech dialogue model.
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o's performance.
NetEase-Media/grps_trtllm
[High-performance OpenAI LLM service] A pure C++ high-performance OpenAI-compatible LLM service built with GRPS + TensorRT-LLM + Tokenizers.cpp; supports chat and function-call modes, AI agents, distributed multi-GPU inference, multimodal inputs, and a Gradio chat UI.
run-llama/llama_index
LlamaIndex is a data framework for your LLM applications
kyutai-labs/moshi
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
BBuf/how-to-optim-algorithm-in-cuda
How to optimize various algorithms in CUDA.
NetEase-Media/grps_vllm
[grps + vLLM integration] An LLM service implemented via the vLLM LLMEngine API.
alibaba/PhotonLibOS
Probably the fastest coroutine lib in the world!
zhaohb/fastapi_tritonserver
jinja2cpp/Jinja2Cpp
An almost fully conformant Jinja2 template engine implementation in C++ (and for C++).
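A hedged sketch of basic Jinja2Cpp usage, following the Template/Load/RenderAsString pattern shown in its README; exact return types may vary between versions:

```cpp
#include <iostream>
#include <jinja2cpp/template.h>

int main() {
    // Load a Jinja2 template from a string and render it with a value map.
    jinja2::Template tpl;
    tpl.Load("Hello, {{ name }}!");
    jinja2::ValuesMap params{{"name", "world"}};
    // RenderAsString returns an expected-like result in recent versions.
    std::cout << tpl.RenderAsString(params).value() << std::endl;
    return 0;
}
```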
buptzyb/tensorflow
An Open Source Machine Learning Framework for Everyone
NVIDIA/TensorRT-Model-Optimizer
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
gpu-mode/lectures
Material for gpu-mode lectures
xuchengsheng/spring-reading
Covers the core concepts and key features of the Spring framework, including use of the Inversion of Control (IoC) container, the principles and practice of aspect-oriented programming (AOP), transaction management approaches and implementations, the Spring MVC request flow and controller mechanics, and in-depth coverage of data access, security, and Boot auto-configuration. It also covers the Spring event mechanism, advanced topics such as the cache abstraction and reactive programming, and a close look at the coding style and design patterns in the Spring source code.
NetEase-Media/grps
[Deep learning model serving framework] Supports TensorFlow/PyTorch/TensorRT/TensorRT-LLM/vLLM and other NN frameworks, dynamic batching, streaming mode, and both Python and C++ APIs; rate-limitable, extensible, and high-performance. Helps users quickly deploy models to production and serve them over HTTP/RPC interfaces.
netease-youdao/QAnything
Question and Answer based on Anything.
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
mysql/mysql-connector-cpp
MySQL Connector/C++ is a MySQL database connector for C++. It lets you develop C++ and C applications that connect to MySQL Server.
bombela/backward-cpp
A beautiful stack trace pretty printer for C++
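A minimal sketch of the backward-cpp usage documented in its README: capture the current stack and pretty-print it (symbolized output assumes the platform's debug-info/unwind libraries are linked in):

```cpp
#include <backward.hpp>

int main() {
    // Capture up to 32 frames of the current call stack and print them,
    // with source snippets when debug info is available.
    backward::StackTrace st;
    st.load_here(32);
    backward::Printer printer;
    printer.print(st, stderr);
    return 0;
}
```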