Pinned Repositories
baby-llm
RedisXANN
redis x ANN vss
RedisXLM
redis x language model inference (loads a trained model); model sizes (tiny|t, small|s, medium|m, large|l) with quantization. NOTE: a Redis-embedded language model, available for the stand-alone version only.
RedisXSlot
redis x slot module: asynchronous slot migrate/restore that avoids blocking, or minimally blocks, other commands.
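Slot-based migration builds on Redis Cluster's key-to-slot mapping: every key hashes to one of 16384 slots via CRC16, with `{hash tag}` support so related keys land in the same slot. A minimal Python sketch of the documented algorithm:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM (poly 0x1021, init 0), as specified for Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    """Map a key to one of 16384 slots, honoring non-empty {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag: hash only the tag contents
            key = key[start + 1 : end]
    return crc16_xmodem(key) % 16384
```

The CRC16 check value for `"123456789"` is `0x31C3` per the Redis Cluster specification, and keys sharing a hash tag map to the same slot, which is what lets a migration move them together.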
craftsman
A craftsman: tries their best to deliver the most practicable solution for the business.
doraemon-nb
IPython notebooks for running sample experiments and sketching ideas.
geo
PHP extension for map-related (geo) operations.
iowrapper
io_uring library / syscall wrapper, with benchmarks for experimental study of io_uring.
perf-book-cn
https://github.com/dendibakh/perf-book — the online GitBook, translated into Chinese in the original Markdown sources.
weedge's Repositories
weedge/ceph
Ceph is a distributed object, block, and file storage platform
weedge/CppCon2023
Slides and other materials from CppCon 2023
weedge/cutlass
CUDA Templates for Linear Algebra Subroutines
weedge/DeepLearningExamples
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
weedge/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
weedge/DeepSpeed-Kernels
DeepSpeed kernels; read to see how they are optimized.
weedge/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. See https://github.com/microsoft/DeepSpeed/blob/master/blogs/deepspeed-fastgen/chinese/README.md to learn about SplitFuse; TensorRT-LLM followed with its own SplitFuse implementation: https://github.com/NVIDIA/TensorRT-LLM/issues/317
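The core idea of Dynamic SplitFuse is to decompose long prompts into chunks and compose them with ongoing decode tokens, so every forward pass runs at a near-constant token budget. A toy Python sketch of that scheduling policy (the budget, the fixed decode count, and the data shapes are illustrative assumptions, not MII's actual implementation):

```python
def splitfuse_schedule(prompt_lens, decode_count, budget):
    """Each forward pass carries `decode_count` single-token decode
    requests plus prompt chunks filling the rest of the token budget;
    prompts longer than the leftover room are split across passes."""
    passes = []
    queue = list(prompt_lens)  # pending prompt lengths, FIFO
    while queue:
        room = budget - decode_count      # tokens left for prompt chunks
        chunks = []
        while queue and room > 0:
            take = min(queue[0], room)    # split the head prompt if needed
            chunks.append(take)
            room -= take
            queue[0] -= take
            if queue[0] == 0:
                queue.pop(0)
        passes.append({"decode_tokens": decode_count, "prompt_chunks": chunks})
    return passes
```

For example, with a 512-token budget and 56 decodes per pass, a 600-token prompt is split as 456 + 144 across two passes, and a trailing 100-token prompt rides along in the second pass rather than waiting for its own.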
weedge/DeepSpeedExamples
Example models using DeepSpeed
weedge/EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
weedge/faster-whisper
Faster Whisper transcription with CTranslate2
weedge/flash-attention
Fast and memory-efficient exact attention
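The memory efficiency comes from tiling plus an online softmax: scores are processed block by block while carrying a running max and normalizer, so the full N×N score matrix is never materialized. A NumPy sketch of that rescaling trick for a single query vector (an illustration of the math, not the CUDA kernel):

```python
import numpy as np

def online_softmax_attention(q, K, V, block=4):
    """Attention output for one query, visiting K/V in blocks while
    maintaining a running max (m) and softmax denominator (l)."""
    m = -np.inf                      # running max of scores seen so far
    l = 0.0                          # running softmax denominator
    acc = np.zeros(V.shape[1])       # running weighted sum of values
    for i in range(0, K.shape[0], block):
        s = q @ K[i:i + block].T     # scores for this block only
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)    # rescale previous statistics
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ V[i:i + block]
        m = m_new
    return acc / l

# Matches the naive (full-matrix) softmax attention
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=(8,)), rng.normal(size=(16, 8)), rng.normal(size=(16, 8))
e = np.exp(q @ K.T - (q @ K.T).max())
naive = (e / e.sum()) @ V
assert np.allclose(online_softmax_attention(q, K, V), naive)
```

Because each block's partial sums are rescaled by `exp(m - m_new)` when a larger score appears, the result is exact attention, not an approximation.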
weedge/fluid
Fluid, elastic data abstraction and acceleration for BigData/AI applications in cloud. (Project under CNCF)
weedge/jetson-inference
Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
weedge/landmark-attention
Landmark Attention: Random-Access Infinite Context Length for Transformers
weedge/learn
Technical papers & books for study.
weedge/LLaMA-Factory
Easy-to-use LLM fine-tuning framework (LLaMA, BLOOM, Mistral, Baichuan, Qwen, ChatGLM)
weedge/llama2.c
Inference Llama 2 in one file of pure C; can run on a Raspberry Pi.
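One of the handful of kernels llama2.c implements is RMSNorm: scale the activation by the reciprocal root mean square (with a small epsilon), then multiply elementwise by a learned weight. The same arithmetic, rendered in NumPy:

```python
import numpy as np

def rmsnorm(x, weight, eps=1e-5):
    """RMSNorm as in llama2.c's rmsnorm(): x / sqrt(mean(x^2) + eps),
    scaled elementwise by the learned weight vector."""
    ss = np.mean(x * x) + eps
    return weight * (x / np.sqrt(ss))
```

With unit inputs and unit weights the output is just `1/sqrt(1 + eps)` per element, which makes the epsilon's tiny bias easy to see.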
weedge/Megatron-LM
Ongoing research training transformer models at scale
weedge/ModuleFormer
ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward experts. We released a collection of ModuleFormer-based Language Models (MoLM) ranging in scale from 4 billion to 8 billion parameters.
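A sparse MoE layer like this activates only a few experts per token via a learned router. A generic top-k gating sketch in Python (a toy router for intuition; ModuleFormer's actual routing and its stick-breaking attention are more involved):

```python
import numpy as np

def topk_route(logits, k=2):
    """Generic top-k MoE gating: pick the k highest-scoring experts
    for a token and renormalize their softmax weights."""
    idx = np.argsort(logits)[::-1][:k]           # top-k expert ids
    w = np.exp(logits[idx] - logits[idx].max())  # stable softmax over top-k
    return idx, w / w.sum()
```

Only the selected experts run their feedforward computation for that token, which is how a model can hold many experts' worth of parameters while paying the FLOPs of just k of them.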
weedge/OpenRLHF
A Ray-based High-performance RLHF framework (for large models)
weedge/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
weedge/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
weedge/ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
weedge/serve
Serve, optimize and scale PyTorch models in production; see the cpp_backend_revive branch — a Java frontend + C++ backend serving architecture, similar to Doris.
weedge/TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
weedge/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
weedge/triton
Development repository for the Triton language and compiler; used here for kernel (operator) acceleration.
weedge/vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech; see the training setup for MeloTTS training.
weedge/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
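Much of vLLM's throughput comes from PagedAttention: the KV cache is carved into fixed-size blocks, and each sequence keeps a block table mapping logical positions to physical blocks, so memory is allocated on demand instead of reserved contiguously per sequence. A toy allocator sketch (block size and structure are illustrative, not vLLM's internals):

```python
class PagedKVCache:
    """Toy paged KV-cache allocator: fixed-size blocks handed out
    on demand, tracked per sequence in a block table."""
    def __init__(self, num_blocks, block_size=16):
        self.block_size = block_size
        self.free = list(range(num_blocks))   # free physical block ids
        self.tables = {}                      # seq_id -> [physical block ids]
        self.lengths = {}                     # seq_id -> tokens stored

    def append_token(self, seq_id):
        table = self.tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:          # current block full (or none yet)
            if not self.free:
                raise MemoryError("no free KV blocks")
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def free_seq(self, seq_id):
        """Return a finished sequence's blocks to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because blocks are freed the moment a sequence finishes and fragmentation is bounded to less than one block per sequence, far more concurrent sequences fit in the same GPU memory than with contiguous preallocation.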
weedge/whisper.cpp
Port of OpenAI's Whisper model in C/C++