Pinned Repositories
faiss
A library for efficient similarity search and clustering of dense vectors.
folly
An open-source C++ library developed and used at Facebook.
HierarchicalKV
HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature embeddings in the high-bandwidth memory (HBM) of GPUs and in host memory. It can also be used as generic key-value storage.
intel-extension-for-pytorch
A Python package that extends official PyTorch to improve performance on Intel platforms.
kaldi
This is the official location of the Kaldi project.
mxnet
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dependency Scheduler; for Python, R, Julia, Scala, Go, JavaScript and more
runtime
A performant and modular runtime for TensorFlow
tensorflow
Computation using data flow graphs for scalable machine learning
thrift
Mirror of Apache Thrift
xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on a single machine, Hadoop, Spark, Flink and DataFlow
kevin-14's Repositories
kevin-14/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
kevin-14/Awesome-LLM-Inference
💻 A small collection of Awesome LLM Inference resources [Papers|Blogs|Docs] with code, covering TensorRT-LLM, streaming-llm, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
kevin-14/ChatGPT
🔮 ChatGPT Desktop Application (Mac, Windows and Linux)
kevin-14/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
kevin-14/CUDA-Learn-Note
🎉 CUDA notes / collection of frequently asked interview questions / C++ notes; personal notes, updated irregularly: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
kevin-14/DeepLearningSystem
An introduction to the core principles of deep learning systems.
kevin-14/DeepSpeedExamples
Example models using DeepSpeed
kevin-14/FasterTransformer
Transformer-related optimization, including BERT and GPT
kevin-14/flash-attention
Fast and memory-efficient exact attention
kevin-14/glibc
Unofficial mirror of sourceware glibc repository. Updated daily.
kevin-14/gloo
Collective communications library with various primitives for multi-machine training.
kevin-14/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
kevin-14/llama.cpp
Port of Facebook's LLaMA model in C/C++
kevin-14/LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
kevin-14/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
kevin-14/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
kevin-14/mlc-llm
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
kevin-14/pdfs
Technically-oriented PDF Collection (Papers, Specs, Decks, Manuals, etc)
kevin-14/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
kevin-14/pwndbg
Exploit Development and Reverse Engineering with GDB Made Easy
kevin-14/sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
kevin-14/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
kevin-14/the-algorithm
Source code for Twitter's Recommendation Algorithm
kevin-14/tokenizers-cpp
Universal cross-platform tokenizers binding to HF and sentencepiece
kevin-14/transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
kevin-14/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
kevin-14/triton
Development repository for the Triton language and compiler
kevin-14/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
kevin-14/web-stable-diffusion
Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.
kevin-14/whisper.cpp
Port of OpenAI's Whisper model in C/C++