kevin-14

Pinned Repositories

faiss
A library for efficient similarity search and clustering of dense vectors.
Language: C++ | 0 | 0
folly
An open-source C++ library developed and used at Facebook.
Language: C++ | 0 | 0
HierarchicalKV
HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of Merlin-KV is to store key-value feature embeddings in the high-bandwidth memory (HBM) of GPUs and in host memory. It can also be used as generic key-value storage.
Language: Cuda | 1 | 0
intel-extension-for-pytorch
A Python package that extends the official PyTorch to easily obtain better performance on Intel platforms
Language: Python | 1 | 0
kaldi
This is the official location of the Kaldi project.
Language: Shell | 1 | 0
mxnet
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
Language: C++ | 0 | 0
runtime
A performant and modular runtime for TensorFlow
Language: MLIR | 1 | 0
tensorflow
Computation using data flow graphs for scalable machine learning
Language: C++ | 0 | 0
thrift
Mirror of Apache Thrift
Language: C++ | 0 | 0
xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Flink and DataFlow
Language: C++ | 0 | 0

kevin-14's Repositories

kevin-14/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
kevin-14/Awesome-LLM-Inference
💻A small Collection for Awesome LLM Inference [Papers|Blogs|Docs] with codes, contains TensorRT-LLM, streaming-llm, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
kevin-14/ChatGPT
🔮 ChatGPT Desktop Application (Mac, Windows and Linux)
kevin-14/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
kevin-14/CUDA-Learn-Note
🎉 CUDA notes / collection of frequently asked interview questions / C++ notes; personal notes, updated irregularly: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
kevin-14/DeepLearningSystem
Deep Learning System core principles introduction.
kevin-14/DeepSpeedExamples
Example models using DeepSpeed
kevin-14/FasterTransformer
Transformer related optimization, including BERT, GPT
kevin-14/flash-attention
Fast and memory-efficient exact attention
kevin-14/glibc
Unofficial mirror of sourceware glibc repository. Updated daily.
kevin-14/gloo
Collective communications library with various primitives for multi-machine training.
kevin-14/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
kevin-14/llama.cpp
Port of Facebook's LLaMA model in C/C++
kevin-14/LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
kevin-14/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
kevin-14/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
kevin-14/mlc-llm
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
kevin-14/pdfs
Technically-oriented PDF Collection (Papers, Specs, Decks, Manuals, etc)
kevin-14/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
kevin-14/pwndbg
Exploit Development and Reverse Engineering with GDB Made Easy
kevin-14/sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
kevin-14/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
kevin-14/the-algorithm
Source code for Twitter's Recommendation Algorithm
kevin-14/tokenizers-cpp
Universal cross-platform tokenizers binding to HF and sentencepiece
kevin-14/transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
kevin-14/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
kevin-14/triton
Development repository for the Triton language and compiler
kevin-14/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
kevin-14/web-stable-diffusion
Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.
kevin-14/whisper.cpp
Port of OpenAI's Whisper model in C/C++