jalola's Stars
codecrafters-io/build-your-own-x
Master programming by recreating your favorite technologies from scratch.
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
meta-llama/llama
Inference code for Llama models
karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
karpathy/llm.c
LLM training in simple, raw C/CUDA
karpathy/llama2.c
Inference Llama 2 in one file of pure C
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
meta-llama/codellama
Inference code for CodeLlama models
microsoft/onnxruntime
ONNX Runtime: a cross-platform, high-performance ML inferencing and training accelerator
visenger/awesome-mlops
A curated list of references for MLOps
huggingface/text-generation-inference
Large Language Model Text Generation Inference
NVIDIA/TensorRT-LLM
TensorRT-LLM provides an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines containing state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also includes components for creating Python and C++ runtimes that execute those engines.
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
allegroai/clearml
ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
daquexian/onnx-simplifier
Simplify your ONNX model
karpathy/ng-video-lecture
facebookresearch/esm
Evolutionary Scale Modeling (esm): Pretrained language models for proteins
meta-llama/PurpleLlama
Set of tools to assess and improve LLM security.
GoogleCloudPlatform/ml-design-patterns
Source code accompanying O'Reilly book: Machine Learning Design Patterns
ELS-RD/transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
onnx/onnxmltools
ONNXMLTools enables conversion of models to ONNX
triton-inference-server/tensorrtllm_backend
The Triton TensorRT-LLM Backend
Azure/MS-AMP
Microsoft Automatic Mixed Precision Library
triton-inference-server/model_analyzer
Triton Model Analyzer is a CLI tool for understanding the compute and memory requirements of models served by Triton Inference Server.
Snowflake-Labs/sfquickstarts
Follow along with our tutorials to get you up and running with Snowflake.
microsoft/onnxruntime-extensions
A specialized pre- and post-processing library for ONNX Runtime
ramonhagenaars/jsons
🐍 A Python lib for (de)serializing Python objects to/from JSON
microsoft/DigiFace1M
premAI-io/benchmarks
🕹️ Performance comparison of MLOps engines, frameworks, and languages on mainstream AI models.
kamyu104/GoogleKickStart-2022
🏃 Python3 Solutions of All 32 Problems in GKS 2022