yassouali's Stars
comfyanonymous/ComfyUI
The most powerful and modular Stable Diffusion GUI, API, and backend, with a graph/nodes interface.
karpathy/llm.c
LLM training in simple, raw C/CUDA
HigherOrderCO/Bend
A massively parallel, high-level programming language
roboflow/supervision
We write your reusable computer vision tools. 💜
princeton-nlp/SWE-agent
SWE-agent takes a GitHub issue and tries to automatically fix it using GPT-4 or your LM of choice. It solves 12.47% of bugs in the SWE-bench evaluation set and takes just 1.5 minutes to run.
meta-llama/llama-recipes
Scripts for fine-tuning Meta Llama 3 with composable FSDP and PEFT methods, covering single- and multi-node GPU setups. Supports default and custom datasets for applications such as summarization and Q&A, plus a number of candidate inference solutions such as HF TGI and vLLM for local or cloud deployment, and demo apps showcasing Meta Llama 3 for WhatsApp and Messenger.
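For a sense of the "composable PEFT" part, here is a minimal LoRA setup using Hugging Face PEFT; the checkpoint name, target modules, and hyperparameters are placeholder assumptions, not the repo's defaults.

```python
# Illustrative LoRA fine-tuning setup with Hugging Face PEFT; the checkpoint
# and hyperparameters below are placeholder assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Meta-Llama-3-8B"    # assumed (gated) checkpoint name
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

lora_config = LoraConfig(
    r=8,                                   # low-rank adapter dimension
    lora_alpha=32,                         # adapter scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # only adapter weights are trainable
```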
jasonppy/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
OpenAccess-AI-Collective/axolotl
Go ahead and axolotl questions
adam-maj/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
intel-analytics/ipex-llm
Accelerate local LLM inference and fine-tuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPUs and GPUs (e.g., a local PC with an iGPU, or discrete GPUs such as Arc, Flex, and Max); integrates seamlessly with llama.cpp, Ollama, Hugging Face, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
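A rough sketch of the low-bit loading path on an Intel GPU; the import path, the `load_in_4bit` flag, and the `"xpu"` device string follow the project's documented usage as I understand it, so treat them as assumptions.

```python
# Sketch only: load a model through ipex-llm's transformers-style wrapper with
# 4-bit weights and run it on an Intel GPU ("xpu"). Names are assumptions.
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"   # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
model = model.to("xpu")                      # Intel GPU device in PyTorch
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is an iGPU?", return_tensors="pt").to("xpu")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```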
EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of language models.
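A minimal sketch of a zero/few-shot run through the harness's Python entry point; the model and task choices are arbitrary, and the `simple_evaluate` arguments are assumed from the documented usage.

```python
# Sketch: evaluate a small Hugging Face model on one task via lm-eval's
# Python API. Model, task, and batch size are arbitrary assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-160m",  # assumed small model
    tasks=["lambada_openai"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])  # per-task metrics (accuracy, perplexity, ...)
```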
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
openai/transformer-debugger
pytorch/torchtune
A native PyTorch library for LLM fine-tuning
turboderp/exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
albertan017/LLM4Decompile
Reverse Engineering: Decompiling Binary Code with Large Language Models
sgl-project/sglang
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
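A sketch of the frontend DSL, following the project's documented style; the endpoint URL and exact signatures are assumptions, and a running SGLang server is assumed.

```python
# Sketch of SGLang's frontend DSL for structured generation; assumes a local
# SGLang server is already running at the (placeholder) endpoint below.
import sglang as sgl

@sgl.function
def qa(s, question):
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = qa.run(question="What is structured generation?")
print(state["answer"])
```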
cohere-ai/cohere-toolkit
Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications.
sarah-ek/faer-rs
Linear algebra foundation for the Rust programming language
Modos-Labs/Glider
Open-source E-ink monitor. Mirror of https://gitlab.com/zephray/glider
Hirrolot/datatype99
Algebraic data types for C99
casper-hansen/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization, with a 2x speedup during inference.
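A sketch of the quantization flow in the style of the project's examples; the model path, output directory, and quant_config values are assumptions.

```python
# Sketch: calibrate and pack a model to 4-bit AWQ with AutoAWQ, then save it.
# Paths and quant_config values are placeholder assumptions.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-v0.1"   # placeholder checkpoint
quant_path = "mistral-7b-awq"              # assumed output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)  # runs calibration, packs 4-bit weights
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```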
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
srush/Triton-Puzzles
Puzzles for learning Triton
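Not one of the repo's puzzles, just a minimal Triton kernel (vector add) to show the programming model the puzzles exercise; assumes a CUDA GPU and a Triton install.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements               # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```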
MDK8888/GPTFast
Accelerate your Hugging Face Transformers models by 6-8.5x. Native to Hugging Face and PyTorch.
tspeterkim/flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
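The repo itself is CUDA; as a point of comparison, here is a plain PyTorch sketch of the same forward-pass math (tiled attention with an online softmax), single head, no masking. The tile size is arbitrary.

```python
import torch

def flash_attn_forward_reference(q, k, v, tile=64):
    # q, k, v: (seq_len, head_dim); single head, no masking, for clarity only.
    seq_len, d = q.shape
    scale = d ** -0.5
    out = torch.zeros_like(q)                      # running (unnormalized) output
    row_max = torch.full((seq_len, 1), float("-inf"))
    row_sum = torch.zeros(seq_len, 1)
    for start in range(0, seq_len, tile):
        kj, vj = k[start:start + tile], v[start:start + tile]
        scores = (q @ kj.T) * scale                # (seq_len, tile)
        new_max = torch.maximum(row_max, scores.max(dim=-1, keepdim=True).values)
        p = torch.exp(scores - new_max)
        correction = torch.exp(row_max - new_max)  # rescale stats from earlier tiles
        row_sum = row_sum * correction + p.sum(dim=-1, keepdim=True)
        out = out * correction + p @ vj
        row_max = new_max
    return out / row_sum

q, k, v = (torch.randn(128, 32) for _ in range(3))
ref = torch.softmax((q @ k.T) * (32 ** -0.5), dim=-1) @ v
assert torch.allclose(flash_attn_forward_reference(q, k, v), ref, atol=1e-5)
```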
IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups at batch sizes of up to 16-32 tokens.
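Not the kernel itself, but a plain PyTorch sketch of what an FP16xINT4 GEMM computes logically: dequantize group-quantized 4-bit weights and multiply. The fused kernel avoids ever materializing the full-precision weight matrix; shapes and group size here are arbitrary.

```python
import torch

def int4_dequant_matmul(x, w_int4, scales, group_size=128):
    # w_int4: (in_features, out_features) with integer values in [0, 15]
    # scales: (in_features // group_size, out_features), one scale per group
    in_features, out_features = w_int4.shape
    w = w_int4.float() - 8.0                               # symmetric zero-point
    w = w.view(in_features // group_size, group_size, out_features)
    w = (w * scales[:, None, :]).view(in_features, out_features)
    return x @ w                                           # the real kernel fuses this

x = torch.randn(16, 1024)                                  # a small batch of tokens
w_int4 = torch.randint(0, 16, (1024, 4096))
scales = torch.rand(1024 // 128, 4096) * 0.01
print(int4_dequant_matmul(x, w_int4, scales).shape)        # torch.Size([16, 4096])
```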
pytorch/ao
Native PyTorch library for quantization and sparsity
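A sketch of applying weight-only int8 quantization with torchao; the `quantize_`/`int8_weight_only` names follow the project's documented API, but treat the exact import path as an assumption.

```python
# Sketch: swap the Linear layers of a toy model to int8 weight-only quantized
# versions in place using torchao. Import path assumed from the docs.
import torch
from torchao.quantization import quantize_, int8_weight_only

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
)
quantize_(model, int8_weight_only())       # in-place weight-only quantization
print(model(torch.randn(2, 1024)).shape)   # torch.Size([2, 1024])
```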
HazyResearch/aisys-building-blocks
Building blocks for foundation models.
razetime/ngn-k-tutorial
An ngn/k tutorial.