Pinned Repositories
flash-attention
Fast and memory-efficient exact attention
safetensors
Simple, safe way to store and distribute tensors
lm-format-enforcer
Enforce the output format (JSON Schema, Regex, etc.) of a language model
text-generation-webui
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
tabbyAPI
An OAI-compatible exllamav2 API that's both lightweight and fast
alpaca_lora_4bit
exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
exui
Web UI for ExLlamaV2
text-generation-webui
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, llama.cpp (GGUF), Llama models.
turboderp's Repositories
turboderp/exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
turboderp/exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
turboderp/exui
Web UI for ExLlamaV2
turboderp/alpaca_lora_4bit
turboderp/text-generation-webui
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, llama.cpp (GGUF), Llama models.