Pinned Repositories
bitsandbytes
8-bit CUDA functions for PyTorch.
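A minimal sketch of typical usage, assuming a CUDA-enabled PyTorch install; the 8-bit optimizers are drop-in replacements for their torch.optim counterparts:

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024).cuda()

# Drop-in replacement for torch.optim.Adam; optimizer state is kept
# in 8-bit, which substantially reduces optimizer memory.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4)

loss = model(torch.randn(8, 1024, device="cuda")).pow(2).mean()
loss.backward()
optimizer.step()
```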
bitsandbytes-windows-webui
Windows compile of bitsandbytes for use in text-generation-webui.
ctransformers-cuBLAS-wheels
ctransformers wheels with pre-built CUDA binaries for additional CUDA and AVX versions.
exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
flash-attention
Fast and memory-efficient exact attention (Windows wheels).
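A rough usage sketch, assuming the flash-attn 2.x API (flash_attn_func) and half-precision tensors on a CUDA device:

```python
import torch
from flash_attn import flash_attn_func

# q, k, v: (batch, seqlen, num_heads, head_dim); fp16/bf16 on CUDA.
q = torch.randn(1, 2048, 16, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Exact (non-approximate) attention, computed without materializing
# the full seqlen x seqlen attention matrix.
out = flash_attn_func(q, k, v, causal=True)
```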
GPTQ-for-LLaMa-CUDA
Oobabooga's GPTQ-for-LLaMa fork combined with the main cuda branch, packaged for installation as a Python package.
GPTQ-for-LLaMa-Wheels
Precompiled wheels for GPTQ-for-LLaMa.
llama-cpp-python-cuBLAS-wheels
Wheels for llama-cpp-python compiled with cuBLAS support.
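The wheels install through pip with an extra index URL pointing at the repository's pages; the exact URL and CUDA/AVX variant are documented there. With a cuBLAS build installed, llama-cpp-python can offload layers to the GPU; a minimal sketch, with a placeholder model path:

```python
from llama_cpp import Llama

# n_gpu_layers > 0 only has an effect when the wheel was built with
# cuBLAS support; those layers are offloaded to the GPU.
llm = Llama(model_path="./models/7b-q4_0.gguf", n_gpu_layers=35)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(out["choices"][0]["text"])
```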
one-click-installers
Simplified installers for oobabooga/text-generation-webui.
windows-venv-installers
Standalone, dependency-free scripts that automatically set up a virtual environment for easy project installation on Windows.
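A sketch of the core steps such scripts automate, using only the Python standard library (the requirements file is illustrative):

```python
import os
import subprocess
import venv

# Create an isolated environment with pip available.
venv.create("venv", with_pip=True)

# On Windows the environment's executables live under Scripts\.
pip = os.path.join("venv", "Scripts", "pip.exe")
subprocess.check_call([pip, "install", "-r", "requirements.txt"])
```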
jllllll's Repositories
jllllll/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
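A quantization sketch following the pattern in the upstream README; the model name and calibration text are illustrative, and a real run would use many calibration examples:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(pretrained)

# One calibration example, just to illustrate the API.
examples = [tokenizer("auto-gptq is an easy-to-use quantization package.",
                      return_tensors="pt")]

config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(pretrained, config)
model.quantize(examples)
model.save_quantized("opt-125m-4bit-gptq")
```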
jllllll/exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs.
jllllll/text-generation-webui
A Gradio web UI for running large language models such as LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA.
jllllll/h2ogpt
Private Q&A and summarization of documents and images, or chat with a local GPT; 100% private with no data leaks; Apache 2.0 licensed. Demo: https://gpt.h2o.ai/
jllllll/GPTQ-for-LLaMa
4-bit quantization of LLMs using GPTQ.
jllllll/ctransformers
Python bindings for Transformer models implemented in C/C++ using the GGML library.
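A minimal sketch of the bindings, with a placeholder model file; model_type selects which C/C++ architecture implementation is used:

```python
from ctransformers import AutoModelForCausalLM

# Loads a GGML-format model and runs generation through the C/C++
# backend (llama, gptj, falcon, ... are selected via model_type).
llm = AutoModelForCausalLM.from_pretrained("./models/7b-ggml-q4_0.bin",
                                           model_type="llama")
print(llm("AI is going to", max_new_tokens=32))
```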
jllllll/safetensors
Simple, safe way to store and distribute tensors.
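A minimal save/load round-trip using the PyTorch bindings:

```python
import torch
from safetensors.torch import save_file, load_file

tensors = {"weight": torch.zeros(4, 4), "bias": torch.zeros(4)}

# Unlike pickle-based checkpoints, safetensors files contain no
# executable code, so loading untrusted files is safe.
save_file(tensors, "model.safetensors")
restored = load_file("model.safetensors")
```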
jllllll/scikit-build-core
A next-generation Python CMake adaptor and Python API for plugins.
jllllll/SillyTavern
LLM Frontend for Power Users.