Pinned Repositories
ai-clone-whatsapp
Create an AI clone of yourself from your WhatsApp chats (using Mistral 7B)
aider
aider is AI pair programming in your terminal
Anima
33B Chinese LLM, DPO QLoRA, 100K context, AirLLM 70B inference with a single 4GB GPU
AutoRAG
RAG AutoML Tool - Find optimal RAG pipeline for your own data.
BCEmbedding
Netease Youdao's open-source embedding and reranker models for RAG products.
BiLLM
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
DistiLlama
Chrome Extension to Summarize Web Pages Using locally running LLMs
lit-llama
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
QuaRot
Code for QuaRot, an end-to-end 4-bit inference of large language models.
R2R
The framework for fast development and deployment of RAG backends.
bettercallcaleb's Repositories
bettercallcaleb/QuaRot
Code for QuaRot, an end-to-end 4-bit inference of large language models.
bettercallcaleb/R2R
The framework for fast development and deployment of RAG backends.
bettercallcaleb/ai-clone-whatsapp
Create an AI clone of yourself from your WhatsApp chats (using Mistral 7B)
bettercallcaleb/aider
aider is AI pair programming in your terminal
bettercallcaleb/AutoRAG
RAG AutoML Tool - Find optimal RAG pipeline for your own data.
bettercallcaleb/BCEmbedding
Netease Youdao's open-source embedding and reranker models for RAG products.
bettercallcaleb/BiLLM
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
bettercallcaleb/ComfyUI-J
Jannchie's ComfyUI custom nodes.
bettercallcaleb/CustomGPT-Google-Sheets-RAG
Allows you to use Google Sheets to store and retrieve data from your custom GPT
bettercallcaleb/DreamGenTrain
bettercallcaleb/fltr
Like grep but for natural language questions. Based on Mixtral 8x7B.
bettercallcaleb/free-reddit-comments-nuke
A free Python tool that nukes all your Reddit comments
bettercallcaleb/GPTFast
Accelerate your Hugging Face Transformers 6-7x. Native to Hugging Face and PyTorch.
bettercallcaleb/InfLLM
The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"
bettercallcaleb/jen-ai
A simple speech-to-text and text-to-speech AI chatbot that can be run fully offline.
bettercallcaleb/LLaMA-Factory
Easy-to-use LLM fine-tuning framework (LLaMA, BLOOM, Mistral, Baichuan, Qwen, ChatGLM)
bettercallcaleb/llama_index
LlamaIndex (formerly GPT Index) is a data framework for your LLM applications
bettercallcaleb/llm-scraper
Turn any webpage into structured data using LLMs
bettercallcaleb/LongLM
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
bettercallcaleb/lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
bettercallcaleb/LWM
bettercallcaleb/marker
Convert PDF to markdown quickly with high accuracy
bettercallcaleb/MicroLlama
Micro Llama is a small Llama based model with 300M parameters trained from scratch with $500 budget
bettercallcaleb/mistral.rs
Blazingly fast LLM inference.
bettercallcaleb/Open-Ollama-RAG-ChatApp
Retrieval-Augmented Generation chatbot using Ollama, LangChain and Gradio.
bettercallcaleb/ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
bettercallcaleb/sglang
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with LLMs faster and more controllable.
bettercallcaleb/summarize
Video summarization from multiple sources (YouTube, Dropbox, Google Drive, local files) using multiple LLM endpoints (OpenAI, Groq, LM Studio).
bettercallcaleb/SWE-agent
SWE-agent: Agent Computer Interfaces Enable Software Engineering Language Models
bettercallcaleb/talk-llama-fast
Port of OpenAI's Whisper model in C/C++, fast and with xtts