llmware is a unified framework for developing LLM-based application patterns including Retrieval Augmented Generation (RAG). This project provides an integrated set of tools that anyone can use, from beginner to the most sophisticated AI developer, to rapidly build industrial-grade, knowledge-based enterprise LLM applications, with a specific focus on making it easy to integrate open source small specialized models and to connect enterprise knowledge safely and securely to LLMs in a private cloud.
llmware is an integrated framework comprised of four major components:
Retrieval: Assemble and Query knowledge base
- High-performance document parsers to rapidly parse, text-chunk, and ingest common document types.
- Comprehensive intuitive querying methods: semantic, text, and hybrid retrieval with integrated metadata.
- Ranking and filtering strategies to enable semantic search and rapid retrieval of information.
- Web scrapers, Wikipedia integration, and Yahoo Finance API integration.
Prompt: Simple, Unified Abstraction across 50+ Models
- Connect Models: Simple high-level interface with support for 50+ models out of the box.
- Prompts with Sources: Powerful abstraction to easily package a wide range of materials into prompts.
- Post Processing: tools for evidence verification, classification of a response, and fact-checking.
- Human in the Loop: Ability to enable user ratings, feedback, and corrections of AI responses.
- Auditability: A flexible state mechanism to analyze and audit the LLM prompt lifecycle.
Vector Embeddings: swappable embedding models and vector databases
- Industry Bert: out-of-the-box industry finetuned open source Sentence Transformers.
- Wide Model Support: Custom trained HuggingFace, sentence transformer embedding models and leading commercial models.
- Mix-and-match among multiple options to find the right solution for any particular application.
- Out-of-the-box support for 7 vector databases - Milvus, Postgres (PG Vector), Redis, FAISS, Qdrant, Pinecone and Mongo Atlas.
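To make the idea of embedding-based retrieval concrete, here is a minimal, library-independent sketch of ranking text chunks by cosine similarity. The vectors are made-up toy values, and `cosine_similarity` and `rank_by_cosine` are hypothetical helpers, not part of llmware; in a real pipeline an embedding model would produce the vectors and a vector database would perform the search.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as "not similar"
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_vec, chunks, top_k=2):
    """Return the top_k (score, text) pairs, highest similarity first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings" (hypothetical values, for illustration only)
chunks = [
    ("base salary clause", [0.9, 0.1, 0.0]),
    ("governing law clause", [0.0, 0.2, 0.9]),
    ("termination clause", [0.4, 0.8, 0.1]),
]
query = [1.0, 0.0, 0.0]

results = rank_by_cosine(query, chunks)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Swapping the embedding model or the vector database changes how the vectors are produced and stored, but the ranking step stays conceptually the same.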
Parsing and Text Chunking: Scalable Ingestion
- Integrated High-Speed Parsers for: PDF, PowerPoint, Word, Excel, HTML, Text, WAV, AWS Transcribe transcripts.
- Text-chunking tools to separate information and associated metadata to a consistent block format.
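As a rough illustration of what "a consistent block format" can mean, the sketch below splits text on whitespace into size-limited blocks and attaches the same metadata keys to each one. It is not llmware's parser or its actual block schema: `chunk_text` and the key names (`block_id`, `source`, `text`, `char_count`) are assumptions made for this example.

```python
def chunk_text(text, source_name, max_chars=200):
    """Split text into blocks of at most max_chars, breaking on whitespace.

    Every block is a dict with the same keys, so downstream code can rely
    on a consistent format. A single word longer than max_chars is kept
    whole and becomes its own oversized block.
    """
    blocks = []
    current = []
    current_len = 0
    for word in text.split():
        # +1 accounts for the space joining this word to the previous one
        extra = len(word) + (1 if current else 0)
        if current and current_len + extra > max_chars:
            blocks.append(" ".join(current))
            current, current_len = [], 0
            extra = len(word)
        current.append(word)
        current_len += extra
    if current:
        blocks.append(" ".join(current))
    return [
        {"block_id": i, "source": source_name, "text": b, "char_count": len(b)}
        for i, b in enumerate(blocks)
    ]


# Hypothetical sample text and file name, for illustration only
sample = "The Executive shall receive a base salary. " * 10
blocks = chunk_text(sample, "employment_agreement.pdf", max_chars=100)
for block in blocks:
    print(block["block_id"], block["char_count"], block["text"][:40])
```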
Explore additional llmware capabilities and check out these videos on how to quickly get started with RAG:
- Use small LLMs for RAG for Contract Analysis (feat. LLMWare)
- Invoice Processing with LLMware
- Ingest PDFs at Scale
- Evaluate LLMs for RAG with LLMWare
- Fast Start to RAG with LLMWare Open Source Library
- Use Retrieval Augmented Generation (RAG) without a Database
- RAG using CPU-based (No-GPU required) Hugging Face Models with LLMWare on your laptop
- Pop up LLMWare Inference Server
- DRAGON-7B-Models
pip install llmware
or
python3 -m pip install llmware
See Working with llmware for other options to get up and running.
MongoDB and Milvus are optional and are used to provide production-grade database and vector embedding capabilities. The fastest way to get started is to use the provided Docker Compose file (note: this requires Docker Compose / Docker Desktop to be installed), which takes care of running them both:
curl -o docker-compose.yaml https://raw.githubusercontent.com/llmware-ai/llmware/main/docker-compose.yaml
and then run the containers:
docker compose up -d
Not ready to install MongoDB or Milvus? Check out what you can do without them in our examples section.
See Running MongoDB and Milvus for other options to get up and running with these optional dependencies.
# This example illustrates a simple contract analysis
# using a small RAG-optimized LLM running locally

import os
import re
from llmware.prompts import Prompt, HumanInTheLoop
from llmware.setup import Setup
from llmware.configs import LLMWareConfig


def contract_analysis_on_laptop(model_name):

    # Load the llmware sample files
    print("\n > Loading the llmware sample files...")
    sample_files_path = Setup().load_sample_files()
    contracts_path = os.path.join(sample_files_path, "Agreements")

    # query list: topic key (used to filter the contract) -> question for the LLM
    query_list = {"executive employment agreement": "What are the names of the two parties?",
                  "base salary": "What is the executive's base salary?",
                  "governing law": "What is the governing law?"}

    print(f"\n > Loading model {model_name}...")
    prompter = Prompt().load_model(model_name)

    for i, contract in enumerate(os.listdir(contracts_path)):

        # excluding Mac file artifact
        if contract != ".DS_Store":

            print("\nAnalyzing contract: ", str(i+1), contract)
            print("LLM Responses:")

            for key, value in query_list.items():

                # contract is parsed, text-chunked, and then filtered by topic key
                source = prompter.add_source_document(contracts_path, contract, query=key)

                # calling the LLM with 'source' information from the contract automatically packaged into the prompt
                responses = prompter.prompt_with_source(value, prompt_name="just_the_facts", temperature=0.3)

                for response in responses:
                    print(key, ":", re.sub("[\n]", " ", response["llm_response"]).strip())

                # We're done with this query, clear the source from the prompt
                prompter.clear_source_materials()

    # Save jsonl report to the /prompt_history folder
    print("\nPrompt state saved at: ", os.path.join(LLMWareConfig.get_prompt_path(), prompter.prompt_id))
    prompter.save_state()

    # Save csv report that includes the model, response, prompt, and evidence for human-in-the-loop review
    csv_output = HumanInTheLoop(prompter).export_current_interaction_to_csv()
    print("csv output saved at: ", csv_output)


if __name__ == "__main__":

    # use local cpu model - smallest, fastest (use larger BLING models for higher accuracy)
    model = "llmware/bling-1b-0.1"
    contract_analysis_on_laptop(model)
See 50+ llmware examples for more RAG examples and other code samples and ideas.
To use LLMWare, you do not need to use any proprietary LLM - we would encourage you to experiment with BLING, DRAGON, Industry-BERT, the GGUF examples, along with bringing in your favorite models from HuggingFace and Sentence Transformers.
If you would like to use a proprietary model, you will need to provide your own API keys. API keys and secrets for models, AWS, and Pinecone can be set up as environment variables or passed directly to method calls.
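The two options (environment variable or direct argument) can be combined in a small helper. This is only a sketch: `resolve_api_key` and the variable name `MY_MODEL_API_KEY` are hypothetical and not part of llmware, and the variable would normally be set in your shell or deployment environment rather than in code.

```python
import os


def resolve_api_key(env_var, explicit_key=None):
    """Prefer an explicitly passed key; otherwise fall back to the environment."""
    if explicit_key:
        return explicit_key
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"No API key found: set {env_var} or pass the key directly")
    return key


# Hypothetical variable name and placeholder value, for illustration only
os.environ["MY_MODEL_API_KEY"] = "sk-example-not-a-real-key"

api_key = resolve_api_key("MY_MODEL_API_KEY")
print("key loaded:", api_key[:3] + "...")

# The resolved key could then be passed to a method call, for example
# prompter.load_model(model_name, api_key=api_key)
```

Keeping the lookup in one place avoids hard-coding secrets in scripts and makes a missing key fail early with a clear message.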
There are several options for getting MongoDB running:
A. Run mongo container with docker
docker run -d -p 27017:27017 -v mongodb-volume:/data/db --name=mongodb mongo:latest
B. Run container with docker compose
Create a docker-compose.yaml file with the content:
version: "3"

services:
  mongodb:
    container_name: mongodb
    image: 'mongo:latest'
    volumes:
      - mongodb-volume:/data/db
    ports:
      - '27017:27017'

volumes:
  mongodb-volume:
    driver: local
and then run:
docker compose up
C. Install MongoDB natively
D. Connect to an existing MongoDB deployment
You can connect to an existing MongoDB deployment by setting the connection string in the environment variable COLLECTION_DB_URI. See the example script, Using Mongo Atlas, for detailed information on how to use Mongo Atlas as the NoSQL and/or Vector Database for llmware.
Additional information on finding and formatting connection strings can be found in the MongoDB Connection Strings Documentation.
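As a minimal sketch of option D, the snippet below assembles a standard `mongodb://` connection string and places it in `COLLECTION_DB_URI`. The helper `build_mongo_uri`, the host, and the credentials are hypothetical; the assumption here is that the variable is set before llmware reads its configuration. Credentials are percent-encoded because characters such as `@` or `:` would otherwise break the URI.

```python
import os
from urllib.parse import quote_plus


def build_mongo_uri(host, port=27017, username=None, password=None):
    """Assemble a standard mongodb:// connection string.

    Username and password are percent-encoded so that reserved
    characters in them do not corrupt the URI.
    """
    if username and password:
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    else:
        credentials = ""
    return f"mongodb://{credentials}{host}:{port}/"


# Hypothetical host and credentials, for illustration only
uri = build_mongo_uri("localhost", username="llm_user", password="p@ss:word")
os.environ["COLLECTION_DB_URI"] = uri
print(os.environ["COLLECTION_DB_URI"])
```

For hosted deployments such as Mongo Atlas, the provider typically supplies the full connection string, in which case it can be assigned to the variable directly.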
The llmware repo can be pulled locally to get access to all the examples, or to work directly with the latest version of the llmware code.
git clone git@github.com:llmware-ai/llmware.git
or download/extract a zip of the llmware repository
Update the local copy of the repository:
git pull
Download the shared llmware native libraries and dependencies by running the load_native_libraries.sh script. This pulls the right wheel for your platform and extracts the llmware native libraries and dependencies into the proper place in the local repository.
./scripts/dev/load_native_libraries.sh
At the top level of the llmware repository run the following command:
pip install .
Questions and discussions are welcome in our GitHub Discussions.
Interested in contributing to llmware? We welcome collaboration. Our roadmap is focused primarily on the following areas:
- Making it easy to deploy fine-tuned open source models to build state-of-the-art RAG workflows
- Private cloud - keeping documents, data pipelines, data stores, and models safe and secure
- Model quantization, especially GGUF, and democratizing the game-changing use of 7B CPU-based LLMs
- Developing small specialized RAG optimized LLMs between 1B-7B parameters
- Industry-specific LLMs, embedding models and processes to support core knowledge-based use cases
- Enterprise scalability - containerization, worker deployments and Kubernetes
- Integration of SQL and other scale enterprise data sources
Like our models, we aspire for llmware to be "small, but mighty" - easy to use and get started, but packing a powerful punch!
Information on ways to participate can be found in our Contributors Guide. As with all aspects of this project, contributing is governed by our Code of Conduct.
Latest Updates - 22 Dec 2023: llmware v0.1.13
- Added 3 new vector databases - Postgres (PG Vector), Redis, and Qdrant
- Improved support for integrating sentence transformers directly in the model catalog
- Improvements in the model catalog attributes, including discovery and customization
- Multiple new Examples in Models & Embeddings, including GGUF, Vector database, and model catalog
Latest Updates - 17 Dec 2023: llmware v0.1.12
- dragon-deci-7b added to catalog - RAG-finetuned model on high-performance new 7B model base from Deci
- New GGUFGenerativeModel class for easy integration of GGUF Models
- Adding prebuilt llama_cpp / ctransformer shared libraries for 'out of the box' use on Mac M1, Mac x86, Linux x86 and Windows
- 3 DRAGON models packaged as Q4_K_M GGUF models for CPU laptop use (dragon-mistral-7b, dragon-llama-7b, dragon-yi-6b)
- 4 leading open source chat models added to default catalog with Q4_K_M with support for specific chat prompt wrappers
Supported Operating Systems:
- MacOS
- Linux
- Windows
Supported Vector Databases:
- Milvus
- Postgres (PG Vector)
- Redis
- FAISS
- Pinecone
- MongoDB Atlas Vector Search
- Qdrant
Prereqs:
- All Platforms: Python v3.9 - 3.11
- To enable the OCR parsing capabilities, install Tesseract v5.3.3 and Poppler v23.10.0 native packages.
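A quick way to confirm the Python prerequisite before installing is a version check like the one below. This is a hedged sketch: `python_version_supported` is a hypothetical helper, and the bounds simply restate the 3.9 - 3.11 range listed above.

```python
import sys

SUPPORTED_MIN = (3, 9)
SUPPORTED_MAX = (3, 11)


def python_version_supported(version_info=None):
    """Return True if major.minor is within 3.9 - 3.11 inclusive."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


if python_version_supported():
    print("Python version is within the documented range")
else:
    print(f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} "
          "is outside the documented 3.9 - 3.11 range")
```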
Optional:
Known issues:
- A segmentation fault can occur when parsing if the native package for mongo-c-driver is 1.25 or above. To address this issue, install the latest version of llmware or downgrade mongo-c-driver to v1.24.4.
Change Log
Latest Updates - 22 Dec 2023: llmware v0.1.13
- Added 3 new vector databases - Postgres (PG Vector), Redis, and Qdrant
- Improved support for integrating sentence transformers directly in the model catalog
- Improvements in the model catalog attributes
- Multiple new Examples in Models & Embeddings, including GGUF, Vector database, and model catalog
17 Dec 2023: llmware v0.1.12
- dragon-deci-7b added to catalog - RAG-finetuned model on high-performance new 7B model base from Deci
- New GGUFGenerativeModel class for easy integration of GGUF Models
- Adding prebuilt llama_cpp / ctransformer shared libraries for Mac M1, Mac x86, Linux x86 and Windows
- 3 DRAGON models packaged as Q4_K_M GGUF models for CPU laptop use (dragon-mistral-7b, dragon-llama-7b, dragon-yi-6b)
- 4 leading open source chat models added to default catalog with Q4_K_M
8 Dec 2023: llmware v0.1.11
- New fast start examples for high volume Document Ingestion and Embeddings with Milvus.
- New LLMWare 'Pop up' Inference Server model class and example script.
- New Invoice Processing example for RAG.
- Improved Windows stack management to support parsing larger documents.
- Enhancing debugging log output mode options for PDF and Office parsers.
30 Nov 2023: llmware v0.1.10
- Windows added as a supported operating system.
- Further enhancements to native code for stack management.
- Minor defect fixes.
24 Nov 2023: llmware v0.1.9
- Markdown (.md) files are now parsed and treated as text files.
- PDF and Office parser stack optimizations which should avoid the need to set ulimit -s.
- New llmware_models_fast_start.py example that allows discovery and selection of all llmware HuggingFace models.
- Native dependencies (shared libraries and dependencies) now included in repo to facilitate local development.
- Updates to the Status class to support PDF and Office document parsing status updates.
- Minor defect fixes including image block handling in library exports.
17 Nov 2023: llmware v0.1.8
- Enhanced generation performance by allowing each model to specify the trailing space parameter.
- Improved handling for eos_token_id for llama2 and mistral.
- Improved support for Hugging Face dynamic loading
- New examples with the new llmware DRAGON models.
14 Nov 2023: llmware v0.1.7
- Moved to Python Wheel package format for PyPi distribution to provide seamless installation of native dependencies on all supported platforms.
- ModelCatalog enhancements:
  - OpenAI update to include newly announced "turbo" 4 and 3.5 models.
  - Cohere embedding v3 update to include new Cohere embedding models.
  - BLING models as out-of-the-box registered options in the catalog. They can be instantiated like any other model, even without the "hf=True" flag.
  - Ability to register new model names, within existing model classes, with the register method in ModelCatalog.
- Prompt enhancements:
  - "evidence_metadata" added to prompt_main output dictionaries, allowing prompt_main responses to be plugged into the evidence and fact-checking steps without modification.
  - API key can now be passed directly in a prompt.load_model(model_name, api_key="[my-api-key]")
- LLMWare Inference Server - Initial delivery:
  - New class for LLMWareModel, which is a wrapper on a custom HF-style API-based model.
  - LLMWareInferenceServer is a new class that can be instantiated on a remote (GPU) server to create a testing API-server that can be integrated into any Prompt workflow.
03 Nov 2023: llmware v0.1.6
- Updated packaging to require mongo-c-driver 1.24.4 to temporarily work around a segmentation fault with mongo-c-driver 1.25.
- Updates in python code needed in anticipation of future Windows support.
27 Oct 2023: llmware v0.1.5
- Four new example scripts focused on RAG workflows with small, fine-tuned instruct models that run on a laptop (llmware BLING models).
- Expanded options for setting temperature inside a prompt class.
- Improvement in post processing of Hugging Face model generation.
- Streamlined loading of Hugging Face generative models into prompts.
- Initial delivery of a central status class: read/write of embedding status with a consistent interface for callers.
- Enhanced in-memory dictionary search support for multi-key queries.
- Removed trailing space in human-bot wrapping to improve generation quality in some fine-tuned models.
- Minor defect fixes, updated test scripts, and version update for Werkzeug to address dependency security alert.
20 Oct 2023: llmware v0.1.4
- GPU support for Hugging Face models.
- Defect fixes and additional test scripts.
13 Oct 2023: llmware v0.1.3
- MongoDB Atlas Vector Search support.
- Support for authentication using a MongoDB connection string.
- Document summarization methods.
- Improvements in capturing the model context window automatically and passing changes in the expected output length.
- Dataset card and description with lookup by name.
- Processing time added to model inference usage dictionary.
- Additional test scripts, examples, and defect fixes.
06 Oct 2023: llmware v0.1.1
- Added test scripts to the github repository for regression testing.
- Minor defect fixes and version update of Pillow to address dependency security alert.
02 Oct 2023: llmware v0.1.0 - Initial release of llmware to open source!!