llmware

llmware is a unified framework for developing LLM-based application patterns including Retrieval Augmented Generation (RAG). This project provides an integrated set of tools that anyone can use – from beginner to the most sophisticated AI developer – to rapidly build industrial-grade, knowledge-based enterprise LLM applications with specific focus on making it easy to integrate open source small specialized models and connecting enterprise knowledge safely and securely to LLMs in private cloud.

🎯 Key features

llmware is an integrated framework comprised of four major components:

Retrieval: Assemble and Query knowledge base

High-performance document parsers to rapidly ingest, text chunk and ingest common document types.
Comprehensive intuitive querying methods: semantic, text, and hybrid retrieval with integrated metadata.
Ranking and filtering strategies to enable semantic search and rapid retrieval of information.
Web scrapers, Wikipedia integration, and Yahoo Finance API integration.

Prompt: Simple, Unified Abstraction across 50+ Models

Connect Models: Simple high-level interface with support for 50+ models out of the box.
Prompts with Sources: Powerful abstraction to easily package a wide range of materials into prompts.
Post Processing: tools for evidence verification, classification of a response, and fact-checking.
Human in the Loop: Ability to enable user ratings, feedback, and corrections of AI responses.
Auditability: A flexible state mechanism to analyze and audit the LLM prompt lifecycle.

Vector Embeddings: swappable embedding models and vector databases

Industry Bert: out-of-the-box industry finetuned open source Sentence Transformers.
Wide Model Support: Custom trained HuggingFace, sentence transformer embedding models and leading commercial models.
Mix-and-match among multiple options to find the right solution for any particular application.
Out-of-the-box support for 7 vector databases - Milvus, Postgres (PG Vector), Redis, FAISS, Qdrant, Pinecone and Mongo Atlas.

Parsing and Text Chunking: Scalable Ingestion

Integrated High-Speed Parsers for: PDF, PowerPoint, Word, Excel, HTML, Text, WAV, AWS Transcribe transcripts.
Text-chunking tools to separate information and associated metadata to a consistent block format.

📚 Explore additional llmware capabilities and 🎬 Check out these videos on how to quickly get started with RAG:

🌱 Getting Started

1. Install llmware:

pip install llmware

python3 -m pip install llmware

See Working with llmware for other options to get up and running.

2. MongoDB and Milvus

MongoDB and Milvus are optional and used to provide production-grade database and vector embedding capabilities. The fastest way to get started is to use the provided Docker Compose file (note: requires Docker Compose / Docker desktop to be installed) which takes care of running them both:

curl -o docker-compose.yaml https://raw.githubusercontent.com/llmware-ai/llmware/main/docker-compose.yaml

and then run the containers:

docker compose up -d

Not ready to install MongoDB or Milvus? Check out what you can do without them in our examples section.

See Running MongoDB and Milvus for other options to get up and running with these optional dependencies.

3. 🔥 Start coding - Quick Start for RAG 🔥

# This example illustrates a simple contract analysis
# using a small RAG-optimized LLM running locally

import os
import re
from llmware.prompts import Prompt, HumanInTheLoop
from llmware.setup import Setup
from llmware.configs import LLMWareConfig

def contract_analysis_on_laptop (model_name):

    # Load the llmware sample files
    print (f"\n > Loading the llmware sample files...")
    sample_files_path = Setup().load_sample_files()
    contracts_path = os.path.join(sample_files_path,"Agreements")
 
    # query list
    query_list = {"executive employment agreement": "What are the name of the two parties?",
                  "base salary": "What is the executive's base salary?",
                  "governing law": "What is the governing law?"}

    print (f"\n > Loading model {model_name}...")

    prompter = Prompt().load_model(model_name)

    for i, contract in enumerate(os.listdir(contracts_path)):

        #   excluding Mac file artifact
        if contract != ".DS_Store":

            print("\nAnalyzing contract: ", str(i+1), contract)

            print("LLM Responses:")
            for key, value in query_list.items():

                # contract is parsed, text-chunked, and then filtered by topic key
                source = prompter.add_source_document(contracts_path, contract, query=key)

                # calling the LLM with 'source' information from the contract automatically packaged into the prompt
                responses = prompter.prompt_with_source(value, prompt_name="just_the_facts", temperature=0.3)

                for r, response in enumerate(responses):
                    print(key, ":", re.sub("[\n]"," ", response["llm_response"]).strip())

                # We're done with this contract, clear the source from the prompt
                prompter.clear_source_materials()

    # Save jsonl report to jsonl to /prompt_history folder
    print("\nPrompt state saved at: ", os.path.join(LLMWareConfig.get_prompt_path(),prompter.prompt_id))
    prompter.save_state()

    # Save csv report that includes the model, response, prompt, and evidence for human-in-the-loop review
    csv_output = HumanInTheLoop(prompter).export_current_interaction_to_csv()
    print("csv output saved at:  ", csv_output)


if __name__ == "__main__":

    # use local cpu model - smallest, fastest (use larger BLING models for higher accuracy)
    model = "llmware/bling-1b-0.1"

    contract_analysis_on_laptop(model)

📚 See 50+ llmware examples for more RAG examples and other code samples and ideas.

4. Accessing LLMs and setting-up API keys & secrets

To use LLMWare, you do not need to use any proprietary LLM - we would encourage you to experiment with BLING, DRAGON, Industry-BERT, the GGUF examples, along with bringing in your favorite models from HuggingFace and Sentence Transformers.

If you would like to use a proprietary model, you will need to provide your own API Keys. API keys and secrets for models, aws, and pinecone can be set-up for use in environment variables or passed directly to method calls.

🔹 Alternate options for running MongoDB and Milvus

There are several options for getting MongoDB running

🐳 A. Run mongo container with docker

docker run -d -p 27017:27017  -v mongodb-volume:/data/db --name=mongodb mongo:latest

🐳 B. Run container with docker compose

Create a docker-compose.yaml file with the content:

version: "3"

services:
  mongodb:
    container_name: mongodb
    image: 'mongo:latest'
    volumes:
      - mongodb-volume:/data/db
    ports:
      - '27017:27017'

volumes:
    llmware-mongodb:
      driver: local

and then run:

docker compose up

📖 C. Install MongoDB natively

See the Official MongoDB Installation Guide

🔗 D. Connect to an existing MongoDB deployment

You can connect to an existing MongoDB deployment by setting the connection string to the environment variable, COLLECTION_DB_URI. See the example script, Using Mongo Atlas, for detailed information on how to use Mongo Atlas as the NoSQL and/or Vector Database for llmware.

Additional information on finding and formatting connection strings can be found in the MongoDB Connection Strings Documentation.

✍️ Working with the llmware Github repository

The llmware repo can be pulled locally to get access to all the examples, or to work directly with the latest version of the llmware code.

Pull the repo locally

git clone git@github.com:llmware-ai/llmware.git

or download/extract a zip of the llmware repository

Run llmware natively

Update the local copy of the repository:

git pull

Download the shared llmware native libraries and dependencies by running the load_native_libraries.sh script. This pulls the right wheel for your platform and extracts the llmware native libraries and dependencies into the proper place in the local repository.

./scripts/dev/load_native_libraries.sh

At the top level of the llmware repository run the following command:

pip install .

✨ Roadmap - Getting help and sharing your ideas with the community

Questions and discussions are welcome in our github discussions.

Interested in contributing to llmware? We welcome collaboration. Our roadmap is focused primarily on the following areas:

💡 Making it easy to deploy fine-tuned open source models to build state-of-the-art RAG workflows
💡 Private cloud - keeping documents, data pipelines, data stores, and models safe and secure
💡 Model quantization, especially GGUF, and democratizing the game-changing use of 7B CPU-based LLMs
💡 Developing small specialized RAG optimized LLMs between 1B-7B parameters
💡 Industry-specific LLMs, embedding models and processes to support core knowledge-based use cases
💡 Enterprise scalability - containerization, worker deployments and Kubernetes
💡 Integration of SQL and other scale enterprise data sources

Like our models, we aspire for llmware to be "small, but mighty" - easy to use and get started, but packing a powerful punch!

Information on ways to participate can be found in our Contributors Guide. As with all aspects of this project, contributing is governed by our Code of Conduct.

📣 Release notes and Change Log

Latest Updates - 30 Dec 2023: llmware v0.1.14

Added support for Open Chat inference servers (compatible with OpenAI API)
Improved capabilities for multiple embedding models and vector DB configurations
Added docker-compose install scripts for PGVector and Redis vector databases
Added 'bling-tiny-llama' to model catalog

Latest Updates - 22 Dec 2023: llmware v0.1.13

Added 3 new vector databases - Postgres (PG Vector), Redis, and Qdrant
Improved support for integrating sentence transformers directly in the model catalog
Improvements in the model catalog attributes, including discovery and customization
Multiple new Examples in Models & Embeddings, including GGUF, Vector database, and model catalog

Latest Updates - 17 Dec 2023: llmware v0.1.12

dragon-deci-7b added to catalog - RAG-finetuned model on high-performance new 7B model base from Deci
New GGUFGenerativeModel class for easy integration of GGUF Models
Adding prebuilt llama_cpp / ctransformer shared libraries for 'out of the box' use on Mac M1, Mac x86, Linux x86 and Windows
3 DRAGON models packaged as Q4_K_M GGUF models for CPU laptop use (dragon-mistral-7b, dragon-llama-7b, dragon-yi-6b)
4 leading open source chat models added to default catalog with Q4_K_M with support for specific chat prompt wrappers

Supported Operating Systems:

MacOS
Linux
Windows

Supported Vector Databases:

Milvus
Postgres (PG Vector)
Redis
FAISS
Pinecone
MongoDB Atlas Vector Search
Qdrant

Prereqs:

All Platforms: Python v3.9 - 3.11
To enable the OCR parsing capabilities, install Tesseract v5.3.3 and Poppler v23.10.0 native packages.

Optional:

Docker

🚧 Change Log