Machine Intelligence using OpenAI, Semantic Kernel, Vector Search, SQL Server
TSQLMIT
MachineIntelligence-TextAnalytics-TPLDataFlows
Machine Intelligence Text Analytics Enrichment implemented using Task Parallel Library Data Flow Pipelines:
Document Enrichment Pipeline - Builds the entire Vector Database using OpenAI embeddings in SQL using 50 selected books
Q&A Over Vector Database Pipeline - Searches the SQL Vector Database with provided question phrase using Semantic Kernel
Total Text (OpenAI) Tokens Processed:...............8,267,408
Total Text (Characters) Length Processed:..........33,702,085
Total cost for processing and building Vector Database using OpenAI Embeddings (Feb 2024 prices):
text-embedding-ada-002 with 1536 dimensions: ~$0.84 (~84 cents; this depends on how the chunking of text is configured)
text-embedding-3-small with 512 dimensions: ~$0.17 (~17 cents; this depends on how the chunking of text is configured)
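The cost figures above can be roughly reproduced with simple arithmetic. The sketch below assumes list prices of $0.10 per 1M tokens for text-embedding-ada-002 and $0.02 per 1M tokens for text-embedding-3-small (Feb 2024 pricing as best recalled; treat both as assumptions and check current OpenAI pricing). `EstimateCost` is a hypothetical helper, not part of this repository.

```csharp
using System;

// Token count reported above for the 50-book corpus.
const long totalTokens = 8_267_408;

// ASSUMPTION: USD list prices per 1M tokens (Feb 2024); verify before relying on them.
const decimal adaPricePerMillion = 0.10m;    // text-embedding-ada-002
const decimal smallPricePerMillion = 0.02m;  // text-embedding-3-small

// Hypothetical helper: cost = tokens / 1,000,000 * price per million tokens.
static decimal EstimateCost(long tokens, decimal pricePerMillion) =>
    tokens / 1_000_000m * pricePerMillion;

decimal adaCost = EstimateCost(totalTokens, adaPricePerMillion);
decimal smallCost = EstimateCost(totalTokens, smallPricePerMillion);

Console.WriteLine($"text-embedding-ada-002: ~${adaCost:F2}");
Console.WriteLine($"text-embedding-3-small: ~${smallCost:F2}");
```

Under these assumptions the estimates come to about 83 cents and 17 cents. The actual bill depends on how the text is chunked, since overlapping chunks are embedded more than once.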
Features:
The console app uses 50 selected books from Project Gutenberg by various authors (Oscar Wilde, Bram Stoker, Edgar Allan Poe, Alexandre Dumas) and runs them through multiple AI enrichment steps
Downloads book text, processes text analytics & embeddings, creates a vector database in SQL, demonstrates vector search and answers a sample question using semantic meaning from OpenAI embeddings
Stores all enrichment output for each book in a separate JSON file
Rather than processing text analytics enrichment in single synchronous steps, it uses a dataflow model to create efficient pipelines that can saturate multiple logical CPU cores
Illustrates that SQL Server or Azure SQL can be used as a valid Vector Store, can perform vector search and provide Q&A over the database
Demonstrates how Machine Intelligence and Text Analytics steps can be combined into a single pipeline using TPL Dataflow
The console application is cross-platform .NET 8.x. It will run on macOS, Linux, Windows 10/11 x64, Windows 11 ARM
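To illustrate the dataflow model mentioned in the feature list, here is a minimal sketch of a three-stage TPL Dataflow pipeline. It is not the repository's actual pipeline: `DownloadText`, `CountWords`, and the `book-N` inputs are hypothetical stand-ins for the real download and enrichment steps.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical stand-ins for the real download and enrichment steps.
static string DownloadText(string title) => $"Placeholder text for {title}";
static int CountWords(string text) =>
    text.Split(' ', StringSplitOptions.RemoveEmptyEntries).Length;

// Let each stage process several items at once, up to the number of logical cores.
var parallel = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount
};

var results = new ConcurrentDictionary<string, int>();

// Stage 1: fetch the text for a title.
var download = new TransformBlock<string, (string Title, string Text)>(
    title => (title, DownloadText(title)), parallel);

// Stage 2: enrich the text (here: a trivial word count).
var analyze = new TransformBlock<(string Title, string Text), (string Title, int Words)>(
    book => (book.Title, CountWords(book.Text)), parallel);

// Stage 3: store the result.
var store = new ActionBlock<(string Title, int Words)>(
    r => { results[r.Title] = r.Words; });

// Link the stages; completion flows downstream automatically.
var link = new DataflowLinkOptions { PropagateCompletion = true };
download.LinkTo(analyze, link);
analyze.LinkTo(store, link);

foreach (var title in new[] { "book-1", "book-2", "book-3" })
    await download.SendAsync(title);

download.Complete();     // signal that no more input is coming
await store.Completion;  // wait for the last stage to drain

foreach (var (title, words) in results.OrderBy(r => r.Key))
    Console.WriteLine($"{title}: {words} words");
```

Because each block has its own input queue and degree of parallelism, a slow stage (such as a network call) does not stall the CPU-bound stages working on other books.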
Requirements:
Visual Studio 2022, .NET 8.x
SQL Server connection to either a local SQL Server 2022 (free Developer edition or higher) or Azure SQL Database
Note: SQL Server 2022 / Azure SQL Database features are used for JSON processing and ordered Columnstore Indexes
OpenAI for both embeddings and completions
Getting Started - Step 1) Configuration of SQL Connection and OpenAI API Keys (example of secrets.json shown below)
Be sure to add the settings as .NET User Secrets or as a JSON configuration file (if you use a file, you will need to add the JSON yourself)
Right-click on the C# Project and select "Manage User Secrets"
Add the SQL Connection (SQLConnection) and OpenAI (APIKey) (if using Azure OpenAI, use the AzureOpenAPI section)
Getting Started - Step 2) Processing (after adding proper SQL and OpenAI/Azure OpenAI connections):
Select option 1 to process the entire Data Enrichment Pipeline (build the embeddings Vector Database in SQL)
Select option 2 to only process the Q&A pipeline using Semantic Kernel over the Vector Database (Note: Option #1 must have been run beforehand)
Select option 3 to only process the Q&A pipeline with reasoning using Semantic Kernel over the Vector Database (Note: Option #1 must have been run beforehand). This option is similar to option #2 except it provides details on how the AI agent achieved the results.
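As background for options 2 and 3: vector search ranks stored text chunks by how similar their embeddings are to the embedding of the question, commonly using cosine similarity. The repository performs this search inside SQL; the sketch below only illustrates the ranking idea in memory, with made-up 3-dimensional vectors and a hypothetical `CosineSimilarity` helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity: dot(a, b) / (|a| * |b|). Hypothetical helper for illustration.
static double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same dimension.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Toy "embeddings"; real ones would have 512 or 1536 dimensions.
var chunks = new Dictionary<string, float[]>
{
    ["chunk-a"] = new[] { 1f, 0f, 0f },
    ["chunk-b"] = new[] { 0.9f, 0.1f, 0f },
    ["chunk-c"] = new[] { 0f, 1f, 0f },
};
var question = new[] { 1f, 0.05f, 0f };

// Score every chunk against the question and keep the two best matches.
var top = chunks
    .Select(c => (Id: c.Key, Score: CosineSimilarity(question, c.Value)))
    .OrderByDescending(c => c.Score)
    .Take(2)
    .ToList();

foreach (var (id, score) in top)
    Console.WriteLine($"{id}: {score:F3}");
```

In a Q&A pipeline, the text of the top-ranked chunks would then be passed to the completion model as context for answering the question.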
Learn more about the concepts used in this repository: