llm-validation: A TypeScript repository from saflamini

LLM Validation With TypeScript & Python

This folder shows how to use chain of verification and embeddings to reduce the likelihood of hallucinations in text generated by LeMUR. The goal is to ensure that LeMUR's output is clearly grounded in the underlying transcript.

Using Embeddings to Filter Out Potential Hallucinations - embeddingCitations.ts and embeddings_citations.py

These scripts do the following:

Convert the transcript into sentences, and embed each sentence.
Convert the summary generated by LeMUR into sentences, and embed each sentence.
For each summary sentence, run a similarity search against the transcript's sentences.
Keep a summary sentence only if at least one transcript sentence has a similarity score at or above a specified threshold; otherwise, filter it out.
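The four steps above can be sketched in TypeScript as follows. This is a minimal sketch, not the code in embeddingCitations.ts: `embed` is a hypothetical parameter standing in for whatever embedding model the scripts actually call, the regex sentence splitter is a naive assumption, cosine similarity is assumed as the similarity score, and `bagOfWords` is a toy embedder included only so the example runs without an API key.

```typescript
// `Embed` stands in for whatever embedding model the real scripts call;
// it is a hypothetical parameter, not a specific library API.
type Embed = (text: string) => number[];

// Naive sentence splitter (assumption): a sentence is a run of text ending
// in . ! ? or at the end of the input.
function toSentences(text: string): string[] {
  return (text.match(/[^.!?]+(?:[.!?]+|$)/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat an all-zero vector as dissimilar to everything.
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Keep a summary sentence only if at least one transcript sentence
// scores at or above `threshold`; filter out the rest.
function filterUngrounded(
  summary: string,
  transcript: string,
  embed: Embed,
  threshold: number,
): string[] {
  const transcriptVecs = toSentences(transcript).map(embed);
  return toSentences(summary).filter((sentence) => {
    const vec = embed(sentence);
    return transcriptVecs.some((t) => cosineSimilarity(vec, t) >= threshold);
  });
}

// Toy embedder for demonstration only: counts occurrences of vocabulary words.
function bagOfWords(vocab: string[]): Embed {
  return (text) => {
    const words = text.toLowerCase().split(/[^a-z]+/);
    return vocab.map((v) => words.filter((w) => w === v).length);
  };
}

const embed = bagOfWords([
  "team", "shipped", "release", "friday", "sales", "grew", "march", "ceo", "resigned",
]);
const transcript = "The team shipped the release on Friday. Sales grew in March.";
const summary = "The release shipped on Friday. The CEO resigned.";
const kept = filterUngrounded(summary, transcript, embed, 0.5);
// kept → ["The release shipped on Friday."]
```

The second summary sentence shares no vocabulary with the transcript, so it falls below the threshold and is dropped. With a real embedding model the right threshold depends on the model and should be tuned on examples.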

The goal of this method is to identify sentences that are not grounded in the underlying transcript. You can also use it as the basis for 'grounded generation' by storing, for each sentence LeMUR generates, a record of the most similar transcript sentences.
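For grounded generation, the same similarity search can keep a citation rather than only a keep/drop decision. A minimal sketch, assuming the sentences have already been embedded; the `Embedded` and `Citation` shapes and the `cite` function are hypothetical names for illustration, and the vectors in the usage example are hand-made rather than produced by a real model.

```typescript
// Hypothetical shapes for illustration; not defined by the repository.
interface Embedded {
  text: string;
  vector: number[];
}

interface Citation {
  sentence: string; // sentence generated by LeMUR
  source: string; // most similar transcript sentence
  score: number; // cosine similarity between the two
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// For each generated sentence, record the single most similar transcript
// sentence. Assumes `transcript` is non-empty.
function cite(generated: Embedded[], transcript: Embedded[]): Citation[] {
  return generated.map((g) => {
    let source = "";
    let score = -Infinity;
    for (const t of transcript) {
      const s = cosineSimilarity(g.vector, t.vector);
      if (s > score) {
        score = s;
        source = t.text;
      }
    }
    return { sentence: g.text, source, score };
  });
}

// Hand-made 3-dimensional vectors, for demonstration only.
const transcript: Embedded[] = [
  { text: "Revenue rose 10% in Q3.", vector: [1, 0, 0] },
  { text: "Hiring paused in Q4.", vector: [0, 1, 0] },
];
const generated: Embedded[] = [
  { text: "Q3 revenue increased.", vector: [0.9, 0.1, 0] },
];
const citations = cite(generated, transcript);
```

Passing `citations` through `JSON.stringify` is one way to store each generated sentence together with its evidence and score; keeping the top few matches instead of one is a straightforward extension.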

Chain of Verification Implementation - cov.ts and cov.py

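These scripts follow the method described in "Chain-of-Verification Reduces Hallucination in Large Language Models" (Dhuliawala et al., 2023): https://arxiv.org/abs/2309.11495#:~:text=We%20develop%20the%20Chain%2Dof,generates%20its%20final%20verified%20response. The model (1) drafts a baseline response, (2) plans verification questions to fact-check the draft, (3) answers those questions independently, and (4) generates its final verified response.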
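The four chain of verification steps can be sketched as a single async function. This illustrates the technique from the paper and is not the contents of cov.ts: `complete` is a hypothetical stand-in for a LeMUR request (not a real SDK signature), and the prompts are placeholder wording. Step 3 follows the paper's "factored" idea of answering each question without the baseline response in the prompt.

```typescript
// Hypothetical stand-in for a LeMUR (or other LLM) request; not a real SDK signature.
type Complete = (prompt: string) => Promise<string>;

interface CovResult {
  baseline: string;
  questions: string[];
  answers: string[];
  final: string;
}

async function chainOfVerification(
  task: string,
  context: string,
  complete: Complete,
): Promise<CovResult> {
  // Step 1: draft a baseline response from the context (e.g. a transcript).
  const baseline = await complete(`${context}\n\n${task}`);

  // Step 2: plan verification questions that fact-check the draft, one per line.
  const plan = await complete(
    `List questions, one per line, that would fact-check this response:\n${baseline}`,
  );
  const questions = plan
    .split("\n")
    .map((q) => q.trim())
    .filter((q) => q.length > 0);

  // Step 3: answer each question independently; the baseline is deliberately
  // left out of these prompts so its mistakes are not simply repeated.
  const answers = await Promise.all(
    questions.map((q) => complete(`${context}\n\nAnswer concisely: ${q}`)),
  );

  // Step 4: revise the baseline in light of the verification answers.
  const qa = questions.map((q, i) => `Q: ${q}\nA: ${answers[i]}`).join("\n");
  const final = await complete(
    `Original response:\n${baseline}\n\nVerification Q&A:\n${qa}\n\n` +
      "Rewrite the response, correcting or removing anything the verification contradicts.",
  );

  return { baseline, questions, answers, final };
}
```

Running the verification questions through `Promise.all` keeps each answer independent of the others; a sequential loop would behave the same and may be preferable if rate limits are a concern.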

Combining the Chain of Verification Method and Embeddings to Filter Out Hallucinations - covAndEmbeddings.ts

This script combines the embedding citations method and the chain of verification implementation.

Step 1) Run the chain of verification process through step 4, which produces the final verified response.

Step 2) Apply the embedding citations method to that final response, filtering out any sentences that are not grounded in the transcript.
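Putting the two stages together, a minimal sketch of the combined pipeline might look like this. It is not the code in covAndEmbeddings.ts: `verify` is a hypothetical function assumed to run chain of verification steps 1 to 4 and resolve to the final response, `embed` is a hypothetical embedding function, and cosine similarity is assumed as the score. Unlike the filter on its own, this version also returns the dropped sentences so they can be inspected.

```typescript
// Hypothetical stand-ins: `verify` runs CoV steps 1-4 and resolves to the
// final verified response; `embed` maps text to a vector. Neither is a real API.
type Verify = (task: string) => Promise<string>;
type Embed = (text: string) => number[];

// Naive sentence splitter: a sentence is a run of text ending in . ! ? or end of input.
function toSentences(text: string): string[] {
  return (text.match(/[^.!?]+(?:[.!?]+|$)/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

async function covWithCitations(
  task: string,
  transcript: string,
  verify: Verify,
  embed: Embed,
  threshold: number,
): Promise<{ kept: string[]; dropped: string[] }> {
  // Stage 1: chain of verification produces the final response.
  const finalResponse = await verify(task);

  // Stage 2: keep only sentences whose best transcript match meets the threshold.
  const transcriptVecs = toSentences(transcript).map(embed);
  const kept: string[] = [];
  const dropped: string[] = [];
  for (const sentence of toSentences(finalResponse)) {
    const vec = embed(sentence);
    const best = transcriptVecs.reduce(
      (max, t) => Math.max(max, cosineSimilarity(vec, t)),
      -Infinity,
    );
    (best >= threshold ? kept : dropped).push(sentence);
  }
  return { kept, dropped };
}
```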

Future Work

Create additional scripts designed to work directly with LeMUR's Q&A endpoint
Create an example that generates 'grounded' outputs in JSON format with LeMUR's Q&A endpoint, i.e., the response to a question stored together with its most similar transcript sentence
Use Microsoft's TypeChat for further validation
Build an eval suite to track the performance of various methods and models

Running the project in Node.js

Change into the node directory: cd node

Step 1 - install dependencies: npm install

Step 2 - run the project: npm start

Step 3 - run each script