LLM Attribution

This project contains code for testing BM25 retrieval and for finding the best embedding settings for LLM attribution.

Demo

A demo can be found here 🤗. Enter some text as the query to get the documents it most likely came from (for now the index contains only some Wikipedia articles).

Some Notable Files

bm25_init.py and bm25_intro contain code for the implementation and testing of BM25.
embedding_examples folder contains some example embeddings from the different sentence transformer models used for testing (the rest are stored elsewhere).
test_data contains all the test data from Wikipedia, TreCovid, and NFCorpus.
generate_embeddings - creates the different embeddings to compare, using a given model from Huggingface.
dim_reduce - tests dimensionality reduction using PCA on some embeddings to see how accuracy changes.
results - contains the pkl files for all the different embeddings and settings experimented with.
analysis - contains all the analysis files for the results above.
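
For readers unfamiliar with BM25, here is a minimal sketch of how it scores documents against a query. It is not the code in bm25_init.py: the class name, the whitespace tokenizer, the default k1 and b values, and the Lucene-style IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class BM25:
    """Illustrative BM25 scorer (not the repository's implementation)."""

    def __init__(self, corpus, k1=1.5, b=0.75):
        # k1 and b are common default values, assumed here.
        self.k1 = k1
        self.b = b
        self.docs = [tokenize(doc) for doc in corpus]
        self.term_freqs = [Counter(doc) for doc in self.docs]
        self.avgdl = sum(len(doc) for doc in self.docs) / len(self.docs)

        # Document frequency: number of documents containing each term.
        n = len(self.docs)
        df = Counter(term for doc in self.docs for term in set(doc))
        # Lucene-style IDF, which is always non-negative.
        self.idf = {
            term: math.log(1 + (n - freq + 0.5) / (freq + 0.5))
            for term, freq in df.items()
        }

    def score(self, query, index):
        # Sum the per-term BM25 contributions for one document.
        freqs = self.term_freqs[index]
        doc_len = len(self.docs[index])
        total = 0.0
        for term in tokenize(query):
            tf = freqs.get(term, 0)
            if tf == 0:
                continue
            length_norm = 1 - self.b + self.b * doc_len / self.avgdl
            total += self.idf[term] * tf * (self.k1 + 1) / (tf + self.k1 * length_norm)
        return total

    def rank(self, query):
        # Return document indices, best match first (ties broken by index).
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]
```

Calling `BM25(corpus).rank(query)` on a list of document strings returns the document indices ordered by score, which is the same "query in, likely source documents out" shape as the demo.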

xinchen-yang/cmsc673