mt-llm-eval

Hypothesis: The Flash Cards generated by an LLM will be more relevant to the user than the Flash Cards generated by a Domain Expert.

Introduction

We will test this by computing Lexical Metrics that compare the generated Flash Cards against the source material. We will also measure the relevance of the Flash Cards to the user.

List of the Lexical Metrics:

BLEU
ROUGE
F1
BERTScore
BARTScore
DoXpy
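
As an illustration of what one of these metrics computes, here is a minimal sketch of a token-overlap F1 score between a Flash Card and its source text. The README does not say which F1 definition the project uses, so the token-overlap variant, the `tokenize` helper, and the `token_f1` name are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenization)."""
    return re.findall(r"\w+", text.lower())


def token_f1(candidate, reference):
    """Token-overlap F1 between a candidate text and a reference text.

    Hypothetical helper: precision is the share of candidate tokens that
    also occur in the reference, recall is the share of reference tokens
    covered by the candidate, and F1 is their harmonic mean.
    """
    cand = tokenize(candidate)
    ref = tokenize(reference)
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A card that copies the source verbatim scores 1.0 and a card with no shared tokens scores 0.0, so this kind of score rewards wording overlap rather than meaning.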

Methodology

We will use the following methodology to measure how closely the Flash Cards match the source material.

1. We will create a set of Flash Cards using an LLM.
   a. We will extract the text from the source material.
   b. We will generate a set of Flash Cards from that text using an LLM.
   c. We will measure the Lexical Metrics of the Flash Cards against the source material.
2. We will create a set of Flash Cards using a Domain Expert.
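
The measuring step (c) could be sketched roughly as follows: score every card in a set against the source text and average the results, so that the LLM set and the Domain Expert set can be compared on the same scale. The `FlashCard` class, the `score_cards` function, and the simple clipped unigram precision used as the metric are hypothetical stand-ins for this example, not the project's actual code, and the sample cards are made up for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class FlashCard:
    """Hypothetical card structure: one question and one answer."""
    question: str
    answer: str


def unigram_precision(candidate, reference):
    """Share of candidate tokens that also appear in the reference (clipped)."""
    cand = re.findall(r"\w+", candidate.lower())
    ref = Counter(re.findall(r"\w+", reference.lower()))
    if not cand:
        return 0.0
    clipped = sum((Counter(cand) & ref).values())
    return clipped / len(cand)


def score_cards(cards, source, metric=unigram_precision):
    """Average a metric over a set of cards; returns 0.0 for an empty set."""
    if not cards:
        return 0.0
    return mean(metric(f"{c.question} {c.answer}", source) for c in cards)


# Illustrative data only.
source = "Mitochondria produce most of the cell's ATP."
llm_cards = [FlashCard("What do mitochondria produce?", "Most of the cell's ATP")]
expert_cards = [FlashCard("Which organelle is the powerhouse?", "The mitochondrion")]

llm_score = score_cards(llm_cards, source)
expert_score = score_cards(expert_cards, source)
print(f"LLM: {llm_score:.3f}  Expert: {expert_score:.3f}")
```

Passing the metric as a parameter keeps the comparison loop unchanged when a different metric from the list is swapped in.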

Creating the Flash Cards using an LLM

Installation & Environment Setup

If you are using VS Code, you can use the devcontainer to install the relevant Python version.

Create a virtual environment

python -m venv .venv
source .venv/bin/activate

Installation

pip install -r requirements.txt

Usage

To run the program, use the following command:

python main.py