Mistral Finetuning Hackathon 2024

For instructions on running the solution, click here.

Alplex: An AI-based Virtual Law Office

Introducing Alplex, an AI-powered virtual law office designed to assist you with legal issues based on Swiss laws.

Key Features

  1. AI Legal Assistant - Dona:

    • Clarification & Summarization: Receive your case and help summarize it.
    • Technology: Powered by an Autogen Conversable Agent and a fine-tuned Mistral 7B model.
  2. AI Paralegal - Rachel:

    • Case Classification: Classifies your case into the correct legal category.
    • RAG over Swiss Laws: Uses a large Mistral model to perform Retrieval-Augmented Generation over relevant Swiss laws.

Application Interface

Application Interface

Fine-tuning with Mistral API

We leveraged the Mistral fine-tuning API for two critical aspects:

  1. Improving Dona: Enhanced guardrails and distilled from larger models (notebooks/04_dona_finetuning.ipynb)
  2. Better Case Classification: Optimized classification accuracy for legal cases. (notebooks/05_classification_finetuning.ipynb)

Solution Diagram

Solution Diagram

Finetuning Usage

Fine-tuning for Dona

Goals

  1. Robust Client Interaction:

    • Good resilience against prompt hacking.
    • Created a dataset with a mix of legitimate replies and placeholders for prompt hacking scenarios.
  2. Enhanced Responses:

    • Distilled from larger models to improve response quality.
    • Used GPT-4o outputs to inspire the Mistral 7B model for better summaries.
  3. Cost and Performance Efficiency:

    • Autogen agent requiring multiple interactions.
    • Fine-tuned smaller model for efficiency and scalability.

Fine-tuning Results

Fine-tuning for Classification

We prepared a dataset of legal cases categorized under Civil, Public, or Criminal law and evaluated various models:

  1. Baseline: Traditional ML (TFIDF+LGBM).
  2. Mistral 7B: Prompting only.
  3. Mistral 7B (Fine-tuned): Significant performance improvement, reduced hallucinations.

Classification Results (Fold 0 of Stratified 5-Fold CV)

  • TFIDF+LGBM: Accuracy 0.86
  • Mistral 7B: Accuracy 0.55
  • Mistral 7B (Fine-tuned): Accuracy 0.71

Limitations

  • Supports only Swiss Federal Laws.
  • Handles only Civil, Public, or Criminal law cases.
  • Case classification could be improved (class imbalance).
  • The agentic RAG (Rachel) could make several iteration to improve the final answer.

How to Run

git clone git@github.com:unit8co/mistral-hackathon-finetuning.git
cd mistral-hackathon-finetuning

# Ensure you have Python 3.11+ and Node.js + npm (tested with Node v22.1.0, npm 10.7.0) for the frontend.

# Install necessary assets:
# download chroma.zip at https://mistral-finetuning-hackathon-2024.s3.eu-central-1.amazonaws.com/chroma.zip
# move it into the root of the repository
# unzip it in the root of the repo

# Create a virtual environment
python -m venv .venv

# Install dependencies
pip install -r requirements.txt

# Create a .env file and enter your Mistral API key
cp .env.template .env

# Start the backend
PYTHONPATH=$(pwd) python src/backend/main.py

# In another terminal, navigate to the frontend folder and run the frontend
cd src/frontend
# Install Node.js dependencies
npm install
# Run the frontend
npm run dev

# Follow the localhost URL displayed to start interacting with Dona and Rachel.