Legal Case Analysis and Summarization System for Arabic 🔍

A comprehensive system for collecting, analyzing, and summarizing legal cases in Arabic using Large Language Models fine tuning and knowledge graph techniques.

📋 Table of Contents

Overview
System Architecture
Key Features
Prerequisites
Installation
Usage Pipeline
Project Structure
Performance
Created Knowledge Graph

🔍 Overview

This project provides an end-to-end solution for processing legal cases, from data collection to serving structured summaries via an API. The system uses state-of-the-art language models, specifically a fine-tuned LLaMA model, to generate comprehensive case summaries and builds a knowledge graph for advanced legal analysis.

🤗 Hugging Face Model

The fine-tuned model is publicly available on the Hugging Face Hub: ahmadsakor/Llama3.2-3B-Instruct-Legal-Summarization

🏗️ System Architecture

The system consists of five main components:

Data Collection & Preparation: Automated crawling and OCR processing of legal documents
Model Fine-tuning: Custom LLaMA model adaptation for legal summarization
Evaluation Framework: Comprehensive metrics for model assessment
Knowledge Graph: RDF-based graph database for legal case analysis
Inference API: FastAPI service for generating case summaries

✨ Key Features

Automated legal case collection from the Moroccan Judicial Portal
Advanced OCR processing with error correction
Fine-tuned LLaMA model for Arabic legal text
Comprehensive evaluation framework
RDF-based knowledge graph
FastAPI-based inference service
Multi-language support (Arabic/English)

💻 Prerequisites

Python 3.10 or higher
CUDA-capable GPU (12GB+ VRAM recommended)
32GB+ RAM recommended
50GB+ storage space
Internet connection
GraphDB instance (for knowledge graph)

🚀 Installation

Each component has its own dependencies. To set up the complete system:

Clone the repository:

git clone [repository-url]
cd [repository-name]

Follow the instructions in each directory.

📝 Usage Pipeline

Data Collection & Preparation
Model Fine-tuning
Model Evaluation
Knowledge Graph Creation
Deploy Inference API

📁 Project Structure

├── fetch_data/          # Data collection and processing
├── fine_tuning/         # Model training and adaptation
├── evaluation/          # Performance assessment
├── knowledge_graph/     # Graph database creation
├── inference/           # API service
├── data/               # Data storage
└── logs/               # System logs

📈 Performance

The fine-tuned model shows significant improvements over the base model:

Court Information: +4.58% BERTScore F1
Legal Principles: +10.48% BERTScore F1
Final Decision: +9.27% BERTScore F1
JSON Validity: 99% accuracy

Full evaluation metrics available in the evaluation directory.

🌐 Created Knowledge Graph

Below is a sample of the created knowledge graph:

🙏 Acknowledgments

Moroccan Judicial Portal for providing access to legal cases
Contributors and maintainers of the OCR libraries used in this project
Contributors and maintainers of 🤗 Transformers
Contributors and maintainers of DeepSpeed
Contributors and maintainers of PEFT (Parameter-Efficient Fine-Tuning)
Contributors and maintainers of Weights & Biases for experiment tracking
Contributors and maintainers of Moroccan Judicial Portal for providing access to legal cases
Contributors and maintainers of Meta AI for the base LLaMA model.

AhmadSakor/LegalTextSummarization