Fine-Tuning mBART for Bengali Sentence Error Correction

Overview

This project fine-tunes the mBART model to correct grammatical and syntactic errors in Bengali sentences. The aim is to transform incorrect sentences into their correct form with high accuracy. The full training process is documented in the Notebook. A live HuggingFace Demo shows the fine-tuned model in action.

Base Model

- Model: mBART Large 50
- Description: A multilingual transformer model capable of understanding and generating text in 50 languages.
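As a rough illustration of how such a base model is typically loaded, the sketch below uses the Hugging Face `transformers` Auto classes. The checkpoint name `facebook/mbart-large-50` is an assumption (the exact checkpoint is not named here), `bn_IN` is the mBART-50 language code for Bengali, and the helper names `load_base_model` and `correct` are hypothetical. The `correct` helper only produces useful corrections when given fine-tuned weights.

```python
# Minimal sketch, not the project's actual code.
BASE_CHECKPOINT = "facebook/mbart-large-50"  # assumption: exact checkpoint not stated
BENGALI_CODE = "bn_IN"                       # mBART-50 language code for Bengali


def load_base_model(checkpoint: str = BASE_CHECKPOINT):
    """Return (tokenizer, model). Downloads weights on first call."""
    # Imported lazily so the module can be read or imported without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Source and target are both Bengali: the task maps Bengali to Bengali.
    tokenizer = AutoTokenizer.from_pretrained(
        checkpoint, src_lang=BENGALI_CODE, tgt_lang=BENGALI_CODE
    )
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    return tokenizer, model


def correct(sentence: str, tokenizer, model, max_length: int = 32) -> str:
    """Generate a corrected sentence (hypothetical helper)."""
    inputs = tokenizer(
        sentence, return_tensors="pt", truncation=True, max_length=max_length
    )
    output_ids = model.generate(
        **inputs,
        # Force the decoder to start in Bengali.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(BENGALI_CODE),
        max_length=max_length,
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

Passing the path or hub ID of a fine-tuned checkpoint to `load_base_model` instead of the default would load the corrected-sentence model in the same way.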

Dataset

- Source: BNSEC Data Repository
- Size: 1.3 million sentence pairs
- Characteristics: Contains a mix of correct and grammatically incorrect Bengali sentences.

Training Specifications

- Optimizer: AdamW
- Learning Rate: 0.00001
- Batch Size: 128
- Training Steps: 6500
- GPU: Google Colab A100
- Epochs: 1
- Max Token Length: 32
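The hyperparameters listed above can be collected in one place, which also makes it easy to check how many sentence pairs the run processes. The sketch below is illustrative only: the `TrainingSpec` class and its methods are hypothetical names, the dictionary keys follow the parameter names of Hugging Face's `Seq2SeqTrainingArguments`, and it assumes a single GPU with no gradient accumulation, neither of which is stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSpec:
    """Hyperparameters copied from the list above (hypothetical container)."""
    optimizer: str = "AdamW"
    learning_rate: float = 1e-5   # 0.00001
    batch_size: int = 128
    training_steps: int = 6500
    epochs: int = 1
    max_token_length: int = 32

    def examples_seen(self, grad_accum_steps: int = 1) -> int:
        # Assumption: every optimizer step consumes one full batch
        # per accumulation step.
        return self.training_steps * self.batch_size * grad_accum_steps

    def as_trainer_kwargs(self) -> dict:
        # Keys match Seq2SeqTrainingArguments parameter names.
        # Assumption: one device, so the per-device batch is the full batch.
        return {
            "optim": "adamw_torch",
            "learning_rate": self.learning_rate,
            "per_device_train_batch_size": self.batch_size,
            "max_steps": self.training_steps,
        }


spec = TrainingSpec()
print(spec.examples_seen())  # 6500 * 128 = 832000
```

Under these assumptions the run processes 832,000 sentence pairs. The max token length of 32 would be applied at tokenization time (for example via `truncation=True, max_length=32`) rather than through the trainer arguments.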

Model Performance Metrics