Tamil Text Checker

A comprehensive tool for checking Tamil text spelling, grammar, and overall correctness using multiple AI models.

Features

Multiple model support:
- Rule-based checking
- Deep Learning analysis (using Indic-BERT)
- Statistical analysis
- Google Gemma integration
Real-time error detection
Spelling and grammar correction
Confidence scores for each model
User-friendly Streamlit interface

Screenshots

Main Interface

Example Selection

Analysis Results

Installation

Clone the repository:

git clone https://github.com/arsath-eng/Tamil-spell-checker.git
cd Tamil-spell-checker

Install dependencies:

pip install -r requirements.txt

Set up environment variables:

cp .env.example .env
# Edit .env with your GROQ_API_KEY

Usage

Start the Streamlit application:

streamlit run main.py

Access the web interface at http://localhost:8501
Choose your input method:
- Enter custom text
- Use example text
Select checking options:
- Spelling check
- Grammar check
Click "Check Text" to analyze

Model Details

Rule-Based Model

Uses predefined Tamil dictionary and grammar rules
Fast and deterministic results
Best for basic spelling and grammar checks

Deep Learning Model

Powered by AI4Bharat's Indic-BERT
Handles complex language patterns
Suitable for nuanced grammar analysis

Statistical Model

Uses TF-IDF and Naive Bayes classification
Trained on correct Tamil text samples
Good for identifying unusual patterns

Google Gemma Integration

Leverages Groq's Gemma 2B model
Provides detailed language insights
Offers correction suggestions

Example Use Cases

Basic Spelling Check:

Input: நான் பள்ளிக்கு சல்கிறேன்
Output: Spelling error detected in "சல்கிறேன்" - Suggested: "செல்கிறேன்"

Grammar Check:

Input: நான் பள்ளிக்கு செல்கிறார்கள்
Output: Grammar error - Subject-verb agreement mismatch

Project Structure

tamil-text-checker/
├── main.py
├── requirements.txt
├── models/
│   ├── __init__.py
│   ├── rule_based_model.py
│   ├── deep_learning_model.py
│   ├── statistical_model.py
│   └── google_gemma_model.py
├── data/
│   └── tamil_words.txt
└── .env

Dependencies

Main dependencies include:

streamlit >= 1.24.0
indic-nlp-library >= 0.91
transformers >= 4.30.2
torch >= 2.2.0
tensorflow >= 2.13.0
scikit-learn >= 1.2.2
Groq API client
python-dotenv

Contributing

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

Acknowledgments

AI4Bharat for the Indic-BERT model
Indic NLP Library contributors
Groq for the Gemma model API
Tamil language experts who helped validate the rule sets