AFAAN OROMOO NEWS CLASSIFICATION SYSTEM is a machine learning project designed to classify news articles written in Afaan Oromoo, the Oromo language. The primary goal of this project is to leverage natural language processing (NLP) techniques to automatically categorize news content into various predefined categories.
- Language Support: Afaan Oromoo is a widely spoken language, and this project aims to enhance accessibility to news articles in this language.
- Multiclass Classification: The system is capable of classifying news articles into multiple categories, providing a comprehensive understanding of the content.
- Rule-Based Stemmer: The project incorporates a rule-based stemmer based on the hybrid approach described in the paper "Designing a Stemmer for Afaan Oromo Text: A Hybrid Approach" by Debela Tesfaye.
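As a rough illustration of how rule-based suffix stripping works: the stemmer scans a rule table and removes the longest matching suffix, subject to a minimum stem length. The sketch below is a minimal example of that idea; the suffix list is a small hypothetical sample, not the actual rule set from the paper or from `stemmer.py`.

```python
# Minimal sketch of rule-based suffix stripping. The suffix list is an
# illustrative sample, NOT the full rule set of the Tesfaye hybrid stemmer.
SAMPLE_SUFFIXES = ("oota", "wwan", "icha", "ttii", "oo", "an")

def strip_suffix(word: str, min_stem: int = 3) -> str:
    """Strip the longest matching suffix, keeping at least `min_stem` characters."""
    for suffix in sorted(SAMPLE_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word  # no rule applies; return the word unchanged
```

Under this sample rule set, `strip_suffix("namoota")` returns `"nam"`, while a word matching no rule is returned unchanged.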
The motivation behind this project is to contribute to the enhancement of information accessibility for Afaan Oromoo speakers. By automating the categorization of news articles, individuals can easily navigate through relevant content based on their interests.
This project is intended for individuals interested in Afaan Oromoo language and those seeking to implement or understand machine learning techniques for natural language processing.
The project is organized into distinct directories, each serving a specific purpose.
📦 afaan-oromoo-news-classification-system
┣ 📂api
┃ ┣ 📜app.py
┃ ┣ 📜model_loader.py
┣ 📂data
┣ 📂docs
┃ ┣ 📜model_documentation.pdf
┣ 📂logs
┣ 📂models
┃ ┣ 📂trained_models
┃ ┃ ┣ 📜label_encoder.joblib
┃ ┃ ┣ 📜model_20231116_accuracy_0.9263.h5
┃ ┃ ┗ 📜tokenizer.joblib
┃ ┣ 📜train_model.ipynb
┣ 📂preprocessing
┃ ┣ 📜preprocessing_pipeline.py
┃ ┣ 📜special_character_handler.py
┃ ┣ 📜stemmer.py
┃ ┣ 📜stopword_remover.py
┃ ┣ 📜tokenizer.py
┣ 📂scrapers
┃ ┣ 📜fbc_scraper.py
┣ 📂tests
┃ ┣ 📜test_preprocessing.py
┗ 📜requirements.txt
This structure keeps the codebase modular and separated by concern. Key directories include `api` for the Flask API, `preprocessing` for the text-processing modules, `docs` for documentation, `models` for the training notebook and trained model artifacts, `scrapers` for web-scraping scripts, and `tests` for unit tests.
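The preprocessing modules would typically be composed into a single pipeline before text reaches the model. The sketch below shows one plausible composition; the tokenization rule and the stopword list are assumptions for illustration, not the project's actual implementation in `preprocessing_pipeline.py`.

```python
import re

# Hypothetical sample stopwords; the project's stopword_remover.py presumably
# ships a much fuller Afaan Oromoo list.
SAMPLE_STOPWORDS = frozenset({"fi", "kan", "akka"})

def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-letter characters (simplified tokenizer)."""
    return [t for t in re.split(r"[^a-z']+", text.lower()) if t]

def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in SAMPLE_STOPWORDS]

def preprocess(text: str) -> list[str]:
    """Sketch of a tokenize -> stopword-removal pipeline; stemming would follow."""
    return remove_stopwords(tokenize(text))
```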
To set up the project locally, follow these steps:
- Clone the Repository:
git clone https://github.com/abdulmunimjemal/afaan-oromoo-news-classification-system.git
cd afaan-oromoo-news-classification-system
- Install Dependencies:
pip install -r requirements.txt
- Run the API:
cd api
python app.py
The API should now be accessible at http://localhost:5000, with predictions served at the /predict endpoint.
- Test the API:
curl -X POST -H "Content-Type: application/json" -d '{"text": "Your sample Afaan Oromoo text here."}' http://localhost:5000/predict
Adjust the payload and URL to match your API endpoints. The project is now set up locally, and you can begin exploring and using its features.
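Equivalently, the API can be called from Python using only the standard library. The `/predict` endpoint and the `{"text": ...}` payload shape are taken from the curl example above; the shape of the response JSON is an assumption.

```python
import json
from urllib import request

API_URL = "http://localhost:5000/predict"  # endpoint from the curl example

def build_request(text: str, url: str = API_URL) -> request.Request:
    """Build a POST request with a JSON body matching the curl example."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return request.Request(url, data=payload,
                           headers={"Content-Type": "application/json"})

def classify(text: str) -> dict:
    """Send the request and parse the JSON response from the API."""
    with request.urlopen(build_request(text)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```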
The project relies on the following external dependencies:
- Pandas: A powerful data manipulation and analysis library for working with structured data.
- Flask: A lightweight web application framework for the API.
- scikit-learn: A machine learning library for model training and evaluation.
- TensorFlow: An open-source machine learning framework used for building and training neural networks.
Make sure to install these dependencies using the provided requirements.txt
file:
pip install -r requirements.txt
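At inference time these dependencies fit together roughly as follows: the saved tokenizer turns text into index sequences, the TensorFlow model outputs one probability per category, and the label encoder maps the winning index back to a category name. The loading calls below are only sketched in comments (file names taken from `models/trained_models`); the small decoding helper is concrete, and the category labels in the usage note are hypothetical.

```python
# Sketch of the inference flow implied by the artifacts in models/trained_models.
# Actual loading would look something like:
#   tokenizer     = joblib.load("models/trained_models/tokenizer.joblib")
#   label_encoder = joblib.load("models/trained_models/label_encoder.joblib")
#   model         = tensorflow.keras.models.load_model(
#                       "models/trained_models/model_20231116_accuracy_0.9263.h5")
# The helper below shows the final decoding step in plain Python.

def decode_prediction(probs: list[float], labels: list[str]) -> tuple[str, float]:
    """Pick the most probable category from the model's output vector."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best], probs[best]
```

For example, `decode_prediction([0.1, 0.7, 0.2], ["sports", "politics", "business"])` returns `("politics", 0.7)` for those hypothetical labels.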
We welcome contributions from the community. If you'd like to contribute, please follow these steps:
- Fork the repository
- Create a new branch (`git checkout -b feature/your-feature`)
- Make your changes
- Commit your changes (`git commit -m 'Add your feature'`)
- Push to the branch (`git push origin feature/your-feature`)
- Open a pull request
This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/ or see the LICENSE.md file.
License Summary: This project is intended for educational and non-commercial purposes only. Users are required to provide proper attribution to the authors and share any derivative works under the same license terms.
Q: Can I use this project commercially?
A: No, this project is intended for educational and non-commercial purposes only.
Q: How should I credit the project?
A: When using this project, please provide proper attribution to the authors.