
This project was developed as part of the AIC Competition sponsored by MTC Egypt. Its goal is an efficient summarization system for Arabic text documents.


Arabic Text Summarization

License: MIT

This repository contains the code for an Arabic text summarization system. The system uses AraBART, a BART-style model built on the Transformer architecture, to generate summaries of Arabic text documents. The model is trained on a labeled dataset and is capable of both extractive and abstractive summarization.

Installation

To run the code, follow the steps below:

  1. Clone the repository:
     git clone https://github.com/Geo-y20/Text-Summarization-in-Arabic.git
  2. Change the directory to the project folder:
     cd Text-Summarization-in-Arabic
  3. Install the required dependencies:
     pip install -r requirements.txt

Usage

Training

To train the summarization model, you need a labeled dataset in JSONL format, where each line is a JSON object holding one document and its corresponding summary. Edit train.py so that it loads your labeled dataset, adjust the training parameters if necessary, and then run the following command to start training:

python train.py
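
To illustrate the JSONL layout described above, here is a minimal sketch of a loader. The field names `text` and `summary` and the helper `load_jsonl` are assumptions made for this example; they are not taken from train.py, so adjust them to match your data.

```python
import json
from pathlib import Path


def load_jsonl(path, text_key="text", summary_key="summary"):
    """Read (document, summary) pairs from a JSONL file.

    The key names are hypothetical defaults; the repository does not
    document which keys train.py expects.
    """
    pairs = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            if text_key not in record or summary_key not in record:
                raise KeyError(
                    f"line {line_no}: expected keys {text_key!r} and {summary_key!r}"
                )
            pairs.append((record[text_key], record[summary_key]))
    return pairs
```

Calling `load_jsonl("data/labeled_dataset.jsonl")` would return a list of (document, summary) tuples, which is a convenient way to check the file before starting a long training run.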

Inference

To generate summaries using the trained model, you can provide a separate validation dataset or test the model on your own text data. Modify the inference.py file to load your dataset and adjust the inference parameters as needed. Run the following command to generate summaries:

python inference.py
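
For readers who want to call the model directly, the following is a rough sketch of summary generation with the Hugging Face Transformers library. It is not the contents of inference.py. It assumes that models/trained_model holds both the model weights and the tokenizer files, and the function names, batch size, and generation settings are illustrative choices.

```python
def batched(items, size):
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def summarize(texts, model_dir="models/trained_model", batch_size=8,
              max_input_tokens=512, max_summary_tokens=128, num_beams=4):
    """Generate one summary per input text (all defaults are assumptions)."""
    # Imported here so the helper above stays usable without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)
    model.eval()

    summaries = []
    for chunk in batched(list(texts), batch_size):
        # Pad to the longest text in the chunk and truncate long documents.
        inputs = tokenizer(chunk, return_tensors="pt", padding=True,
                           truncation=True, max_length=max_input_tokens)
        output_ids = model.generate(**inputs, num_beams=num_beams,
                                    max_new_tokens=max_summary_tokens)
        summaries.extend(
            tokenizer.batch_decode(output_ids, skip_special_tokens=True)
        )
    return summaries
```

Beam search with a small beam width is a common default for summarization; a lower `num_beams` trades some quality for speed.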

Directory Structure

The repository structure is organized as follows:

- data/
  - labeled_dataset.jsonl      # Labeled dataset for training
  - validation_dataset.jsonl   # Dataset for validation or testing
- models/
  - trained_model/             # Saved trained model
    - config.json
    - pytorch_model.bin
    - ...
- utils/
  - preprocessing.py           # Preprocessing utilities
  - evaluation.py              # Evaluation metrics
- train.py                     # Training script
- inference.py                 # Inference script
- requirements.txt             # Dependencies
- README.md                    # Project documentation
- LICENSE                      # License information
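
The listing mentions preprocessing utilities without describing them. As a purely illustrative sketch of one common cleanup step for Arabic text, the snippet below strips diacritics and the tatweel (elongation) character and collapses whitespace. It is an assumption about what such a step might look like, not the contents of utils/preprocessing.py, and `normalize_arabic` is a hypothetical name.

```python
import re

# Arabic diacritics (tashkeel) occupy U+064B..U+0652.
_DIACRITICS = re.compile("[\u064B-\u0652]")
# Tatweel (kashida) is a purely typographic elongation character.
_TATWEEL = "\u0640"
_WHITESPACE = re.compile(r"\s+")


def normalize_arabic(text):
    """Remove diacritics and tatweel, then collapse runs of whitespace.

    Hypothetical helper: whether this cleanup helps depends on how the
    model's training data was prepared.
    """
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    return _WHITESPACE.sub(" ", text).strip()
```

Apply the same normalization to training and inference inputs so that the model sees consistent text.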

Contributing

Contributions to this project are welcome. If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

Our participation in the AIC Competition and finals

[Image: AIC Competition]

License

This project is licensed under the MIT License.

Contact

For questions, please reach out to the project maintainers:

  • George Youhana - g.ghaly0451@student.aast.edu
  • Mostafa Magdy - Mustafa.10770@stemredsea.moe.edu.eg
  • Abdallah Alkhouly - a.alkholy53@student.aast.edu
  • Ahmed Hafez - ahmedhafez20010701@gmail.com
  • Mahmoud Yasser - mahmoudyaser3110@gmail.com