Automatic-Speech-Summarization-of-Mixed-Arabic-English-Speech

Many people do not have time to listen to a full recording, say two hours long, just to find out whether its content interests them. This project addresses that problem by summarizing a full recording into a few lines of text.

Primary language: Jupyter Notebook. License: GNU General Public License v3.0 (GPL-3.0).

  • Time has become increasingly precious, and many people cannot spare two hours to listen to a full recording just to find out whether its content interests them. This project addresses that problem by summarizing a full recording into a few lines of text.

  • This repository is a demo that deploys our machine learning models: a speech recognition model for mixed Arabic-English speech, a translation model, and a summarization model. The demo is a web application built with the Flask framework.

Video:

Watch the demo video

Getting Started

To run the demo, follow these steps:

1- Clone this repository to your local machine by running this command in CMD or a terminal: `git clone https://github.com/ahmed-0egy/Automatic-Speech-Summarization-of-Mixed-Arabic-English-Speech`
2- Download the summarization model from this link (https://drive.google.com/file/d/1pq8jGLnwRrDZAmI4yjhjGWQxfkbH4b0x/view?usp=share_link) and put it inside this directory: `.\static\assets\models\summarization`
3- Install the required packages by running this command in CMD or a terminal: `pip install -r requirements.txt`
4- Run the Flask application (app.py) by running this command in CMD or a terminal: `flask run`
5- Copy the URL that appears in the command window and paste it into your browser.

Notes

  • The model.pt file may not exist on GitHub. You don't need to worry about that: if it is missing, it is downloaded automatically the first time you run the application.
  • The application may take some time to start. That is expected, since the model.pt file is large and takes a while to load.
  • If you don't have a GPU and run on a CPU, execution will take much longer, since these models rely on parallel computing power.
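
Before starting the demo, it can help to know whether a GPU will actually be used. The snippet below is a minimal sketch, assuming the models run on PyTorch (which the `.pt` and `pytorch_model.bin` file names suggest but this README does not state). The helper name `pick_device` is hypothetical and is not part of the repository.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        # Imported lazily so the check also works when PyTorch is not installed.
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    device = pick_device()
    print(f"Models would run on: {device}")
    if device == "cpu":
        print("No GPU detected; expect transcription and summarization to be slow.")
```

If this prints `cpu`, the demo should still work, only more slowly, as the note above describes.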

Machine Learning & Deep Learning work

Project Structure

The project has the following file structure:

.
├── Static
│   ├── assets
│   │   ├── models
│   │   │   ├── summarization
│   │   │   │   └── pytorch_model.bin
│   │   │   └── transcription
│   │   │       └── medium.pt
│   │   └── uploads
│   ├── js
│   │   └── script.js
│   └── styles
│       ├── scss
│       │   └── photon-ai-dekstop-1440.css
│       └── photon-ai-dekstop-1440.css
├── templates
│   ├── base.html
│   └── index.html
├── app.py
├── speech.py
├── summarization.py
└── requirements.txt
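
The layout above can be checked from the repository root before launching the app. The following is a small sketch, not part of the project: the two model paths are copied from the tree, while the split into required and optional files is an assumption (the transcription checkpoint is treated as optional on the guess that it is the file the Notes say is downloaded automatically). The names `REQUIRED`, `OPTIONAL`, `missing_files`, and `check_layout` are hypothetical.

```python
from pathlib import Path

# Paths are copied from the tree above; which ones are mandatory is an assumption.
REQUIRED = [Path("Static/assets/models/summarization/pytorch_model.bin")]
OPTIONAL = [Path("Static/assets/models/transcription/medium.pt")]


def missing_files(root: Path, relative_paths: list[Path]) -> list[Path]:
    """Return the entries of relative_paths that do not exist under root."""
    return [p for p in relative_paths if not (root / p).is_file()]


def check_layout(root: Path) -> bool:
    """Print a short report and return True when every required file is present."""
    missing_required = missing_files(root, REQUIRED)
    for path in missing_required:
        print(f"missing (required): {path}")
    for path in missing_files(root, OPTIONAL):
        print(f"missing (optional): {path}")
    return not missing_required


if __name__ == "__main__":
    ok = check_layout(Path("."))
    print("layout OK" if ok else "layout incomplete")
```

On a case-sensitive file system, note that the tree spells the top-level folder `Static` while the setup steps use `static`; adjust the paths to match your checkout.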

Description of the Demo:

This demo provides the user with an interface for running several tasks. Some of these tasks depend on others, but the demo lets the user run any of them without worrying about those details. The tasks are:

1- Speech Recognition: Returns the transcription of the mixed Arabic-English audio as English text.
2- English Title: Returns a short English description of the audio, which you can use as an English title for the audio.
3- Arabic Title: Returns a short Arabic description of the audio, which you can use as an Arabic title for the audio.
4- English Summarization: Summarizes the full mixed Arabic-English audio into a few lines of English text.
5- Arabic Summarization: Summarizes the full mixed Arabic-English audio into a few lines of Arabic text.
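
One way to picture how the tasks above can depend on each other is the sketch below. It is illustrative only: the dependency order (the Arabic outputs are translations of the English ones, which in turn come from the transcript), the word limits, and every name in it (`DemoTasks`, the task strings, the `fake_*` stand-ins) are assumptions and do not come from app.py, speech.py, or summarization.py. The real models are replaced by plain functions so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DemoTasks:
    """Computes each task on demand and caches shared intermediate results."""

    transcribe: Callable[[str], str]      # audio path -> English transcript
    summarize: Callable[[str, int], str]  # (text, max words) -> shorter text
    translate: Callable[[str], str]       # English text -> Arabic text
    _cache: Dict[str, str] = field(default_factory=dict)

    def run(self, task: str, audio_path: str) -> str:
        # Cache per (task, file) so the transcript is produced only once.
        key = f"{task}:{audio_path}"
        if key not in self._cache:
            self._cache[key] = self._compute(task, audio_path)
        return self._cache[key]

    def _compute(self, task: str, audio_path: str) -> str:
        if task == "transcription":
            return self.transcribe(audio_path)
        if task == "english_summary":
            return self.summarize(self.run("transcription", audio_path), 60)
        if task == "english_title":
            return self.summarize(self.run("transcription", audio_path), 10)
        if task == "arabic_summary":
            return self.translate(self.run("english_summary", audio_path))
        if task == "arabic_title":
            return self.translate(self.run("english_title", audio_path))
        raise ValueError(f"unknown task: {task}")


# Stand-ins for the real models, used only to make the sketch runnable.
def fake_transcribe(audio_path: str) -> str:
    return "this is a placeholder transcript of the uploaded audio file"


def fake_summarize(text: str, max_words: int) -> str:
    return " ".join(text.split()[:max_words])


def fake_translate(text: str) -> str:
    return f"[ar] {text}"


if __name__ == "__main__":
    tasks = DemoTasks(fake_transcribe, fake_summarize, fake_translate)
    print(tasks.run("arabic_title", "lecture.wav"))
    print(tasks.run("english_summary", "lecture.wav"))
```

With this structure, asking for an Arabic title triggers transcription, English titling, and translation in order, and a later request for another task on the same file reuses the cached transcript.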

All you need to do is run the app and open it in the browser. The interface is simple, and it will be clear what to do once you are there.

Website Appearance:

(Six screenshots of the web interface.)

Contributions

Contributions to "Automatic Speech Summarization of Mixed Arabic-English Speech" are welcome and encouraged! However, please note that this project is licensed under the GNU General Public License v3.0, which means that any contributions must also be licensed under the same terms.

By contributing to this project, you agree to license your contributions under the GNU General Public License v3.0. This means that your contributions may be used, modified, and distributed by anyone under the same terms as the original project.

If you would like to contribute to this project, please follow these guidelines:

1- Fork the repository.
2- Make your changes in a new branch.
3- Test your changes thoroughly.
4- Submit a pull request.

Thank you for considering contributing to this project!

Conclusion

1- We applied pretrained language models based on the transformer architecture to the task of summarization. After tuning, facebook/bart gave the best result, a ROUGE-1 score of 54.4%, so we chose it as our champion model.
2- In the future, we plan to build more robust models by fine-tuning the pretrained models on different datasets and examining whether performance improves. We would also like to extend our summarization use case to more general long-document summarization.
3- We may also extend the idea from speech summarization to video summarization.
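
For readers unfamiliar with the ROUGE-1 metric mentioned in point 1, the sketch below shows the idea behind it: counting the single words (unigrams) shared by a reference summary and a generated one. It is a simplified illustration with whitespace tokenization and no stemming, so it is not the evaluation code used for the 54.4% figure, and this README does not say whether that figure is precision, recall, or F1. The function name `rouge1` is hypothetical.

```python
from collections import Counter


def rouge1(reference: str, candidate: str) -> dict:
    """Unigram-overlap precision, recall and F1 between two texts."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each word counts at most as often as it occurs in both.
    overlap = sum((ref_counts & cand_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge1(
        reference="the cat sat on the mat",
        candidate="the cat lay on the mat",
    )
    print({name: round(value, 3) for name, value in scores.items()})
```

Published ROUGE implementations differ in tokenization and stemming, so scores from this sketch will not match theirs exactly.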