NLP Projects: Regression, Text Generation, and BERT

This repository contains a series of Jupyter notebooks for different NLP projects, focusing on text regression, text generation using transformers, and fine-tuning a BERT model.

Table of Contents

  1. Text Regression
  2. Transformer (Text Generation)
  3. BERT Model
  4. Data Sources
  5. Credits

Text Regression

Description

This project involves scraping text data from Arabic websites on a specific topic, preprocessing the text, and training various RNN-based models to predict a relevance score for each text.

Steps

  1. Data Collection: Scraping text data from Arabic websites with Scrapy or BeautifulSoup (see the first sketch after this list).
  2. Dataset Preparation: Building a dataset with two columns: Text (the Arabic text) and Score (a relevance score between 0 and 10).
  3. NLP Pipeline: Preprocessing the text (tokenization, stop-word removal, stemming, lemmatization, discretization).
  4. Model Training: Training RNN, Bidirectional RNN, GRU, and LSTM models with hyperparameter tuning (see the second sketch after this list).
  5. Evaluation: Evaluating the models with standard regression metrics and the BLEU score.
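A minimal sketch of steps 1 and 3, assuming BeautifulSoup for the scraping and NLTK (with its ISRI Arabic stemmer) for the pipeline; the URL handling, the `<p>`-tag selector, and the helper names are illustrative, not the exact ones in the notebooks:

```python
# Hedged sketch: fetch paragraph text from an Arabic page and run a
# basic pipeline (tokenize, drop stop words, stem). The URL handling
# and tag selection are placeholders for the real target sites.
import requests
import nltk
from bs4 import BeautifulSoup
from nltk.corpus import stopwords
from nltk.stem.isri import ISRIStemmer  # root-based stemmer for Arabic

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

def scrape_article(url: str) -> str:
    """Fetch a page and concatenate the text of its <p> tags."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    return " ".join(p.get_text(strip=True) for p in soup.find_all("p"))

def preprocess(text: str) -> list[str]:
    """Tokenize, remove Arabic stop words, and stem what remains."""
    stemmer = ISRIStemmer()
    stop_ar = set(stopwords.words("arabic"))
    tokens = nltk.word_tokenize(text)
    return [stemmer.stem(t) for t in tokens if t.isalpha() and t not in stop_ar]
```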
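For step 4, a sketch of one of the recurrent regressors in PyTorch (the notebooks may use different sizes or a different framework; the hyperparameters here are illustrative). A bidirectional LSTM encodes the token sequence and a linear head maps the final hidden states to a single relevance score:

```python
# Illustrative bidirectional-LSTM regressor: token ids in, one 0-10
# relevance score out. Vocabulary and layer sizes are placeholders.
import torch
import torch.nn as nn

class LSTMRegressor(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 128, hidden: int = 64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # one scalar score per text

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        embedded = self.embedding(token_ids)         # (batch, seq, embed_dim)
        _, (h_n, _) = self.lstm(embedded)            # h_n: (2, batch, hidden)
        pooled = torch.cat([h_n[0], h_n[1]], dim=1)  # join both directions
        return self.head(pooled).squeeze(1)          # (batch,)

model = LSTMRegressor(vocab_size=20_000)
criterion = nn.MSELoss()  # a standard regression loss for the scores
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```

Swapping nn.LSTM for nn.RNN or nn.GRU (and toggling bidirectional) yields the other recurrent variants compared in the notebook.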

Transformer (Text Generation)

Description

This project focuses on fine-tuning a pre-trained GPT-2 model using a custom dataset and generating new text based on a given sentence.

Steps

  1. Installation: Installing pytorch-transformers.
  2. Model Loading: Loading the pre-trained GPT-2 model and its tokenizer.
  3. Fine-Tuning: Fine-tuning GPT-2 on a custom dataset.
  4. Text Generation: Generating new paragraphs from a given input sentence (see the sketch after this list).
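A hedged sketch of steps 2 and 4 using the current transformers package (the successor of pytorch-transformers); the prompt and sampling settings are arbitrary examples:

```python
# Load pre-trained GPT-2 and sample a continuation of a prompt.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Natural language processing is"  # any seed sentence works
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output = model.generate(
    input_ids,
    max_length=input_ids.shape[1] + 50,   # up to 50 new tokens
    do_sample=True,                       # sample instead of greedy decoding
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

For step 3, fine-tuning amounts to a standard language-modeling training loop over the tokenized custom dataset (passing labels=input_ids so the model returns its cross-entropy loss), as detailed in the tutorial below.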

Tutorial

Follow the tutorial here for detailed steps on fine-tuning GPT-2.

BERT Model

Description

This project uses the pre-trained bert-base-uncased model for text classification on a dataset of Amazon reviews.

Steps

  1. Data Preparation: Downloading and preparing the Amazon reviews dataset.
  2. Model Setup: Setting up the BERT embedding layer.
  3. Fine-Tuning: Fine-tuning the BERT model with appropriate hyperparameters (see the sketch after this list).
  4. Evaluation: Evaluating the model with metrics such as accuracy, loss, F1 score, BLEU score, and BERT-specific metrics (e.g., BERTScore).
  5. Conclusion: Summarizing the performance and insights from using the pre-trained BERT model.
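A minimal sketch of steps 2 and 3, assuming the transformers library; the two example reviews and their labels are placeholders, not the actual Amazon data:

```python
# One illustrative fine-tuning step for bert-base-uncased on binary
# review classification. A real run iterates over the whole dataset.
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # a common BERT LR

texts = ["Great product, works as advertised.", "Broke after two days."]
labels = torch.tensor([1, 0])  # placeholder positive/negative labels

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

model.train()
outputs = model(**batch, labels=labels)  # returns loss and logits
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```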

Data Sources

  • Text Regression: Scraped from various Arabic websites using Scrapy or BeautifulSoup.
  • Text Generation: Dataset
  • BERT Model: Amazon Reviews Dataset.

Credits

  • Project supervised by Prof. Elaachak Lotfi at Université Abdelmalek Essaadi, Faculté des Sciences et Techniques de Tanger, Département Génie Informatique.
  • Inspired by various tutorials and open-source projects in the NLP community.