/JLL_NLP

Spam Detector Project (NLP)

This repository contains a project that builds a spam detector using Natural Language Processing (NLP) techniques. The goal is to classify URLs as spam or not spam based on the textual content of the URLs themselves. The project is structured in four main steps: data upload, preprocessing, model building, and optimization.

Step 1: Upload Data

The first step loads a dataset of URLs and their corresponding labels, which indicate whether each URL is spam or not. The dataset is read from a CSV file hosted on GitHub.
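A minimal sketch of this loading step, assuming pandas is used to read the CSV. The `DATA_URL` value is a hypothetical placeholder, and the column names `url` and `is_spam` are assumptions, since neither the real file location nor its schema is given here. An in-memory sample stands in for the hosted file so the example runs offline.

```python
import io

import pandas as pd

# Hypothetical placeholder: substitute the raw GitHub URL of the actual CSV.
DATA_URL = "https://raw.githubusercontent.com/<user>/<repo>/main/<file>.csv"


def load_dataset(source):
    """Read a CSV of URLs and labels from a URL, a path, or a file-like object."""
    return pd.read_csv(source)


# In-memory stand-in for the hosted file (assumed columns: url, is_spam).
sample_csv = io.StringIO(
    "url,is_spam\n"
    "https://example.com/win-free-prize,True\n"
    "https://example.org/docs/getting-started,False\n"
)

df = load_dataset(sample_csv)
print(df.shape)
print(df.head())
```

With a real file, `load_dataset(DATA_URL)` would fetch the CSV directly, since `pd.read_csv` accepts URLs as well as local paths.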

Step 2: Preprocess the Links

The second step preprocesses the data. This includes converting categorical variables to numerical values, eliminating repeated values (duplicates), and cleaning the text within the URLs.
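The three preprocessing operations could look roughly like the sketch below, written in plain Python. The cleaning rules (lowercasing, dropping the scheme, keeping only letters, discarding `www` and one-letter tokens) and the helper names `preprocess_url` and `preprocess_rows` are illustrative assumptions, not necessarily what the notebook does.

```python
import re


def preprocess_url(url):
    """Lowercase a URL and reduce it to space-separated word tokens."""
    text = url.lower()
    text = re.sub(r"^https?://", "", text)  # drop the scheme
    text = re.sub(r"[^a-z]+", " ", text)    # replace non-letters with spaces
    # Assumed filter: discard "www" and single-letter fragments.
    tokens = [t for t in text.split() if len(t) > 1 and t != "www"]
    return " ".join(tokens)


def preprocess_rows(rows):
    """Encode labels as 0/1, skip repeated URLs, and clean the URL text."""
    seen = set()
    cleaned = []
    for url, is_spam in rows:
        if url in seen:  # elimination of repeated values
            continue
        seen.add(url)
        cleaned.append((preprocess_url(url), int(is_spam)))
    return cleaned


rows = [
    ("https://www.example.com/win-free-prize?id=42", True),
    ("https://example.org/docs/getting-started", False),
    ("https://www.example.com/win-free-prize?id=42", True),  # duplicate
]

result = preprocess_rows(rows)
for text, label in result:
    print(label, text)
```

Reducing each URL to word-like tokens gives the later vectorization step something closer to ordinary text to work with.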

Step 3: Build an SVM

The third step builds a Support Vector Machine (SVM) model to classify the URLs. The dataset is vectorized using the TF-IDF method and then split into training and test sets.
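As a rough illustration of this step, the sketch below uses scikit-learn's `TfidfVectorizer`, `train_test_split`, and `SVC`. The toy texts, the linear kernel, `C=1.0`, the 75/25 split, and the random seed are all assumptions chosen for the example; the notebook's actual data and settings may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy, already-preprocessed URL texts (illustrative only); 1 = spam, 0 = not spam.
texts = [
    "example com win free prize",
    "cheap pills online order now",
    "free gift card claim today",
    "click here win cash bonus",
    "example org docs getting started",
    "news site world politics article",
    "university edu course syllabus",
    "library org catalog search",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Vectorize the whole dataset with TF-IDF, then split, as described above.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=42, stratify=labels
)

# Assumed hyperparameters: linear kernel with the default regularization strength.
model = SVC(kernel="linear", C=1.0, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))
```

The order here (vectorize, then split) follows the description above. Fitting the vectorizer on the training split only is a common alternative that keeps test-set vocabulary statistics out of the features.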

This project demonstrates the application of NLP techniques for spam detection, including data preprocessing, feature extraction, and model optimization.