
This model predicts whether an email is spam. It is trained with the logistic regression machine learning algorithm.

Primary Language: Jupyter Notebook

Email Spam Detection Using Logistic Regression

This project covers most of the theory behind logistic regression.

Using this theory, we can identify whether an email is spam.


Introduction

  • Training a machine learning model involves several steps.

  • To make this logistic regression model accurate and fast, we follow steps such as:

    Data Collecting
    Data Preprocessing
    Data Analysis
    Split the Data into Training and Testing Sets
    Evaluate the Model
    Check model performance
    Fine-tune the Model
  • The sections below introduce each of these steps, and the code for each step can be found in the corresponding section of the project's code.
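As a rough illustration of how the split, train, evaluate, and fine-tune steps fit together, here is a minimal sketch of logistic regression written with only the Python standard library. It is not the project's own code, which may well use a library such as scikit-learn. The toy emails, the bag-of-words features, and all function names (`featurize`, `train`, `accuracy`) are assumptions made up for this example.

```python
import math
import random

# Made-up toy data: (message, label) with 1 = spam and 0 = not spam.
emails = [
    ("win a free prize now", 1),
    ("free cash offer click now", 1),
    ("claim your free prize today", 1),
    ("urgent offer win cash", 1),
    ("click now to claim cash", 1),
    ("meeting moved to monday", 0),
    ("see you at lunch today", 0),
    ("please review the attached report", 0),
    ("are we still on for monday", 0),
    ("lunch with the team today", 0),
]


def featurize(text, vocabulary):
    """Bag-of-words features: how often each vocabulary word occurs."""
    words = text.split()
    return [words.count(word) for word in vocabulary]


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that the feature vector x belongs to the spam class."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(X, y, learning_rate=0.5, epochs=200):
    """Fit weights by batch gradient descent on the log loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, target in zip(X, y):
            error = predict_proba(weights, bias, x) - target
            grad_b += error
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
        bias -= learning_rate * grad_b / len(X)
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grad_w)]
    return weights, bias


def accuracy(weights, bias, X, y):
    """Fraction of examples classified correctly at a 0.5 threshold."""
    predictions = [1 if predict_proba(weights, bias, x) >= 0.5 else 0 for x in X]
    return sum(p == t for p, t in zip(predictions, y)) / len(y)


# Split the data into training and testing sets (fixed seed for repeatability).
shuffled = emails[:]
random.Random(0).shuffle(shuffled)
train_set, test_set = shuffled[:8], shuffled[8:]

# Build the vocabulary from the training messages only.
vocabulary = sorted({word for text, _ in train_set for word in text.split()})
X_train = [featurize(text, vocabulary) for text, _ in train_set]
y_train = [label for _, label in train_set]
X_test = [featurize(text, vocabulary) for text, _ in test_set]
y_test = [label for _, label in test_set]

# Train, then evaluate on the held-out messages.
weights, bias = train(X_train, y_train)
test_accuracy = accuracy(weights, bias, X_test, y_test)
print(f"test accuracy: {test_accuracy:.2f}")
```

With only two held-out messages the reported accuracy says very little; the point is the order of the steps. In this sketch, fine-tuning would mean adjusting `learning_rate` and `epochs` and checking the held-out accuracy again.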


Data Collecting

  • We must collect the data that suits our needs.
  • Depending on the target variable (the dependent variable / the variable we predict / y), we need to collect the other data (characteristics / independent variables / features).

Data Preprocessing

  • After collecting the data, we need to clean it:

    Find missing data and fill it in
    Drop duplicate data
    Turn categorical data into numerical or Boolean values
    Rename columns so they are easy to understand
    Separate the target value from the features
  • We can use an encoding method or dummy variables to convert categorical data into numerical or Boolean values.
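The cleaning steps above can be sketched with pandas roughly as follows. This is only an illustration under stated assumptions: the raw column names `v1` and `v2`, the `ham`/`spam` label values, and the sample rows are invented here and are not taken from the project's dataset.

```python
import pandas as pd

# Hypothetical raw data; column names and rows are made up for illustration.
raw = pd.DataFrame({
    "v1": ["ham", "spam", "spam", "ham", "ham"],
    "v2": [
        "See you at lunch",
        "WIN a FREE prize now",
        "WIN a FREE prize now",  # exact duplicate of the previous row
        None,                    # missing message text
        "Call me later",
    ],
})

# Rename columns so they are easy to understand.
df = raw.rename(columns={"v1": "label", "v2": "message"})

# Find missing data, then fill it (here: an empty string for missing text).
print(df.isna().sum())
df["message"] = df["message"].fillna("")

# Drop duplicate rows.
df = df.drop_duplicates().reset_index(drop=True)

# Encoding method: map each category to a number (ham -> 0, spam -> 1).
df["label"] = df["label"].map({"ham": 0, "spam": 1})

# Dummy method (alternative): one indicator column per category.
dummies = pd.get_dummies(raw["v1"])

# Separate the target value (y) from the features (X).
X = df["message"]
y = df["label"]
```

For a two-class label such as spam versus not spam, a single 0/1 column from the mapping is usually enough; dummy columns are more useful for categorical features with several values.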

Data Analysis

  • We can analyze the relationships between the target and the features using plots, graphs, etc.
  • We can identify these relationships through the following examples.
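Before plotting anything, a relationship between a feature and the target can also be checked numerically. The sketch below is an assumption-laden illustration, not one of the project's own examples: the sample messages and the `exclamation_count` feature are invented here. It compares the average value of one feature across the two classes.

```python
from statistics import mean

# Made-up sample: (message, label) with 1 = spam and 0 = not spam.
sample = [
    ("WIN a FREE prize now!!!", 1),
    ("Free cash, click here!", 1),
    ("Lunch at noon?", 0),
    ("Report attached.", 0),
]


def exclamation_count(text):
    """Hypothetical feature: number of exclamation marks in the message."""
    return text.count("!")


# Group the feature values by the target label.
by_label = {0: [], 1: []}
for text, label in sample:
    by_label[label].append(exclamation_count(text))

# Compare the average feature value for each class.
averages = {label: mean(values) for label, values in by_label.items()}
print(averages)
```

A clear gap between the two averages suggests the feature carries information about the target; a bar chart of the same numbers would show this visually.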