/Data_Processing_and_Feature_Engineering_in_Machine_Learning

This is an attempt to summarize the feature engineering methods I learned during graduate school.


Data Cleaning, Feature Engineering, and Dimensionality Reduction in Machine Learning

This project showcases data cleaning, feature preprocessing, and feature selection techniques in machine learning. Each Jupyter notebook is a standalone illustration of the technique it covers.

Dependencies

This project requires Python and the following Python libraries:

  1. pandas
  2. numpy
  3. seaborn
  4. matplotlib
  5. scikit-learn

It also requires software that can open and run Jupyter notebooks, such as Jupyter Notebook or JupyterLab.

Installation

  1. Clone the repo.
  2. Download the data required for the technique you want to run from the Data section below.
  3. Navigate to the top-level project directory that contains this README file.
  4. Change into the Source_Codes directory.
  5. Run the following command:

        jupyter notebook

  6. This opens the Jupyter interface in a new web browser tab.
  7. Click the notebook for the technique you are interested in.

Methods

  1. Missing Values Imputation Techniques
  2. Handling Categorical Data
  3. Zero-Variance Feature Removal
  4. Multicollinearity Removal
  5. Tokenization, Stemming, and Lemmatization
  6. Forward Selection / Backward Elimination / Stepwise Selection
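Missing values imputation: the sketch below uses pandas and fills numeric columns with their median and other columns with their most frequent value. It is not taken from the notebooks. The column names and values are made up for illustration, and `impute_simple` is a hypothetical helper. scikit-learn's `SimpleImputer` offers the same strategies ("mean", "median", "most_frequent", "constant") when the imputation has to live inside a pipeline.

```python
import numpy as np
import pandas as pd

# Toy data with gaps; columns and values are invented for this example.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 35.0],
    "income": [50_000.0, 62_000.0, np.nan, 58_000.0],
    "city": ["Austin", "Boston", None, "Austin"],
})

def impute_simple(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: median for numeric columns, mode for the rest."""
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # The median is less sensitive to outliers than the mean.
            out[col] = out[col].fillna(out[col].median())
        else:
            # mode() may return several values on ties; take the first one.
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out

filled = impute_simple(df)
print(filled)
```

In practice, compute the medians and modes on the training split only and reuse them on the test split to avoid leakage.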
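Handling categorical data: a minimal sketch of the two most common encodings. Ordinal columns get integer ranks, and nominal columns are one-hot encoded with `pd.get_dummies`. The columns and the S < M < L ordering are assumptions of this example, not data from the repository.

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],  # nominal: no natural order
    "size": ["S", "L", "M", "S"],                # ordinal: assumed S < M < L
})

# Ordinal encoding: map each level to its rank so the order is preserved.
size_order = {"S": 0, "M": 1, "L": 2}
df["size_encoded"] = df["size"].map(size_order)

# One-hot encoding for the nominal column; dtype=int yields 0/1 instead of booleans.
encoded = pd.get_dummies(df.drop(columns="size"), columns=["color"], dtype=int)
print(encoded)
```

One-hot encoding avoids implying an order that nominal categories do not have. The cost is one new column per level, which grows quickly for high-cardinality features.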
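Zero-variance feature removal: a column that takes the same value in every row cannot help a model separate samples. The sketch below uses scikit-learn's `VarianceThreshold` with its default `threshold=0.0`, which drops only features whose variance is exactly zero. The data is invented. For non-numeric columns, a pandas filter such as `df.loc[:, df.nunique() > 1]` achieves the same effect.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "constant": [7.0, 7.0, 7.0, 7.0],  # zero variance: carries no information
    "b": [0.5, 0.1, 0.9, 0.3],
})

# threshold=0.0 removes only columns whose variance is exactly zero.
selector = VarianceThreshold(threshold=0.0)
selector.fit(df)

# get_support() returns a boolean mask over the input columns.
kept = df.columns[selector.get_support()]
reduced = df[kept]
print(list(kept))
```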
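Multicollinearity removal: one common heuristic computes the absolute Pearson correlation matrix and drops one column from every pair whose correlation exceeds a threshold. The sketch below shows that heuristic. The 0.9 threshold and the helper `drop_correlated` are illustrative assumptions, and the notebook may use a different criterion, such as the variance inflation factor (VIF).

```python
import numpy as np
import pandas as pd

def drop_correlated(frame: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Hypothetical helper: drop one column of each highly correlated pair."""
    corr = frame.corr().abs()
    # Mask everything except the upper triangle (k=1 skips the diagonal),
    # so each pair is checked once and a column is never compared with itself.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # A column is dropped if it is strongly correlated with any earlier column.
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return frame.drop(columns=to_drop)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
df = pd.DataFrame({
    "x1": x1,
    "x1_scaled": 2 * x1 + rng.normal(scale=0.01, size=200),  # near-copy of x1
    "x2": rng.normal(size=200),                               # independent feature
})

result = drop_correlated(df)
print(result.columns.tolist())
```

Which member of a pair survives depends on column order here. A more careful variant keeps the feature that is more correlated with the target.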
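Tokenization, stemming, and lemmatization: the listed dependencies do not include an NLP library. The sketch below therefore uses only the standard library to show how the three steps differ. The regex tokenizer, the suffix list in `toy_stem`, and the `LEMMAS` table are all hypothetical stand-ins. `toy_stem` is not the Porter stemmer, and a real lemmatizer (for example, one backed by WordNet) uses a full dictionary and part-of-speech information.

```python
import re
from typing import List

def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

# Toy suffix-stripping stemmer: NOT Porter, just rule-based chopping of endings
# with no dictionary check, which is why stems need not be real words.
SUFFIXES = ("ing", "ies", "es", "ed", "s")

def toy_stem(token: str) -> str:
    for suffix in SUFFIXES:
        # Require at least 3 remaining characters so short words stay intact.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

# Hypothetical lookup table standing in for a lemmatizer's dictionary:
# lemmatization maps each word to its dictionary (base) form.
LEMMAS = {"studies": "study", "studying": "study", "was": "be", "were": "be", "better": "good"}

def toy_lemmatize(token: str) -> str:
    return LEMMAS.get(token, token)

tokens = tokenize("She was studying; the studies were better.")
stems = [toy_stem(t) for t in tokens]
lemmas = [toy_lemmatize(t) for t in tokens]
print(tokens)
print(stems)
print(lemmas)
```

The output shows the usual trade-off. Stemming is cheap but can produce non-words ("studies" becomes "stud"), while lemmatization returns real base forms ("was" becomes "be", "better" becomes "good") at the cost of needing a vocabulary.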
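Forward selection and backward elimination: both are greedy searches. Forward selection starts with no features and repeatedly adds the one that most improves a cross-validated score. Backward elimination starts with all features and repeatedly removes the least useful one. The sketch below uses scikit-learn's `SequentialFeatureSelector` on synthetic data in which only `x0` and `x1` drive the target. The data, the choice of `LinearRegression`, and `k=2` are assumptions of this example. `SequentialFeatureSelector` supports only purely forward or purely backward search, so stepwise (bidirectional) selection is not shown here.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 6)), columns=[f"x{i}" for i in range(6)])
# Only x0 and x1 influence the target; x2..x5 are pure noise.
y = 3 * X["x0"] - 2 * X["x1"] + rng.normal(scale=0.1, size=200)

def select(direction: str, k: int = 2) -> list:
    """Greedy selection scored by 5-fold cross-validated R^2 (the regressor's default score)."""
    sfs = SequentialFeatureSelector(
        LinearRegression(), n_features_to_select=k, direction=direction, cv=5
    )
    sfs.fit(X, y)
    return X.columns[sfs.get_support()].tolist()

forward_features = select("forward")
backward_features = select("backward")
print("forward:", forward_features)
print("backward:", backward_features)
```

Both directions recover the informative features on this easy problem. On real data they can disagree, and backward elimination is costlier when there are many features because it starts from the full set.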