
I did the data preprocessing with my team. We had a good time working on this project.

Primary Language: Jupyter Notebook

Structural-Protein-Sequences

Project:

In this project, we examined the structural properties of proteins. The project applies several data preprocessing techniques to explore and clean the dataset.

About The Dataset:

The dataset covers structural protein sequences. It was retrieved from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB), which is kindly acknowledged. The dataset contains protein metadata, including details on protein classification, extraction methods, etc.

Content:

  1. Imports and Functions
  2. Exploratory Data Analysis
  3. Data Visualization
    3.1. Boxplots
    3.2. Nullity Matrix
    3.3. Heatmap
  4. Split Data Into Training and Testing
  5. Outliers Detection
    5.1. IQR Method
  6. Missing Value Imputation
    6.1. Mean
    6.2. Median
    6.3. Linear Regression
    6.4. KNN
    6.5. Random Forest
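
Two of the steps listed above, the IQR method for outliers and mean/median imputation, can be sketched roughly as follows. This is only an illustration and not the notebook's actual code: the function names (`iqr_bounds`, `flag_outliers`, `impute`), the fence multiplier `k=1.5`, and the sample numbers are all assumptions made up for this example. It uses only the Python standard library so it runs anywhere; the notebook itself may well use different tooling.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is a common convention, assumed here for illustration.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Mark values outside the IQR fences; missing values (None) are never flagged."""
    observed = [v for v in values if v is not None]
    lower, upper = iqr_bounds(observed, k)
    return [v is not None and (v < lower or v > upper) for v in values]


def impute(values, strategy="mean"):
    """Replace None with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    else:
        fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


# Made-up numbers for illustration only, not taken from the PDB dataset.
sample = [10.0, 12.0, None, 11.0, 13.0, 95.0, 12.0]

outlier_mask = flag_outliers(sample)       # only 95.0 falls outside the fences
mean_filled = impute(sample, "mean")       # fills the gap with 25.5
median_filled = impute(sample, "median")   # fills the gap with 12.0
```

In this toy column the single extreme value pulls the mean fill up to 25.5 while the median fill stays at 12.0, which is one reason it can be useful to compare both strategies, or to handle outliers before imputing.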

Contributors:

Pınar Kaya
Hanım Halilova