
Data Science

Primary language: Jupyter Notebook | License: MIT

Data-Science-Fundamentals

Data Science Fundamentals, prepared by Taha Er

1. Data Exploration and Understanding (Exploratory Data Analysis - EDA)

  • Get to know the dataset: its size, column types, and a few sample rows
  • Summary statistics (mean, median, standard deviation, etc.)
  • Data distribution and visualization (histogram, boxplot, scatter plot, etc.)
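
A minimal first-look sketch with pandas, assuming a small hypothetical DataFrame (the columns `age`, `income`, `city` and the helper `summarize` are illustrative, not part of this repository). It reports the dataset's shape and types, then collects mean, median, and standard deviation per numeric column.

```python
import pandas as pd

# Hypothetical toy dataset, used only for illustration
df = pd.DataFrame({
    "age": [23, 35, 31, 52, 46, 29],
    "income": [32000, 54000, 48000, 91000, 75000, 41000],
    "city": ["Ankara", "Izmir", "Ankara", "Istanbul", "Izmir", "Ankara"],
})


def summarize(frame: pd.DataFrame) -> pd.DataFrame:
    """Return mean, median, std, min and max for every numeric column."""
    numeric = frame.select_dtypes(include="number")
    return pd.DataFrame({
        "mean": numeric.mean(),
        "median": numeric.median(),
        "std": numeric.std(),  # sample standard deviation (ddof=1), the pandas default
        "min": numeric.min(),
        "max": numeric.max(),
    })


print(df.shape)                   # (rows, columns)
print(df.dtypes)                  # type of each column
print(df.head())                  # first rows
print(df["city"].value_counts())  # frequency of each category
print(summarize(df))
```

For the plots, `df.hist()` draws a histogram per numeric column and `df.boxplot()` draws box plots; both require matplotlib to be installed.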

2. Data Cleaning

  • Identify and impute missing values
  • Detect and handle outliers
  • Correct erroneous or inconsistent data
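
The three cleaning steps can be sketched on a hypothetical column pair (`age`, `city`, and the helper `clean` are illustrative assumptions). The order chosen here is one reasonable option: fix inconsistent text first, clip outliers with the common 1.5 × IQR rule using only observed values, then impute numeric gaps with the median and categorical gaps with the mode.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: a missing age, an implausible age (240),
# and city names with stray whitespace and inconsistent case
raw = pd.DataFrame({
    "age": [25, np.nan, 31, 29, 240, 27],
    "city": [" Ankara", "ankara", "Izmir", None, "IZMIR", "Izmir "],
})


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    out = frame.copy()

    # 1) Correct inconsistent text: trim whitespace and unify capitalization
    out["city"] = out["city"].str.strip().str.title()

    # 2) Handle outliers: clip values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    #    (bounds are computed from observed values; NaN is ignored and stays NaN)
    q1, q3 = out["age"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["age"] = out["age"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

    # 3) Impute missing values: median for numeric, mode for categorical
    out["age"] = out["age"].fillna(out["age"].median())
    out["city"] = out["city"].fillna(out["city"].mode()[0])
    return out


print(clean(raw))
```

Clipping (winsorizing) keeps every row; dropping the outlier rows is the main alternative when the values are clearly impossible rather than just extreme.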

3. Feature Selection and Transformations

  • Remove unnecessary or redundant features
  • Apply transformations such as log10, natural log (ln), or the reciprocal (1/x) to reduce skew
  • Scaling (StandardScaler, MinMaxScaler, etc.)
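
A short sketch of this section, assuming a hypothetical `income` column plus two deliberately useless columns. `drop_redundant` (a hypothetical helper) removes constant columns and one of each pair whose absolute correlation exceeds a chosen threshold (0.95 here, an arbitrary choice). `np.log1p` computes ln(1 + x), which stays defined at zero. The scalers are scikit-learn's `StandardScaler` and `MinMaxScaler`; in a real pipeline they would be fit on the training split only and then applied to the test split.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

df = pd.DataFrame({
    "income": [20000, 35000, 50000, 120000, 400000],
    "income_k": [20, 35, 50, 120, 400],  # redundant: same information in thousands
    "country": ["TR"] * 5,               # constant: carries no information
})


def drop_redundant(frame: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop constant columns, then one column of each highly correlated pair."""
    frame = frame.loc[:, frame.nunique() > 1]
    corr = frame.select_dtypes(include="number").corr().abs()
    cols = list(corr.columns)
    to_drop = set()
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a not in to_drop and corr.loc[a, b] > threshold:
                to_drop.add(b)
    return frame.drop(columns=sorted(to_drop)).copy()


reduced = drop_redundant(df)

# Transformations for right-skewed values
reduced["income_log"] = np.log1p(reduced["income"])  # ln(1 + x)
reduced["income_inv"] = 1.0 / reduced["income"]      # reciprocal

X = reduced[["income_log"]].to_numpy()
standardized = StandardScaler().fit_transform(X)  # mean 0, standard deviation 1
normalized = MinMaxScaler().fit_transform(X)      # rescaled to the range [0, 1]

print(reduced)
print(standardized.ravel(), normalized.ravel())
```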

4. Encoding Categorical Features

  • Label Encoding
  • One-Hot Encoding
  • Target Encoding
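
The three encodings side by side, as a sketch on hypothetical columns (`size`, `color`, target `bought`). Label encoding uses an explicit mapping because `size` has a natural order; scikit-learn's `LabelEncoder` is intended for target labels, and `OrdinalEncoder` is its feature-side counterpart. One-hot uses `pd.get_dummies`. The target encoder `target_encode` is a hypothetical helper that replaces each category with its mean target, shrunk toward the global mean by an assumed `smoothing` weight; it should be fit on training data only to avoid leaking the target.

```python
import pandas as pd

train = pd.DataFrame({
    "size": ["S", "M", "L", "M", "S", "L"],
    "color": ["red", "blue", "red", "green", "blue", "red"],
    "bought": [0, 1, 1, 0, 0, 1],
})

# Label (ordinal) encoding with an explicit order S < M < L
size_order = {"S": 0, "M": 1, "L": 2}
train["size_label"] = train["size"].map(size_order)

# One-hot encoding: one 0/1 column per color
one_hot = pd.get_dummies(train["color"], prefix="color", dtype=int)


def target_encode(cat: pd.Series, target: pd.Series, smoothing: float = 1.0) -> pd.Series:
    """Smoothed mean-target encoding: rare categories move toward the global mean."""
    global_mean = target.mean()
    stats = target.groupby(cat).agg(["mean", "count"])
    enc = (stats["count"] * stats["mean"] + smoothing * global_mean) / (
        stats["count"] + smoothing
    )
    return cat.map(enc)


encoded = target_encode(train["color"], train["bought"])
print(train.join(one_hot).assign(color_target=encoded))
```

Here `red` appears three times with mean target 2/3, so with `smoothing=1.0` it is encoded as (3 × 2/3 + 0.5) / 4 = 0.625.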

5. Feature Engineering

  • Create new features (feature generation)
  • Create feature interactions
  • Generate time-based features for time series data
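
A sketch of the three feature-engineering ideas on a hypothetical daily sales table (`add_features` and all column names are illustrative). `revenue` is a generated feature, `price_x_weekend` is an interaction, and the calendar, lag, and rolling columns are time-based. The lag is shifted before the rolling mean so each row only sees past values.

```python
import pandas as pd

sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-06", "2024-01-07", "2024-01-08"]),
    "price": [10.0, 12.0, 11.0, 9.0],
    "quantity": [3, 5, 4, 6],
})


def add_features(frame: pd.DataFrame) -> pd.DataFrame:
    out = frame.sort_values("date").copy()

    # Feature generation: a new column derived from domain knowledge
    out["revenue"] = out["price"] * out["quantity"]

    # Time-based features extracted from the timestamp
    out["month"] = out["date"].dt.month
    out["day_of_week"] = out["date"].dt.dayofweek  # Monday=0 ... Sunday=6
    out["is_weekend"] = (out["day_of_week"] >= 5).astype(int)

    # Feature interaction: lets a linear model treat weekend prices differently
    out["price_x_weekend"] = out["price"] * out["is_weekend"]

    # Lag and rolling features built only from earlier rows
    out["quantity_lag1"] = out["quantity"].shift(1)
    out["quantity_roll2"] = out["quantity"].shift(1).rolling(window=2).mean()
    return out


print(add_features(sales))
```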

6. Feature Selection

  • Feature importance scores
  • Recursive Feature Elimination (RFE)
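  • Principal Component Analysis (PCA), strictly a dimensionality-reduction method: it builds new combined features instead of keeping a subset of the original ones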
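
The three techniques above, sketched with scikit-learn on a synthetic dataset from `make_classification` (8 features, 3 informative; all parameter values are arbitrary choices for illustration). Importance scores come from a random forest, `RFE` repeatedly refits a logistic regression and drops the weakest feature, and `PCA` keeps enough components to explain 90% of the variance of the standardized features.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: only 3 of the 8 features carry signal
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

# 1) Feature importance scores from a tree ensemble
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = forest.feature_importances_
ranking = importances.argsort()[::-1]  # feature indices, most important first

# 2) Recursive Feature Elimination down to 3 features
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)
selected = rfe.support_  # boolean mask of the kept features

# 3) PCA on standardized data, keeping components that explain 90% of variance
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.90).fit(X_std)
X_pca = pca.transform(X_std)

print("importance ranking:", ranking)
print("RFE keeps features:", selected.nonzero()[0])
print("PCA components kept:", pca.n_components_)
```

Note the difference in output: the first two methods point at original columns, while `X_pca` contains new columns that mix all eight inputs.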

7. Model Preparation

  • Create training and test sets (train-test split)
  • Apply data augmentation if necessary
  • Balance the training set (using methods like SMOTE), resampling only after the split so the test set keeps the real class distribution
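
A sketch of the split-then-balance order on hypothetical binary data. The split uses scikit-learn's `train_test_split` with `stratify` so both sets keep the original class ratio. `smote_like` is a simplified, hypothetical re-implementation of SMOTE's core idea (new minority points placed on the line between a minority sample and one of its k nearest minority neighbours); it is not the imbalanced-learn implementation, which is available as `imblearn.over_sampling.SMOTE(...).fit_resample(X, y)`. It assumes a binary target and at least two minority samples.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def smote_like(X, y, minority, k=5, rng=None):
    """Oversample `minority` (binary target assumed) up to the majority count."""
    rng = np.random.default_rng(rng)
    X_min = X[y == minority]
    n_needed = int((y != minority).sum() - len(X_min))
    if n_needed <= 0:
        return X, y
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples")

    # Pairwise distances among minority samples; a point is not its own neighbour
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbours, then interpolate
    base = rng.integers(0, len(X_min), size=n_needed)
    nb = neighbours[base, rng.integers(0, k, size=n_needed)]
    gap = rng.random((n_needed, 1))
    synthetic = X_min[base] + gap * (X_min[nb] - X_min[base])

    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_needed, minority)])


# Hypothetical imbalanced data: 20 positives, 180 negatives
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (np.arange(200) < 20).astype(int)
X[y == 1] += 2.0  # shift the minority class so the classes differ

# Split first (stratified), then balance only the training portion
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
X_bal, y_bal = smote_like(X_train, y_train, minority=1, k=5, rng=0)
print(np.bincount(y_train), "->", np.bincount(y_bal))
```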

Key Points to Note:

  • Imputing missing values and encoding categorical features can significantly affect model performance.
  • Transformations such as log can reduce skew and bring distributions closer to normal, which helps models that are sensitive to skewed inputs, such as linear regression.
  • Re-explore the data after all these steps to confirm it is ready for modeling.