Machine_learning_workflow

A collection of scripts and notebooks that are useful when conducting a machine learning project.

Part 1: Preprocessing

I. EDA (Exploratory Data Analysis)

  • Pandas profiling
  • Map columns
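Pandas profiling produces a full HTML report. For a quick, dependency-free column map, a minimal sketch could look like the following. It assumes that "map columns" means a per-column overview of data type, missing values and distinct values; the function name `map_columns` is hypothetical.

```python
import numpy as np
import pandas as pd


def map_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column of `df` with basic EDA facts.

    Hypothetical helper: a lightweight stand-in for a profiling report.
    """
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),           # stored data type
            "n_missing": df.isna().sum(),             # count of NaN/None
            "pct_missing": df.isna().mean().round(3), # share of missing values
            "n_unique": df.nunique(dropna=True),      # distinct non-null values
        }
    )


if __name__ == "__main__":
    df = pd.DataFrame({"age": [25, np.nan, 40], "city": ["A", "B", "B"]})
    print(map_columns(df))
```

The result is indexed by column name, so it can be sorted (for example by `pct_missing`) to decide which columns need cleaning or imputation first.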

II. Cleaning

  • Get unique values and replace invalid ones with np.nan or with their numeric equivalents
  • Get column names
  • Set the correct schema/data types, and get rid of invalid characters that might cause problems
  • Get rid of "code" (sentinel) values such as 999
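The cleaning steps above can be sketched with plain pandas as follows. This is a minimal sketch, not the repository's own scripts: the helper names (`unique_values`, `clean_column_names`, `clean_numeric`), the column-name convention (lower case, underscores) and the sentinel codes `999` / `-999` are all assumptions to adjust per dataset.

```python
import re

import pandas as pd


def unique_values(df, max_values=20):
    """Map each column to up to `max_values` distinct non-null values, for manual inspection."""
    return {col: list(df[col].dropna().unique()[:max_values]) for col in df.columns}


def clean_column_names(df):
    """Lower-case column names; collapse runs of characters other than letters, digits and _ into _."""
    out = df.copy()
    out.columns = [re.sub(r"[^0-9a-zA-Z_]+", "_", str(c)).strip("_").lower() for c in df.columns]
    return out


def clean_numeric(df, numeric_cols, sentinels=(999, -999)):
    """Coerce `numeric_cols` to numbers and turn (assumed) sentinel codes into NaN.

    Surrounding whitespace and thousands separators are stripped first;
    anything that still cannot be parsed (e.g. "n/a") becomes NaN.
    """
    out = df.copy()
    for col in numeric_cols:
        text = out[col].astype(str).str.strip().str.replace(",", "", regex=False)
        values = pd.to_numeric(text, errors="coerce")
        out[col] = values.mask(values.isin(list(sentinels)))  # sentinel -> NaN
    return out


if __name__ == "__main__":
    raw = pd.DataFrame({"Income ($)": ["1,200", "999", "n/a", " 800"], "Region": ["N", "S", "S", "N"]})
    print(unique_values(raw))
    df = clean_numeric(clean_column_names(raw), ["income"])
    print(df.dtypes)
    print(df)
```

Sentinels are masked after the numeric conversion so that both `"999"` (text) and `999` (number) are caught by a single numeric comparison.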

III. Imputation
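One simple imputation strategy, shown here as an assumption rather than the method used in this repository, fills numeric columns with their median and other columns with their most frequent value. The function name `impute_simple` is hypothetical.

```python
import numpy as np
import pandas as pd


def impute_simple(df):
    """Fill numeric NaNs with the column median and other NaNs with the column mode."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill
        if pd.api.types.is_numeric_dtype(out[col]):
            fill = out[col].median()  # median is robust to outliers
        else:
            mode = out[col].mode(dropna=True)
            if mode.empty:
                continue  # column is entirely missing; leave it as-is
            fill = mode.iloc[0]
        out[col] = out[col].fillna(fill)
    return out


if __name__ == "__main__":
    df = pd.DataFrame({"x": [1.0, np.nan, 3.0, 10.0], "c": ["a", None, "a", "b"]})
    print(impute_simple(df))
```

In a modelling pipeline the fill values should be computed on the training split only and then reused on validation/test data, to avoid leaking information; scikit-learn's `SimpleImputer` packages that fit/transform pattern.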

IV. Aggregation

  • Sum across several columns that hold a given value
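Assuming this means counting, row by row, how many of a set of columns hold a given value (e.g. how many survey questions were answered "yes"), a minimal sketch with a hypothetical helper `count_value_across` is:

```python
import pandas as pd


def count_value_across(df, cols, value):
    """Row-wise count of how many of `cols` are equal to `value`."""
    # .eq gives a boolean frame; summing along axis=1 counts True per row
    return df[cols].eq(value).sum(axis=1)


if __name__ == "__main__":
    survey = pd.DataFrame(
        {"q1": ["yes", "no", "yes"], "q2": ["yes", "yes", "no"], "q3": ["no", "no", "yes"]}
    )
    survey["n_yes"] = count_value_across(survey, ["q1", "q2", "q3"], "yes")
    print(survey)
```

To sum the numeric contents of those columns instead of counting matches, `df[cols].where(df[cols].eq(value)).sum(axis=1)` keeps only the matching cells before summing.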

Part 2: Implementation

I. Machine learning algorithms

II. mlflow tracking

III. Econometric/Statistical models

IV. Hyperparameter tuning loop

V. Metrics
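A hyperparameter tuning loop and a metric can be sketched with NumPy alone. The model here is closed-form ridge regression, chosen purely as an assumed stand-in for whatever algorithm is being tuned; the grid, the RMSE metric and the helper names (`fit_ridge`, `tune`) are illustrative. The comment marks where mlflow tracking would log each run.

```python
import itertools

import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression without intercept: w = (X'X + alpha*I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)


def tune(X_train, y_train, X_val, y_val, grid):
    """Try every combination in `grid`; return (best_params, best_score, all_results)."""
    results = []
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        w = fit_ridge(X_train, y_train, **params)
        score = rmse(y_val, X_val @ w)
        # With mlflow tracking, each run would be logged here, e.g. inside
        # `with mlflow.start_run():` via mlflow.log_params(params) and
        # mlflow.log_metric("rmse", score).
        results.append((params, score))
    best_params, best_score = min(results, key=lambda r: r[1])  # lower RMSE is better
    return best_params, best_score, results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = X @ np.array([1.5, -2.0, 0.0, 0.5, 3.0]) + rng.normal(scale=0.5, size=200)
    grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}
    best, score, _ = tune(X[:150], y[:150], X[150:], y[150:], grid)
    print(best, round(score, 3))
```

Because the loop builds the Cartesian product of the grid, adding another key (for a model that accepts it) extends the search without changing the loop itself.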