Collection of scripts and notebooks useful when conducting a project
Part 1: Preprocessing
I. EDA (Exploratory Data Analysis)
- Pandas profiling
- Map columns
II. Cleaning
- Get unique values, replace invalid ones with np.nan, and convert columns to numeric
- Get column names
- Set the correct schema/data types and remove invalid characters that might cause problems
- Get rid of "code" values such as 999
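The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the `raw` frame, its column names, and the `SENTINELS` list are hypothetical, and treating 999 as a missing-value code is an assumption that has to be checked for each dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data; column names and values are illustrative only.
raw = pd.DataFrame({
    "Age (years)": ["34", "999", "n/a", "51"],
    "Income$": ["1,200", "3,400", "999", "?"],
})

# Inspect the column names and the unique values in each column.
print(list(raw.columns))
for col in raw.columns:
    print(col, raw[col].unique())

# Normalize column names: lowercase, and collapse runs of characters
# other than letters and digits into a single underscore.
clean = raw.copy()
clean.columns = (
    clean.columns.str.strip()
    .str.lower()
    .str.replace(r"[^0-9a-z]+", "_", regex=True)
    .str.strip("_")
)

# Drop thousands separators, then coerce to numeric.
# Entries that cannot be parsed ("n/a", "?") become NaN.
for col in clean.columns:
    without_commas = clean[col].str.replace(",", "", regex=False)
    clean[col] = pd.to_numeric(without_commas, errors="coerce")

# Replace assumed "code" values with NaN (assumption: 999 means missing).
SENTINELS = [999]
clean = clean.replace(SENTINELS, np.nan)

print(clean)
```

Coercing with `errors="coerce"` silently turns every unparseable entry into NaN, so it is worth looking at the unique values first, as the loop at the top does.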
III. Imputation
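The outline does not say which imputation strategy is used, so the sketch below assumes one simple option: the median for numeric columns and the most frequent value for categorical ones. The frame `df` and its columns are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing entries in both column types.
df = pd.DataFrame({
    "age": [34.0, np.nan, 45.0, 51.0],
    "city": ["Lima", None, "Lima", "Quito"],
})

imputed = df.copy()

# Numeric column: fill gaps with the median of the observed values.
imputed["age"] = imputed["age"].fillna(imputed["age"].median())

# Categorical column: fill gaps with the most frequent value.
most_frequent_city = imputed["city"].mode().iloc[0]
imputed["city"] = imputed["city"].fillna(most_frequent_city)

print(imputed)
```

In a modeling workflow the fill values would normally be computed on the training split only and then reused on validation and test data, to avoid leakage.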
IV. Aggregation
- Sum across different columns that have a certain value
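One possible reading of this aggregation is a per-row count of how many columns hold a given value. The sketch below follows that reading; the `survey` frame, the column list, and the target value "yes" are all hypothetical.

```python
import pandas as pd

# Hypothetical survey answers spread over several columns.
survey = pd.DataFrame({
    "q1": ["yes", "no", "yes"],
    "q2": ["yes", "yes", "no"],
    "q3": ["no", "no", "yes"],
})

answer_cols = ["q1", "q2", "q3"]
target = "yes"

# eq() yields a boolean frame; summing along axis=1 counts matches per row.
survey["n_yes"] = survey[answer_cols].eq(target).sum(axis=1)

print(survey)
```

If the intent is instead to add up numeric values only where a condition holds, `DataFrame.where` followed by `sum(axis=1)` would be the analogous pattern.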
Part 2: Implementation
I. Machine learning algorithms
II. mlflow tracking
III. Econometric/Statistical models
IV. Hyperparameter tuning loop
V. Metrics
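A hyperparameter tuning loop combined with a metric might look roughly like the sketch below. It is only an illustration under stated assumptions: the data are synthetic, the model is a closed-form ridge regression chosen to keep the example dependency-free, and `fit_ridge`, `rmse`, and the `alpha` grid are hypothetical names and values, not the repository's code.

```python
import numpy as np

# Synthetic regression data (assumption: stands in for a real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Simple hold-out split: first 150 rows for training, the rest for validation.
X_train, X_val = X[:150], X[150:]
y_train, y_val = y[:150], y[150:]


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression coefficients (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Tuning loop: fit once per candidate value and record the validation metric.
results = []
for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    coef = fit_ridge(X_train, y_train, alpha)
    results.append({"alpha": alpha, "rmse": rmse(y_val, X_val @ coef)})

# Keep the configuration with the lowest validation error.
best = min(results, key=lambda r: r["rmse"])
print(best)
```

The `results` list of dictionaries is the kind of record that an experiment tracker could log per run, one entry per parameter setting.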