Create different machine learning models to study supervised regression.
- Explore the dataset to find out which columns it contains, their data types, nulls, duplicates, the unique values of the categorical columns, and which columns are correlated.
- Cut, color and clarity are object series.
- There are no null values.
- There are no duplicated rows.
- Create a correlation matrix and plot it to see which series are correlated.
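The exploration steps above can be sketched with pandas and matplotlib. This is a minimal illustration, not the notebook's actual code: the `explore` and `plot_corr` helpers are hypothetical names, and the four toy rows only borrow the diamonds column names (they are not real data).

```python
import matplotlib.pyplot as plt
import pandas as pd


def explore(df):
    """Collect dtypes, null/duplicate counts, categorical uniques and correlations."""
    categorical = df.select_dtypes(exclude="number").columns  # e.g. cut, color, clarity
    return {
        "dtypes": df.dtypes.astype(str).to_dict(),
        "nulls": int(df.isnull().sum().sum()),
        "duplicates": int(df.duplicated().sum()),
        "uniques": {col: sorted(df[col].unique()) for col in categorical},
        "corr": df.select_dtypes(include="number").corr(),  # Pearson by default
    }


def plot_corr(corr):
    """Draw the correlation matrix as a colour-coded grid."""
    fig, ax = plt.subplots()
    im = ax.matshow(corr.values, vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.columns)))
    ax.set_yticklabels(corr.columns)
    fig.colorbar(im)
    return fig


# Hypothetical toy rows with the diamonds column names, for illustration only
toy = pd.DataFrame({
    "carat": [0.3, 0.5, 0.7, 1.0],
    "cut": ["Ideal", "Good", "Premium", "Ideal"],
    "color": ["E", "G", "H", "E"],
    "clarity": ["SI1", "VS2", "SI2", "VS1"],
    "price": [400, 1200, 2500, 5000],
})
summary = explore(toy)
fig = plot_corr(summary["corr"])
```

Selecting categorical columns with `exclude="number"` rather than `include="object"` keeps the sketch working whether pandas stores text as `object` or as a dedicated string dtype.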
-
Before dropping any column, I select Random Forest Regression and train the model with all series. Then I drop the id, x, y and z series and train Random Forest Regression again.
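The before/after comparison can be sketched with scikit-learn as below. This assumes one-hot encoding of the categorical columns with `pd.get_dummies`, an 80/20 hold-out split and RMSE as the metric; the helper `rf_rmse` is a hypothetical name and the data is synthetic, generated only to mimic the diamonds column names.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def rf_rmse(df, target="price", drop=()):
    """Fit a Random Forest on df minus `drop` and return the hold-out RMSE."""
    X = pd.get_dummies(df.drop(columns=[target, *drop]))  # one-hot encode cut/color/clarity
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_train, y_train)
    return float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))


# Hypothetical synthetic data with the diamonds column names (not the real dataset)
rng = np.random.default_rng(0)
n = 400
carat = rng.uniform(0.2, 2.0, n)
df = pd.DataFrame({
    "id": np.arange(n),
    "carat": carat,
    "cut": rng.choice(["Fair", "Good", "Very Good", "Premium", "Ideal"], n),
    "color": rng.choice(list("DEFGHIJ"), n),
    "clarity": rng.choice(["SI2", "SI1", "VS2", "VS1"], n),
    "x": 6 * carat ** (1 / 3) + rng.normal(0, 0.05, n),
    "y": 6 * carat ** (1 / 3) + rng.normal(0, 0.05, n),
    "z": 3.7 * carat ** (1 / 3) + rng.normal(0, 0.05, n),
    "price": 4000 * carat + rng.normal(0, 300, n),
})

rmse_all = rf_rmse(df)
rmse_dropped = rf_rmse(df, drop=["id", "x", "y", "z"])
print(f"all columns: {rmse_all:.1f}  without id/x/y/z: {rmse_dropped:.1f}")
```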
-
Train the model with different numbers of estimators, max depths and min depths. The lowest RMSE came with 86 estimators, a max depth of 21 and a min depth of 3.
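A hyperparameter search like this can be sketched with `GridSearchCV`. scikit-learn's `RandomForestRegressor` has no "min depth" parameter, so this sketch assumes it corresponds to `min_samples_leaf`; that mapping is a guess, as are the small grid and the synthetic `make_regression` data standing in for the encoded diamonds features.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical synthetic data standing in for the encoded diamonds features
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=0)

param_grid = {
    "n_estimators": [50, 86],
    "max_depth": [10, 21],
    # Assumption: "min depth" is read here as min_samples_leaf,
    # because RandomForestRegressor has no min_depth parameter.
    "min_samples_leaf": [1, 3],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximises, so RMSE is negated
    cv=3,
)
search.fit(X, y)
best_rmse = -search.best_score_  # flip the sign back to a positive RMSE
print(search.best_params_, round(best_rmse, 2))
```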
-
Add normalization and StandardScaler, then train RandomForest again. Something went wrong: the score rose to 5455 :(. I needed to inverse-transform the DataFrame but couldn't.
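If the target itself was scaled, predictions come out in scaled units and must be inverse-transformed before computing RMSE in price units. One way to avoid doing that by hand is scikit-learn's `TransformedTargetRegressor`, which scales `y` for fitting and inverse-transforms predictions automatically. This is a sketch under the assumption that target scaling was the issue; the data is synthetic, shifted by a hypothetical 1000 so it looks like a positive price. Tree ensembles such as Random Forest generally do not need feature scaling, so the `StandardScaler` on `X` is shown only to mirror the step above.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=1)
y = y + 1000  # hypothetical shift so the target resembles a price
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

model = TransformedTargetRegressor(
    # Feature scaling (harmless, but usually unnecessary for trees)
    regressor=make_pipeline(StandardScaler(), RandomForestRegressor(n_estimators=50, random_state=1)),
    # Scales y during fit and inverse-transforms predictions afterwards
    transformer=StandardScaler(),
)
model.fit(X_train, y_train)
pred = model.predict(X_test)  # already back on the original target scale
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
```

The manual equivalent is to keep the fitted target scaler and call `scaler.inverse_transform(pred.reshape(-1, 1))` on the scaled predictions.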
-
Test different models and generate a separate CSV file for each.
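Writing one CSV of predictions per model might look like the sketch below. The model choices, the `id`/`price` column names and the `predictions_<name>.csv` file pattern are hypothetical, and a temporary directory is used only so the example leaves no files behind.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=2)
X_train, y_train, X_test = X[:150], y[:150], X[150:]
test_ids = np.arange(150, 200)  # hypothetical id column for the test rows

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=2),
    "gradient_boosting": GradientBoostingRegressor(random_state=2),
}

out_dir = Path(tempfile.mkdtemp())  # swap for a project folder in practice
written = []
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = pd.DataFrame({"id": test_ids, "price": model.predict(X_test)})
    path = out_dir / f"predictions_{name}.csv"  # one CSV per model
    preds.to_csv(path, index=False)
    written.append(path)
```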
-
Create a new Jupyter notebook, Machine Learning - Training Models, with a dictionary holding all the best models.
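A dictionary of best models could be as simple as the sketch below. Only the Random Forest settings (86 estimators, max depth 21) come from the tuning step above; the other entries and their hyperparameters are hypothetical placeholders, and the "min depth" setting is left out because its scikit-learn equivalent is not stated.

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

best_models = {
    # Settings reported by the Random Forest tuning step
    "RandomForest": RandomForestRegressor(n_estimators=86, max_depth=21, random_state=42),
    # Hypothetical entries, shown only to illustrate the structure
    "GradientBoosting": GradientBoostingRegressor(n_estimators=200, random_state=42),
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "Linear": LinearRegression(),
}

for name, model in best_models.items():
    print(name, type(model).__name__)
```

Keying by a readable name makes it easy to loop over every model later and label plots or output files with the same name.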
-
Write a function in a .py file that trains each model in the dictionary and visualizes the results.
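A helper module for that step might look like this sketch. The module name `training.py`, the functions `train_models` and `plot_scores`, the hold-out split and the bar chart of RMSE are all assumptions rather than the project's actual code. `clone` fits a fresh copy of each model so the dictionary's entries stay unfitted and can be reused.

```python
# training.py (hypothetical module name)
import matplotlib.pyplot as plt
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def train_models(models, X, y, test_size=0.2, random_state=42):
    """Fit a fresh copy of every model in `models` and return hold-out RMSE per name."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    scores = {}
    for name, model in models.items():
        fitted = clone(model).fit(X_train, y_train)
        scores[name] = float(np.sqrt(mean_squared_error(y_test, fitted.predict(X_test))))
    return scores


def plot_scores(scores):
    """Bar chart of RMSE per model (lower is better)."""
    fig, ax = plt.subplots()
    ax.bar(list(scores), list(scores.values()))
    ax.set_ylabel("RMSE")
    ax.set_title("Hold-out RMSE by model")
    return fig


# Example usage with synthetic data (not the diamonds dataset)
X, y = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=3)
scores = train_models(
    {"Linear": LinearRegression(), "Tree": DecisionTreeRegressor(random_state=3)}, X, y
)
fig = plot_scores(scores)
```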
-
Improve the cleaning process by giving a weight to each value of the cut series.
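Weighting the cut grades amounts to an ordinal encoding. The grade order below (Fair, Good, Very Good, Premium, Ideal, from worst to best) is the standard diamonds ordering, but the weights 1 to 5 and the `encode_cut` helper are illustrative assumptions, since the actual weights are not stated.

```python
import pandas as pd

# Cut grades from worst to best.
# Assumption: weights 1..5 are illustrative; the project's own weights are not given.
CUT_WEIGHTS = {"Fair": 1, "Good": 2, "Very Good": 3, "Premium": 4, "Ideal": 5}


def encode_cut(df):
    """Return a copy of df with `cut` replaced by its numeric weight."""
    out = df.copy()
    out["cut"] = out["cut"].map(CUT_WEIGHTS)
    if out["cut"].isnull().any():  # any grade missing from CUT_WEIGHTS maps to NaN
        raise ValueError("unexpected cut value")
    return out


toy = pd.DataFrame({"cut": ["Ideal", "Good", "Premium"], "price": [500, 400, 450]})
encoded = encode_cut(toy)
print(encoded["cut"].tolist())  # [5, 2, 4]
```

Unlike one-hot encoding, a single ordered column lets tree splits such as `cut >= 4` group the better grades together.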