week2 pickle model for Lasso
lucapug opened this issue · 0 comments
lucapug commented
With reference to the week 2 modified version of duration-prediction.ipynb (link):
I think the pickle file passed to log_artifact() in the last line of the following code block is the wrong one. models/lin_reg.bin is saved outside the MLflow run and holds the LinearRegression model (fitted without regularization), which is different from the Lasso model fitted inside the experiment run.
lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_val)
mean_squared_error(y_val, y_pred, squared=False)
7.758715210382775
with open('models/lin_reg.bin', 'wb') as f_out:
    pickle.dump((dv, lr), f_out)
with mlflow.start_run():
    mlflow.set_tag("developer", "cristian")
    mlflow.log_param("train-data-path", "./data/green_tripdata_2021-01.csv")
    mlflow.log_param("valid-data-path", "./data/green_tripdata_2021-02.csv")
    alpha = 0.1
    mlflow.log_param("alpha", alpha)
    lr = Lasso(alpha)
    lr.fit(X_train, y_train)
    y_pred = lr.predict(X_val)
    rmse = mean_squared_error(y_val, y_pred, squared=False)
    mlflow.log_metric("rmse", rmse)
    mlflow.log_artifact(local_path="models/lin_reg.bin", artifact_path="models_pickle")
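
A possible fix, as a minimal sketch only: pickle the Lasso model inside the run and log that file, so the artifact matches the logged alpha and rmse. The helper names `save_model` and `run_lasso` and the file name `models/lasso.bin` are my own assumptions, not from the notebook; `dv`, `X_train`, `y_train`, `X_val` and `y_val` are assumed to be defined as in the notebook.

```python
import pickle
from pathlib import Path


def save_model(dv, model, path):
    """Pickle the (vectorizer, model) pair to `path` and return the path as a string."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    with open(path, "wb") as f_out:
        pickle.dump((dv, model), f_out)
    return str(path)


def run_lasso(dv, X_train, y_train, X_val, y_val, alpha=0.1):
    """Fit Lasso inside an MLflow run and log the pickle of that same model."""
    # Imported here so save_model stays usable without mlflow/scikit-learn installed.
    import mlflow
    from sklearn.linear_model import Lasso
    from sklearn.metrics import mean_squared_error

    with mlflow.start_run():
        mlflow.log_param("alpha", alpha)

        lasso = Lasso(alpha)
        lasso.fit(X_train, y_train)

        y_pred = lasso.predict(X_val)
        # Square root taken by hand, so this does not rely on the `squared` argument.
        rmse = mean_squared_error(y_val, y_pred) ** 0.5
        mlflow.log_metric("rmse", rmse)

        # Hypothetical file name: a separate file, so lin_reg.bin is left untouched.
        model_path = save_model(dv, lasso, "models/lasso.bin")
        mlflow.log_artifact(local_path=model_path, artifact_path="models_pickle")

    return rmse
```

Calling `run_lasso(dv, X_train, y_train, X_val, y_val)` with the notebook's variables would then log the Lasso pickle under models_pickle. Overwriting models/lin_reg.bin inside the run before logging it would also work, but a separate file keeps the two models apart.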