DataTalksClub/mlops-zoomcamp

week2 pickle model for Lasso

lucapug opened this issue · 0 comments

With reference to the modified week 2 version of duration-prediction.ipynb (link):

I think the pickled model passed to log_artifact() in the last line of the following block of code is the wrong one. models/lin_reg.bin is written outside the MLflow run and contains the plain linear regression model, fitted without regularization. The model fitted inside the experiment run is a different one: a Lasso model. As a result, the artifact logged with the run does not match the alpha and rmse logged in that same run.

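```python
# Imports assumed from earlier cells of the notebook
import pickle

import mlflow
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_squared_error

# Plain linear regression (no regularization), fitted outside any MLflow run
lr = LinearRegression()
lr.fit(X_train, y_train)

y_pred = lr.predict(X_val)

mean_squared_error(y_val, y_pred, squared=False)  # notebook output: 7.758715210382775

# lin_reg.bin now holds (dv, LinearRegression)
with open('models/lin_reg.bin', 'wb') as f_out:
    pickle.dump((dv, lr), f_out)

with mlflow.start_run():

    mlflow.set_tag("developer", "cristian")

    mlflow.log_param("train-data-path", "./data/green_tripdata_2021-01.csv")
    mlflow.log_param("valid-data-path", "./data/green_tripdata_2021-02.csv")

    alpha = 0.1
    mlflow.log_param("alpha", alpha)
    lr = Lasso(alpha)  # rebinds lr to a Lasso model fitted inside the run
    lr.fit(X_train, y_train)

    y_pred = lr.predict(X_val)
    rmse = mean_squared_error(y_val, y_pred, squared=False)
    mlflow.log_metric("rmse", rmse)

    # Logs the file written above, i.e. the LinearRegression pickle, not this Lasso model
    mlflow.log_artifact(local_path="models/lin_reg.bin", artifact_path="models_pickle")
```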