data-describe/awesome-data-science-models

chicago-taxi


Rework the current chicago-taxi solution. The current implementation has dependency conflicts on AI Platform. We may need to upgrade the runtime, the Keras version, and the deployment strategy.

@truongc2 and @bobbyjacob anyone looking at this?

I pushed the chicago-taxi demo that I got working for the GCP MLOps assessment to a new branch in this repo: gcp_ml_assessment_chicago_taxi.

That demo works, but only in the Jupyter notebook instance on Vertex AI Workbench that I set up for the assessment: python-20220624-164215-bjacob.

If I clone the repo to another Workbench instance, the demo breaks because of Python dependency issues: the new notebook has a different set of default installed packages. I tried recreating the working notebook environment with both conda and venv, but couldn't get either to work on another instance.

I pushed a new branch gcp_ml_assessment_chicago_taxi_aaa. In a new workbench environment, I ran these commands:

pip install tensorflow==2.8.4
pip install tensorflow_data_validation==1.8.0
pip install tensorflow_transform==1.8.0
pip install cloudml-hypertune
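
Since the breakage seems tied to each instance shipping different default packages, a quick check of what is actually installed against the pins above might help when comparing environments. This is only a sketch: `PINS` and `find_mismatches` are hypothetical names, not part of the repo, and the package names are written as in the pip commands (the hyphenated form may be needed on some Python versions).

```python
# Hypothetical helper: compare installed package versions against expected pins.
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7 needs the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version

# Pins copied from the pip commands above; cloudml-hypertune was installed unpinned.
PINS = {
    "tensorflow": "2.8.4",
    "tensorflow_data_validation": "1.8.0",
    "tensorflow_transform": "1.8.0",
}


def find_mismatches(pins, get_version=version):
    """Return {package: (expected, installed_or_None)} for every pin that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


# Example use in a notebook cell:
#   for name, (expected, installed) in find_mismatches(PINS).items():
#       print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every pinned package is present at the expected version; it says nothing about transitive dependencies.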

This notebook completes without error: 01-dataset-management.ipynb.
This one fails with:

CustomJob projects/354621994428/locations/us-central1/customJobs/2464840774366265344 current state:
JobState.JOB_STATE_RUNNING
CustomJob projects/354621994428/locations/us-central1/customJobs/2464840774366265344 current state:
JobState.JOB_STATE_RUNNING
CustomJob projects/354621994428/locations/us-central1/customJobs/2464840774366265344 current state:
JobState.JOB_STATE_FAILED

The replica workerpool0-0 exited with a non-zero status of 1.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.local/lib/python3.7/site-packages/src/model_training/task.py", line 27, in <module>
    from src.model_training import defaults, trainer, exporter
  File "/root/.local/lib/python3.7/site-packages/src/model_training/trainer.py", line 18, in <module>
    import tensorflow_transform as tft
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_transform/__init__.py", line 19, in <module>
    from tensorflow_transform.analyzers import *
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_transform/analyzers.py", line 39, in <module>
    from tensorflow_transform import analyzer_nodes
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_transform/analyzer_nodes.py", line 36, in <module>
    from tensorflow_transform import nodes
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_transform/nodes.py", line 33, in <module>
    from future.utils import with_metaclass
ModuleNotFoundError: No module named 'future'

To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=354621994428&resource=ml_job%2Fjob_id%2F2464840774366265344&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%222464840774366265344%22
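
The traceback shows tensorflow_transform trying to import future.utils in the custom job's environment, where the future package is not installed. One way to surface this kind of gap earlier might be a small preflight check run before the heavy imports. This is a sketch only: `missing_modules`, `fail_if_missing`, and the `REQUIRED` list are hypothetical and not taken from the repo.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def fail_if_missing(names):
    """Raise a single readable error listing every module that is not importable."""
    missing = missing_modules(names)
    if missing:
        raise RuntimeError(
            "Missing modules in this environment: " + ", ".join(missing)
        )


# Hypothetical list for the training task; "future" is the module
# the traceback above reports as missing.
REQUIRED = ["future", "tensorflow_transform"]

# Example use near the top of the training entry point:
#   fail_if_missing(REQUIRED)
```

This only reports what is missing; the package would still have to be added to whatever installs the training job's dependencies.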