algorithmiaio/dev-center

Unable to run scikit-learn housing example with documented dependencies


Overview

Following the scikit-learn example at https://algorithmia.com/developers/model-deployment/scikit with the specified dependencies (numpy and scikit-learn>=0.14,<0.18) does not work in the Python 3.x environment (legacy or IPA): calling the algorithm throws an error.

Steps to reproduce

  1. Upload the data files from https://github.com/algorithmiaio/sample-apps/tree/master/algo-dev-demo/scikit-learn-demo/data to a hosted data collection.

  2. Create a Python 3.x (legacy) environment.

  3. Specify the dependencies as noted in the documentation:

numpy
scikit-learn>=0.14,<0.18

  4. Paste the example code from https://algorithmia.com/developers/model-deployment/scikit with the apply() function and edit the data path.

  5. Save and build the algorithm.

  6. Pass the path to the test data as an input in the test console.

  7. The following error occurs:

> "data://koverholt/scikit/boston_test_data.csv"
Error: Algorithm process exited
Traceback (most recent call last):
  File "/opt/algorithm/bin/pipe.py", line 14, in <module>
    algorithm = __import__('src.'+config['algoname'], fromlist=["apply"])
  File "/opt/algorithm/src/scikit.py", line 7, in <module>
    from sklearn.datasets import load_boston
  File "/opt/algorithm/dependencies/sklearn/datasets/__init__.py", line 24, in <module>
    from .twenty_newsgroups import fetch_20newsgroups
  File "/opt/algorithm/dependencies/sklearn/datasets/twenty_newsgroups.py", line 54, in <module>
    from ..feature_extraction.text import CountVectorizer
  File "/opt/algorithm/dependencies/sklearn/feature_extraction/__init__.py", line 10, in <module>
    from . import text
  File "/opt/algorithm/dependencies/sklearn/feature_extraction/text.py", line 29, in <module>
    from ..preprocessing import normalize
  File "/opt/algorithm/dependencies/sklearn/preprocessing/__init__.py", line 31, in <module>
    from .imputation import Imputer
  File "/opt/algorithm/dependencies/sklearn/preprocessing/imputation.py", line 9, in <module>
    from scipy import stats
  File "/opt/anaconda3/lib/python3.5/site-packages/scipy/stats/__init__.py", line 340, in <module>
    from .morestats import *
  File "/opt/anaconda3/lib/python3.5/site-packages/scipy/stats/morestats.py", line 16, in <module>
    from numpy.testing.decorators import setastest
ImportError: No module named 'numpy.testing.decorators'
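
The last frames of the traceback show the scipy under /opt/anaconda3 importing `numpy.testing.decorators`, which the numpy resolved from the unpinned requirement does not provide. As a rough diagnostic sketch (not part of the original example; the helper name `module_available` is hypothetical), something like the following could be run in the same environment to confirm which modules are actually present:

```python
import importlib.util


def module_available(name):
    """Return True if the module `name` can be located.

    For dotted names, find_spec imports the parent packages but does not
    import the target module itself.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (for example numpy itself) is missing.
        return False


if __name__ == "__main__":
    # The last entry is the module scipy fails to import in the traceback.
    for name in ("numpy", "scipy", "numpy.testing.decorators"):
        status = "found" if module_available(name) else "missing"
        print(name, "->", status)
```

If `numpy.testing.decorators` is reported as missing while `numpy` and `scipy` are found, that would be consistent with a numpy/scipy version mismatch rather than a problem in the algorithm code.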

Suggested fix

  1. Update the steps to specify an algorithm environment (Python 3.x CPU) when creating the algorithm.

  2. Update the dependencies in https://github.com/algorithmiaio/sample-apps/blob/master/algo-dev-demo/scikit-learn-demo/demo/requirements.txt and the documentation page at https://algorithmia.com/developers/model-deployment/scikit.

The following pinned dependencies worked for me using the Python 3.x legacy environment:

numpy==1.11.3
scikit-learn==0.17.1
scipy==1.2.1

Other version specs or ranges might work as well.
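
To verify that a given environment actually resolved to these pins, a small check along the following lines might help. This is only a sketch: it assumes Python 3.8+ for `importlib.metadata` (the Python 3.5 legacy environment would need a different lookup, such as `pkg_resources`), and the helper names `parse_pins`, `installed_version`, and `check_pins` are hypothetical.

```python
from importlib import metadata  # assumption: Python 3.8 or newer

# The pinned dependencies listed above.
PINS = """
numpy==1.11.3
scikit-learn==0.17.1
scipy==1.2.1
"""


def parse_pins(text):
    """Parse 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins):
    """Return {name: (wanted, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pins.items():
        found = installed_version(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_pins(parse_pins(PINS)).items():
        print("{}: wanted {}, found {}".format(name, wanted, found))
```

An empty result from `check_pins` would indicate that the environment matches the pins exactly; it says nothing about whether other version ranges would also work.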

Note that these dependencies do not work with the Python 3.7 IPA environment, where the build fails; hence the recommendation to specify the Python 3.x legacy environment. We should also look into that build failure.

This code example is fixed in algorithmiaio/sample-apps#57 and algorithmiaio/sample-apps#58 with updated code and dependencies.

As a result, the dev-center documentation for the scikit-learn example is now outdated. We can update the docs, link to the README in https://github.com/algorithmiaio/sample-apps/tree/master/algo-dev-demo/scikit-learn-demo, or take some other approach.