greenplum-db/plcontainer

Python container: ModuleNotFoundError: No module named 'nltk'

yv5125 opened this issue · 3 comments

Hello!

The Docker container for Python 3 is not working. The image is based on Ubuntu 18, where python3 is Python 3.6, so pip3 install puts packages into the Python 3.6 environment. PL/Container, however, runs functions under Python 3.7, which cannot see those packages.

As a result, I get this error:

gpadmin=# select test();
ERROR:  PL/Container client exception occurred: 
 Exception occurred in Python during function execution 
 Traceback (most recent call last):
  File "<string>", line 4, in pylog
ModuleNotFoundError: No module named 'nltk'
CONTEXT:  PLContainer function "test"

To check, I tried the import under both interpreters inside the container:

$ docker run --rm -it python3:devel bash

# python3
Python 3.6.9 (default, Apr 18 2020, 01:56:04) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> 

# python3.7
Python 3.7.5 (default, Nov  7 2019, 10:50:52) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'nltk'
>>> 
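
The same comparison can be scripted so that it is easy to repeat for other packages. The following is only a minimal sketch, not part of PL/Container; the function name describe_module and the file name check.py are made up for illustration. It reports which interpreter is running and whether a top-level module can be found from it, without actually importing the module.

```python
import importlib.util
import sys


def describe_module(name):
    """Report the running interpreter and whether top-level module `name` is visible to it."""
    # find_spec returns None when a top-level module cannot be located on sys.path.
    spec = importlib.util.find_spec(name)
    return {
        "executable": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        "found": spec is not None,
        "origin": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    # nltk is the package from this issue; pass any other top-level name to check it instead.
    print(describe_module("nltk"))
```

Saved as check.py (hypothetical name), it can be run once as python3 check.py and once as python3.7 check.py inside the container. If "found" differs between the two runs, the package was installed for only one of the interpreters.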

We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story.

The labels on this github issue will be updated when the story is started.

The fix for the Python container is to install the package with the Python 3.7 interpreter explicitly:

RUN python3.7 -m pip install nltk
RUN python3.7 -m nltk.downloader all

Here is the PR - #610
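
The fix works because running pip as a module (-m pip) installs into the environment of the interpreter that runs it, whereas a bare pip3 installs into whichever Python the pip3 script is bound to. A small sketch of the same idea from Python follows; the helper names pip_install_command and install_for are hypothetical and do not come from the PR.

```python
import subprocess
import sys


def pip_install_command(package, interpreter=sys.executable):
    """Build a pip command that is tied to one specific interpreter."""
    # "<interpreter> -m pip" targets that interpreter's own site-packages.
    return [interpreter, "-m", "pip", "install", package]


def install_for(package, interpreter=sys.executable):
    """Run the install and raise CalledProcessError if pip exits with an error."""
    subprocess.check_call(pip_install_command(package, interpreter))
```

For example, pip_install_command("nltk", "python3.7") yields the same command as the first RUN line in the fix above.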