User warning about sequence matcher
mfitz opened this issue · 3 comments
When I grabbed the latest Elara Docker image and ran the CLI from inside the container I saw the following warning:
$ docker run -it --entrypoint /bin/bash 758645626094.dkr.ecr.eu-west-1.amazonaws.com/elara
root@92d61ece6463:/# elara --help
/usr/local/lib/python3.8/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
Usage: elara [OPTIONS] COMMAND [ARGS]...
Command line tool for processing a MATSim scenario events output.
Options:
--help Show this message and exit.
Commands:
run Run Elara using a config.
event-handlers Access event handler output group.
plan-handlers Access plan handler output group.
post-processors Access post processing output group.
Some detail on a fix can be found here. However, pip install python-Levenshtein fails inside the container, apparently because gcc is not installed:
root@92d61ece6463:/# pip install python-Levenshtein
Collecting python-Levenshtein
Using cached python-Levenshtein-0.12.2.tar.gz (50 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/site-packages (from python-Levenshtein) (57.5.0)
Building wheels for collected packages: python-Levenshtein
Building wheel for python-Levenshtein (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [32 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/Levenshtein
copying Levenshtein/StringMatcher.py -> build/lib.linux-x86_64-3.8/Levenshtein
copying Levenshtein/__init__.py -> build/lib.linux-x86_64-3.8/Levenshtein
running egg_info
writing python_Levenshtein.egg-info/PKG-INFO
writing dependency_links to python_Levenshtein.egg-info/dependency_links.txt
writing entry points to python_Levenshtein.egg-info/entry_points.txt
writing namespace_packages to python_Levenshtein.egg-info/namespace_packages.txt
writing requirements to python_Levenshtein.egg-info/requires.txt
writing top-level names to python_Levenshtein.egg-info/top_level.txt
reading manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '*pyc' found anywhere in distribution
warning: no previously-included files matching '*so' found anywhere in distribution
warning: no previously-included files matching '.project' found anywhere in distribution
warning: no previously-included files matching '.pydevproject' found anywhere in distribution
adding license file 'COPYING'
writing manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
copying Levenshtein/_levenshtein.c -> build/lib.linux-x86_64-3.8/Levenshtein
copying Levenshtein/_levenshtein.h -> build/lib.linux-x86_64-3.8/Levenshtein
running build_ext
building 'Levenshtein._levenshtein' extension
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/Levenshtein
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.8 -c Levenshtein/_levenshtein.c -o build/temp.linux-x86_64-3.8/Levenshtein/_levenshtein.o
unable to execute 'gcc': No such file or directory
error: command 'gcc' failed with exit status 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for python-Levenshtein
Running setup.py clean for python-Levenshtein
Failed to build python-Levenshtein
Installing collected packages: python-Levenshtein
Running setup.py install for python-Levenshtein ... error
error: subprocess-exited-with-error
× Running setup.py install for python-Levenshtein did not run successfully.
│ exit code: 1
╰─> [32 lines of output]
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/Levenshtein
copying Levenshtein/StringMatcher.py -> build/lib.linux-x86_64-3.8/Levenshtein
copying Levenshtein/__init__.py -> build/lib.linux-x86_64-3.8/Levenshtein
running egg_info
writing python_Levenshtein.egg-info/PKG-INFO
writing dependency_links to python_Levenshtein.egg-info/dependency_links.txt
writing entry points to python_Levenshtein.egg-info/entry_points.txt
writing namespace_packages to python_Levenshtein.egg-info/namespace_packages.txt
writing requirements to python_Levenshtein.egg-info/requires.txt
writing top-level names to python_Levenshtein.egg-info/top_level.txt
reading manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '*pyc' found anywhere in distribution
warning: no previously-included files matching '*so' found anywhere in distribution
warning: no previously-included files matching '.project' found anywhere in distribution
warning: no previously-included files matching '.pydevproject' found anywhere in distribution
adding license file 'COPYING'
writing manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
copying Levenshtein/_levenshtein.c -> build/lib.linux-x86_64-3.8/Levenshtein
copying Levenshtein/_levenshtein.h -> build/lib.linux-x86_64-3.8/Levenshtein
running build_ext
building 'Levenshtein._levenshtein' extension
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/Levenshtein
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.8 -c Levenshtein/_levenshtein.c -o build/temp.linux-x86_64-3.8/Levenshtein/_levenshtein.o
unable to execute 'gcc': No such file or directory
error: command 'gcc' failed with exit status 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure
× Encountered error while trying to install package.
╰─> python-Levenshtein
note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
The solution is probably to install gcc in the Docker image and then pip install python-Levenshtein there. This may also improve the performance of any fuzzy matching operations in Elara, since they would then use the faster C implementations of the distance algorithms.
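To confirm whether the fix took effect inside the container, one option is to check whether the compiled Levenshtein module can be found, since the warning above says the pure-Python SequenceMatcher is used when python-Levenshtein is missing. This is a minimal sketch, not part of Elara or fuzzywuzzy; the helper name matcher_backend is hypothetical:

```python
import importlib.util


def matcher_backend() -> str:
    """Report which sequence-matching backend is likely to be used.

    Hypothetical helper for illustration only. It checks whether the
    `Levenshtein` module (provided by the python-Levenshtein package)
    can be located, without importing fuzzywuzzy itself.
    """
    if importlib.util.find_spec("Levenshtein") is not None:
        return "python-Levenshtein (C extension)"
    return "difflib.SequenceMatcher (pure Python)"


if __name__ == "__main__":
    print(matcher_backend())
```

If this reports the pure-Python backend after the image is rebuilt, the pip install step probably did not succeed.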
How can I download the latest image to work on the problem?
I tried the command above, but it returned an authentication error:
docker run -it --entrypoint /bin/bash 758645626094.dkr.ecr.eu-west-1.amazonaws.com/elara
Unable to find image '758645626094.dkr.ecr.eu-west-1.amazonaws.com/elara:latest' locally
docker: Error response from daemon: Head "https://758645626094.dkr.ecr.eu-west-1.amazonaws.com/v2/elara/manifests/latest": no basic auth credentials.
See 'docker run --help'.
Sorry for the late reply. You won't be able to pull that image because it's currently private inside our ECR repo, but you can build the image locally from the Dockerfile once you've cloned the GitHub repo. Running this from the directory containing the Dockerfile should do it:
docker build -t elara-local .
The warning has been fixed by this commit.