koaning/whatlies

Error message when trying to use PCA

talamoig opened this issue · 3 comments

Hi,
I've just discovered the Whatlies library and I was able to produce some interesting visualizations. However, as soon as I try to import the PCA transformer (from whatlies.transformers import Pca), I get an error.
It also happens if the import is the first line in the code, so I guess it must be related to my specific combination of modules and Python version.
I am running Python 3.9.1, and here is the list of pip packages (I'm showing them all since I'm not sure about dependencies). The error message follows afterwards.

Thanks,
Ivano

Package               Version
--------------------- -----------
altair                4.1.0
apparmor              3.0.1
appdirs               1.4.4
argon2-cffi           20.1.0
asn1crypto            1.4.0
async-generator       1.10
attrs                 20.3.0
backcall              0.2.0
bcc                   0.18.0
binaryornot           0.4.4
biplist               1.0.3
bleach                3.2.1
blis                  0.7.4
bob                   0.1
bpemb                 0.3.2
Brlapi                0.8.0
btrfsutil             5.9
CacheControl          0.12.6
catalogue             1.0.0
cffi                  1.14.4
chardet               3.0.4
click                 7.1.2
colorama              0.4.4
contextlib2           0.6.0.post1
cryptography          3.3.1
cycler                0.10.0
cymem                 2.0.5
decorator             4.4.2
defusedxml            0.6.0
distlib               0.3.1
distro                1.5.0
docutils              0.16
en-core-web-sm        2.3.1
entrypoints           0.3
fasttext              0.9.2
flashfocus            2.2.2
gensim                3.8.3
html5lib              1.1
i3ipc                 2.2.1
idna                  2.10
ipykernel             5.3.4
ipython               7.19.0
ipython-genutils      0.2.0
ipywidgets            7.5.1
it-core-news-lg       2.3.0
it-core-news-md       2.3.0
it-core-news-sm       2.3.0
jedi                  0.17.2
jeepney               0.6.0
Jinja2                2.11.2
joblib                0.17.0
jsonschema            3.2.0
jupyter               1.0.0
jupyter-client        6.1.7
jupyter-console       6.2.0
jupyter-core          4.7.0
jupyterlab-pygments   0.1.2
keyring               21.5.0
kiwisolver            1.3.1
LibAppArmor           3.0.1
libfdt                1.6.0
lit                   0.10.1.dev0
llvmlite              0.34.0
louis                 3.16.0
MarkupSafe            1.1.1
marshmallow           3.10.0
matplot               0.1.9
matplotlib            3.3.3
meson                 0.56.1
mistune               0.8.4
msgpack               1.0.2
murmurhash            1.0.5
nbclient              0.5.1
nbconvert             6.0.7
nbformat              5.0.8
nest-asyncio          1.4.3
networkx              2.5
nltk                  3.5
notebook              6.1.5
numba                 0.51.2
numpy                 1.19.4
ordered-set           4.0.2
packaging             20.8
pandas                1.1.5
pandocfilters         1.4.3
parso                 0.7.1
pep517                0.9.1
pexpect               4.8.0
pickleshare           0.7.5
Pillow                8.0.1
pip                   20.2.4
pkginfo               1.6.1
plac                  1.1.3
ply                   3.11
powerline-status      2.8.1
preshed               3.0.5
progress              1.5
prometheus-client     0.9.0
prompt-toolkit        3.0.8
psutil                5.8.0
ptyprocess            0.6.0
pybind11              2.6.1
pycparser             2.20
pyexifinfo            0.4.0
Pygments              2.7.3
PyGObject             3.38.0
pykerberos            1.2.1
pyloco                0.0.139
pynndescent           0.5.1
pyOpenSSL             20.0.1
pypandoc              1.5
pyparsing             2.4.7
pyrsistent            0.17.3
python-dateutil       2.8.1
python-gitlab         2.5.0
python-xlib           0.29
pytz                  2020.4
PyYAML                5.3.1
pyzmq                 20.0.0
qtconsole             5.0.1
QtPy                  1.9.0
readme-renderer       28.0
regex                 2020.11.13
requests              2.25.1
requests-kerberos     0.12.0
requests-toolbelt     0.9.1
resolvelib            0.5.4
retrying              1.3.3
rfc3986               1.4.0
scikit-learn          0.23.2
scipy                 1.5.4
seaborn               0.11.1
SecretStorage         3.3.0
Send2Trash            1.5.0
sentencepiece         0.1.95
setuptools            51.1.1
SimpleWebSocketServer 0.1.1
six                   1.15.0
sklearn               0.0
smart-open            4.0.1
spacy                 2.3.4
spacy-lookups-data    0.3.2
srsly                 1.0.5
sshuttle              1.0.5
team                  1.0
terminado             0.9.1
testpath              0.4.4
thinc                 7.4.3
threadpoolctl         2.1.0
toml                  0.10.2
toolz                 0.11.1
tornado               6.1
tqdm                  4.54.1
traitlets             5.0.5
twine                 3.2.0
typing                3.7.4.3
umap-learn            0.5.0
urllib3               1.26.1
ushlex                0.99.1
wasabi                0.8.0
wcwidth               0.2.5
webencodings          0.5.1
websocket-client      0.57.0
whatlies              0.5.10
widgetsnbextension    3.5.1
xcffib                0.11.1
xlrd                  1.2.0
xpybutil              0.0.6

And here is the error message:


---------------------------------------------------------------------------
TypingError                               Traceback (most recent call last)
<ipython-input-13-b73e1d05c847> in <module>
      6 emb = EmbeddingSet(*[lang[w] for w in words])
      7 
----> 8 from whatlies.transformers import Pca
      9 pca_plot = emb.transform(Pca(2)).plot_interactive(x_label='pca_0', y_label='pca_1')

~/.local/lib/python3.9/site-packages/whatlies/transformers/__init__.py in <module>
      1 from whatlies.transformers._pca import Pca
----> 2 from whatlies.transformers._umap import Umap
      3 from whatlies.transformers._noise import Noise
      4 from whatlies.transformers._addrandom import AddRandom
      5 from whatlies.transformers._tsne import Tsne

~/.local/lib/python3.9/site-packages/whatlies/transformers/_umap.py in <module>
----> 1 from umap import UMAP
      2 
      3 from ._transformer import SklearnTransformer
      4 
      5 

~/.local/lib/python3.9/site-packages/umap/__init__.py in <module>
      1 from warnings import warn, catch_warnings, simplefilter
----> 2 from .umap_ import UMAP
      3 
      4 try:
      5     with catch_warnings():

~/.local/lib/python3.9/site-packages/umap/umap_.py in <module>
     45 )
     46 
---> 47 from pynndescent import NNDescent
     48 from pynndescent.distances import named_distances as pynn_named_distances
     49 from pynndescent.sparse import sparse_named_distances as pynn_sparse_named_distances

~/.local/lib/python3.9/site-packages/pynndescent/__init__.py in <module>
      1 import pkg_resources
      2 import numba
----> 3 from .pynndescent_ import NNDescent, PyNNDescentTransformer
      4 
      5 # Workaround: https://github.com/numba/numba/issues/3341

~/.local/lib/python3.9/site-packages/pynndescent/pynndescent_.py in <module>
     19 import heapq
     20 
---> 21 import pynndescent.sparse as sparse
     22 import pynndescent.sparse_nndescent as sparse_nnd
     23 import pynndescent.distances as pynnd_dist

~/.local/lib/python3.9/site-packages/pynndescent/sparse.py in <module>
    328     },
    329 )
--> 330 def sparse_alternative_jaccard(ind1, data1, ind2, data2):
    331     num_non_zero = arr_union(ind1, ind2).shape[0]
    332     num_equal = arr_intersect(ind1, ind2).shape[0]

~/.local/lib/python3.9/site-packages/numba/core/decorators.py in wrapper(func)
    216             with typeinfer.register_dispatcher(disp):
    217                 for sig in sigs:
--> 218                     disp.compile(sig)
    219                 disp.disable_compile()
    220         return disp

~/.local/lib/python3.9/site-packages/numba/core/compiler_lock.py in _acquire_compile_lock(*args, **kwargs)
     30         def _acquire_compile_lock(*args, **kwargs):
     31             with self:
---> 32                 return func(*args, **kwargs)
     33         return _acquire_compile_lock
     34 

~/.local/lib/python3.9/site-packages/numba/core/dispatcher.py in compile(self, sig)
    817             self._cache_misses[sig] += 1
    818             try:
--> 819                 cres = self._compiler.compile(args, return_type)
    820             except errors.ForceLiteralArg as e:
    821                 def folded(args, kws):

~/.local/lib/python3.9/site-packages/numba/core/dispatcher.py in compile(self, args, return_type)
     80             return retval
     81         else:
---> 82             raise retval
     83 
     84     def _compile_cached(self, args, return_type):

~/.local/lib/python3.9/site-packages/numba/core/dispatcher.py in _compile_cached(self, args, return_type)
     90 
     91         try:
---> 92             retval = self._compile_core(args, return_type)
     93         except errors.TypingError as e:
     94             self._failed_cache[key] = e

~/.local/lib/python3.9/site-packages/numba/core/dispatcher.py in _compile_core(self, args, return_type)
    103 
    104         impl = self._get_implementation(args, {})
--> 105         cres = compiler.compile_extra(self.targetdescr.typing_context,
    106                                       self.targetdescr.target_context,
    107                                       impl,

~/.local/lib/python3.9/site-packages/numba/core/compiler.py in compile_extra(typingctx, targetctx, func, args, return_type, flags, locals, library, pipeline_class)
    625     pipeline = pipeline_class(typingctx, targetctx, library,
    626                               args, return_type, flags, locals)
--> 627     return pipeline.compile_extra(func)
    628 
    629 

~/.local/lib/python3.9/site-packages/numba/core/compiler.py in compile_extra(self, func)
    361         self.state.lifted = ()
    362         self.state.lifted_from = None
--> 363         return self._compile_bytecode()
    364 
    365     def compile_ir(self, func_ir, lifted=(), lifted_from=None):

~/.local/lib/python3.9/site-packages/numba/core/compiler.py in _compile_bytecode(self)
    423         """
    424         assert self.state.func_ir is None
--> 425         return self._compile_core()
    426 
    427     def _compile_ir(self):

~/.local/lib/python3.9/site-packages/numba/core/compiler.py in _compile_core(self)
    403                 self.state.status.fail_reason = e
    404                 if is_final_pipeline:
--> 405                     raise e
    406         else:
    407             raise CompilerError("All available pipelines exhausted")

~/.local/lib/python3.9/site-packages/numba/core/compiler.py in _compile_core(self)
    394             res = None
    395             try:
--> 396                 pm.run(self.state)
    397                 if self.state.cr is not None:
    398                     break

~/.local/lib/python3.9/site-packages/numba/core/compiler_machinery.py in run(self, state)
    339                     (self.pipeline_name, pass_desc)
    340                 patched_exception = self._patch_error(msg, e)
--> 341                 raise patched_exception
    342 
    343     def dependency_analysis(self):

~/.local/lib/python3.9/site-packages/numba/core/compiler_machinery.py in run(self, state)
    330                 pass_inst = _pass_registry.get(pss).pass_inst
    331                 if isinstance(pass_inst, CompilerPass):
--> 332                     self._runPass(idx, pass_inst, state)
    333                 else:
    334                     raise BaseException("Legacy pass in use")

~/.local/lib/python3.9/site-packages/numba/core/compiler_lock.py in _acquire_compile_lock(*args, **kwargs)
     30         def _acquire_compile_lock(*args, **kwargs):
     31             with self:
---> 32                 return func(*args, **kwargs)
     33         return _acquire_compile_lock
     34 

~/.local/lib/python3.9/site-packages/numba/core/compiler_machinery.py in _runPass(self, index, pss, internal_state)
    289             mutated |= check(pss.run_initialization, internal_state)
    290         with SimpleTimer() as pass_time:
--> 291             mutated |= check(pss.run_pass, internal_state)
    292         with SimpleTimer() as finalize_time:
    293             mutated |= check(pss.run_finalizer, internal_state)

~/.local/lib/python3.9/site-packages/numba/core/compiler_machinery.py in check(func, compiler_state)
    262 
    263         def check(func, compiler_state):
--> 264             mangled = func(compiler_state)
    265             if mangled not in (True, False):
    266                 msg = ("CompilerPass implementations should return True/False. "

~/.local/lib/python3.9/site-packages/numba/core/typed_passes.py in run_pass(self, state)
     90                               % (state.func_id.func_name,)):
     91             # Type inference
---> 92             typemap, return_type, calltypes = type_inference_stage(
     93                 state.typingctx,
     94                 state.func_ir,

~/.local/lib/python3.9/site-packages/numba/core/typed_passes.py in type_inference_stage(typingctx, interp, args, return_type, locals, raise_errors)
     68 
     69         infer.build_constraint()
---> 70         infer.propagate(raise_errors=raise_errors)
     71         typemap, restype, calltypes = infer.unify(raise_errors=raise_errors)
     72 

~/.local/lib/python3.9/site-packages/numba/core/typeinfer.py in propagate(self, raise_errors)
   1069                                   if isinstance(e, ForceLiteralArg)]
   1070                 if not force_lit_args:
-> 1071                     raise errors[0]
   1072                 else:
   1073                     raise reduce(operator.or_, force_lit_args)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython mode backend)
Failed in nopython mode pipeline (step: nopython mode backend)
Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function make_quicksort_impl.<locals>.run_quicksort at 0x7f2061592700>) found for signature:
 
 >>> run_quicksort(array(int32, 1d, C))
 
There are 2 candidate implementations:
      - Of which 2 did not match due to:
      Overload in function 'register_jitable.<locals>.wrap.<locals>.ov_wrap': File: numba/core/extending.py: Line 150.
        With argument(s): '(array(int32, 1d, C))':
       Rejected as the implementation raised a specific error:
         UnsupportedError: Failed in nopython mode pipeline (step: analyzing bytecode)
       Use of unsupported opcode (LOAD_ASSERTION_ERROR) found
       
       File "../.local/lib/python3.9/site-packages/numba/misc/quicksort.py", line 180:
           def run_quicksort(A):
               <source elided>
                   while high - low >= SMALL_QUICKSORT:
                       assert n < MAX_STACK
                       ^
       
  raised from /home/talamo_i/.local/lib/python3.9/site-packages/numba/core/byteflow.py:269

During: resolving callee type: Function(<function make_quicksort_impl.<locals>.run_quicksort at 0x7f2061592700>)
During: typing of call at /home/talamo_i/.local/lib/python3.9/site-packages/numba/np/arrayobj.py (5007)


File "../.local/lib/python3.9/site-packages/numba/np/arrayobj.py", line 5007:
    def array_sort_impl(arr):
        <source elided>
        # Note we clobber the return value
        sort_func(arr)

Mhm. It seems to be related to numba inside of Umap. Since it's a bug in a dependency, I'm not 100% sure what I can do to fix it. I also only test on 3.6/3.7.

Could you confirm whether the problem goes away after switching Python versions?
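The traceback ends with "Use of unsupported opcode (LOAD_ASSERTION_ERROR)". That opcode was introduced in CPython 3.9, which suggests the installed numba (0.51.2 in the list above) predates Python 3.9 support. A minimal sketch for collecting the relevant versions before and after switching interpreters is shown here; the helper name report_versions and the default package list are assumptions made for illustration, not part of whatlies.

```python
import sys
from importlib import metadata  # standard library, Python 3.8+


def report_versions(packages=("numba", "llvmlite", "pynndescent",
                              "umap-learn", "whatlies")):
    """Return the interpreter version and the installed version of each package.

    Packages that are not installed are reported as None.
    (Hypothetical helper for this thread, not a whatlies function.)
    """
    report = {"python": "%d.%d.%d" % sys.version_info[:3]}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


# Print one line per entry so the output can be pasted into the issue.
for name, version in report_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Running this under both the failing interpreter and a 3.6/3.7 one would show whether only the Python version differs or whether the numba version differs as well.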

I think I've found the culprit. It's got to do with importing. I'll make a separate issue for this. I think it's time for this library to drop support for internal transformers and just switch to using the ones from the scikit-learn API. That scales and removes the need to download all manner of dependencies.
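The traceback shows whatlies/transformers/__init__.py importing _umap right after _pca, so "from whatlies.transformers import Pca" also imports umap and, through it, numba. One general way to avoid that kind of coupling is to defer each import until the name is actually requested. The sketch below only illustrates that idea and is not the fix that was applied to the library; the LazyNamespace class and the table entries are hypothetical, and the standard-library math module stands in for a lightweight transformer module.

```python
import importlib


class LazyNamespace:
    """Resolve names to module attributes on first access (illustrative only)."""

    def __init__(self, table):
        # Maps a public name to a (module name, attribute name) pair.
        self._table = dict(table)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. before caching.
        try:
            module_name, attr = self._table[name]
        except KeyError:
            raise AttributeError(name) from None
        value = getattr(importlib.import_module(module_name), attr)
        setattr(self, name, value)  # cache so the import runs once
        return value


ns = LazyNamespace({
    # Stand-in for a transformer whose dependencies import cleanly.
    "sqrt": ("math", "sqrt"),
    # Stand-in for a transformer whose dependency cannot be imported.
    "Broken": ("module_that_does_not_exist_xyz", "Thing"),
})

print(ns.sqrt(9.0))  # prints 3.0; the broken entry is never imported
```

With this arrangement, the failing import is only triggered when the broken name is requested, so a problem in one optional dependency does not prevent the other names from being used.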

Closing this as it should now be fixed.