AlCorreia/GeFs

Numba typing errors


Hi!

I'm trying to reproduce the experiment from your README, but I keep getting Numba typing errors that aren't very descriptive.

My code:

from gefs import RandomForest
from experiments.prep import get_data, train_test_split

data, ncat = get_data('wine')
X_train, X_test, y_train, y_test, data_train, data_test = train_test_split(data, ncat)
rf = RandomForest(n_estimators=30, ncat=ncat)
rf.fit(X_train, y_train)
gef = rf.topc()

Traceback:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "***/test_gefs.py", line 7, in <module>
    rf.fit(X_train, y_train)
  File "***/gefs/trees.py", line 533, in fit
    self.estimators = build_forest(X, y, self.n_estimators, self.bootstrap,
  File "/opt/conda/lib/python3.9/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/opt/conda/lib/python3.9/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
- Resolution failure for literal arguments:
Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in method choice of numpy.random.mtrand.RandomState object at 0x7f2c36da0940>) found for signature:

 >>> choice(array(int64, 1d, C), OptionalType(int64), replace=Literal[bool](False))

There are 2 candidate implementations:
  - Of which 2 did not match due to:
  Overload in function 'choice': File: numba/cpython/randomimpl.py: Line 1360.
    With argument(s): '(array(int64, 1d, C), OptionalType(int64), replace=bool)':
   Rejected as the implementation raised a specific error:
     TypingError: Failed in nopython mode pipeline (step: nopython frontend)
   No implementation of function Function(<built-in function empty>) found for signature:

    >>> empty(OptionalType(int64), class(int64))

   There are 2 candidate implementations:
         - Of which 2 did not match due to:
         Overload in function 'ol_np_empty': File: numba/np/arrayobj.py: Line 4086.
           With argument(s): '(OptionalType(int64), class(int64))':
          Rejected as the implementation raised a specific error:
            TypingError: Cannot parse input types to function np.empty(OptionalType(int64), class(int64))
     raised from /opt/conda/lib/python3.9/site-packages/numba/np/arrayobj.py:4105
   
   During: resolving callee type: Function(<built-in function empty>)
   During: typing of call at /opt/conda/lib/python3.9/site-packages/numba/cpython/randomimpl.py (1417)
   
   
   File "../../../../../../opt/conda/lib/python3.9/site-packages/numba/cpython/randomimpl.py", line 1417:
           def choice_impl(a, size=None, replace=True):
               <source elided>
               if replace:
                   out = np.empty(size, dtype)
                   ^

  raised from /opt/conda/lib/python3.9/site-packages/numba/core/typeinfer.py:1086

During: resolving callee type: Function(<built-in method choice of numpy.random.mtrand.RandomState object at 0x7f2c36da0940>)
During: typing of call at ***/gefs/split.py (145)


File "gefs/split.py", line 145:
def find_best_split(node, tree, random_state):
    <source elided>
    np.random.seed(random_state)
    vars = np.random.choice(np.arange(tree.X.shape[1]), tree.max_features, replace=False)
    ^

During: resolving callee type: type(CPUDispatcher(<function find_best_split at 0x7f2b8ca93ee0>))
During: typing of call at ***/gefs/trees.py (132)

During: resolving callee type: type(CPUDispatcher(<function find_best_split at 0x7f2b8ca93ee0>))
During: typing of call at ***/gefs/trees.py (132)

During: resolving callee type: type(CPUDispatcher(<function find_best_split at 0x7f2b8ca93ee0>))
During: typing of call at ***/gefs/trees.py (132)


File "gefs/trees.py", line 132:
def build_tree(tree, parent, counts, ordered_ids):
    <source elided>
        node = queue.pop(0)
        split = find_best_split(node, tree, np.random.randint(1e6))
        ^

During: resolving callee type: type(CPUDispatcher(<function build_tree at 0x7f2b8ca9f700>))
During: typing of call at ***/gefs/trees.py (465)

During: resolving callee type: type(CPUDispatcher(<function build_tree at 0x7f2b8ca9f700>))
During: typing of call at ***/gefs/trees.py (465)


File "gefs/trees.py", line 465:
    def fit(self, X, y):
        <source elided>
        ordered_ids = np.arange(X.shape[0], dtype=np.int64)
        self.root, self.n_nodes = build_tree(self, None, counts, ordered_ids)
        ^

- Resolution failure for non-literal arguments:
None

During: resolving callee type: BoundFunction((<class 'numba.core.types.misc.ClassInstanceType'>, 'fit') for instance.jitclass.Tree#7f2b8caad490<X:OptionalType(array(float64, 2d, A)),y:OptionalType(array(int64, 1d, A)),ncat:OptionalType(array(int64, 1d, A)),scope:OptionalType(array(int64, 1d, A)),imp_measure:unicode_type,min_samples_leaf:int64,min_samples_split:int64,n_classes:int64,max_features:OptionalType(int64),n_nodes:int64,root:instance.jitclass.TreeNode#7f2b8caa6b80<id:int64,counts:array(int64, 1d, A),idx:array(int64, 1d, A),split:OptionalType(instance.jitclass.Split#7f2b8ca89bb0<score:float64,var:int64,threshold:array(float64, 1d, A),surr_var:array(int64, 1d, A),surr_thr:array(float64, 1d, A),surr_go_left:array(bool, 1d, A),surr_blind:bool,left_ids:array(int64, 1d, A),right_ids:array(int64, 1d, A),left_counts:array(int64, 1d, A),right_counts:array(int64, 1d, A),type:unicode_type>),parent:OptionalType(DeferredType#139825020508336),left_child:OptionalType(DeferredType#139825020508336),right_child:OptionalType(DeferredType#139825020508336),isleaf:OptionalType(bool),depth:int16>,depth:int16,max_depth:int64,surrogate:bool,random_state:int64>)
During: typing of call at ***/gefs/trees.py (179)


File "gefs/trees.py", line 179:
def build_forest(X, y, n_estimators, bootstrap, ncat, imp_measure,
    <source elided>
                                               estimators[i].random_state)
            estimators[i].fit(Xtree_, ytree_)
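The innermost frame is the np.random.choice call in find_best_split (gefs/split.py, line 145). There, tree.max_features is typed as OptionalType(int64), and Numba's choice implementation forwards that size to np.empty, which rejects an optional type. The plain-Python sketch below shows what the call seems to need: a concrete int size. It is not part of GeFs. resolve_max_features and sample_features are hypothetical helpers, and treating None as "use all features" is an assumption on my part, not GeFs's documented default.

```python
from typing import Optional

import numpy as np


def resolve_max_features(max_features: Optional[int], n_features: int) -> int:
    """Hypothetical helper: turn an optional feature count into a plain int.

    Assumption: None means "use every feature"; GeFs may use another default.
    """
    if max_features is None:
        return n_features
    if not 1 <= max_features <= n_features:
        raise ValueError(f"max_features must be in [1, {n_features}], got {max_features}")
    return int(max_features)


def sample_features(n_features: int, max_features: Optional[int], seed: int) -> np.ndarray:
    """Hypothetical mirror of the failing call, but with a concrete int size."""
    # A local RandomState instead of np.random.seed, to avoid touching global state.
    rng = np.random.RandomState(seed)
    k = resolve_max_features(max_features, n_features)
    # Sample k distinct feature indices without replacement, as split.py line 145 does.
    return rng.choice(np.arange(n_features), k, replace=False)
```

For example, sample_features(11, 3, seed=0) returns three distinct indices in [0, 11). Under Numba, the equivalent fix would be to make sure max_features reaches choice as int64 rather than OptionalType(int64). I haven't verified how GeFs's jitclass should be changed to achieve that.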

My guess is that this happens because some dependencies have been updated since the code was written. I'm running the code in a conda environment with the following versions installed:

numba                     0.56.3
numpy                     1.22.3 
pandas                    1.4.2
scipy                     1.9.0
sklearn                   1.1.2
tqdm                      4.64.0 
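To make comparisons between environments exact, here is a small stdlib-only sketch (Python 3.8+, importlib.metadata) that prints the versions of the same packages listed above. The package list is just this one. Note that the sklearn import name is distributed as scikit-learn. For the full picture, pip freeze or conda env export would be more complete.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names of the packages listed above ("sklearn" ships as "scikit-learn").
PACKAGES = ["numba", "numpy", "pandas", "scipy", "scikit-learn", "tqdm"]


def collect_versions(packages=PACKAGES) -> dict:
    """Return {distribution name: installed version string, or None if missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in collect_versions().items():
        print(f"{name:<25} {ver or 'not installed'}")
```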

Could you upload a solved environment, or a freeze with the specific package versions that let the code run correctly?

BR,
Maurycy