
Numba typing errors



I'm trying to reproduce the experiment in your README, but I keep getting Numba typing errors that are not very descriptive.

My code:

from gefs import RandomForest
from experiments.prep import get_data, train_test_split

data, ncat = get_data('wine')
X_train, X_test, y_train, y_test, data_train, data_test = train_test_split(data, ncat)
rf = RandomForest(n_estimators=30, ncat=ncat)
rf.fit(X_train, y_train)
gef = rf.topc()


Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.9/", line 87, in _run_code
    exec(code, run_globals)
  File "***/", line 7, in <module>
    rf.fit(X_train, y_train)
  File "***/gefs/", line 533, in fit
    self.estimators = build_forest(X, y, self.n_estimators, self.bootstrap,
  File "/opt/conda/lib/python3.9/site-packages/numba/core/", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/opt/conda/lib/python3.9/site-packages/numba/core/", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
- Resolution failure for literal arguments:
Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in method choice of numpy.random.mtrand.RandomState object at 0x7f2c36da0940>) found for signature:

 >>> choice(array(int64, 1d, C), OptionalType(int64), replace=Literal[bool](False))

There are 2 candidate implementations:
  - Of which 2 did not match due to:
  Overload in function 'choice': File: numba/cpython/ Line 1360.
    With argument(s): '(array(int64, 1d, C), OptionalType(int64), replace=bool)':
   Rejected as the implementation raised a specific error:
     TypingError: Failed in nopython mode pipeline (step: nopython frontend)
   No implementation of function Function(<built-in function empty>) found for signature:

    >>> empty(OptionalType(int64), class(int64))

   There are 2 candidate implementations:
         - Of which 2 did not match due to:
         Overload in function 'ol_np_empty': File: numba/np/ Line 4086.
           With argument(s): '(OptionalType(int64), class(int64))':
          Rejected as the implementation raised a specific error:
            TypingError: Cannot parse input types to function np.empty(OptionalType(int64), class(int64))
     raised from /opt/conda/lib/python3.9/site-packages/numba/np/
   During: resolving callee type: Function(<built-in function empty>)
   During: typing of call at /opt/conda/lib/python3.9/site-packages/numba/cpython/ (1417)
   File "../../../../../../opt/conda/lib/python3.9/site-packages/numba/cpython/", line 1417:
           def choice_impl(a, size=None, replace=True):
               <source elided>
               if replace:
                   out = np.empty(size, dtype)

  raised from /opt/conda/lib/python3.9/site-packages/numba/core/

During: resolving callee type: Function(<built-in method choice of numpy.random.mtrand.RandomState object at 0x7f2c36da0940>)
During: typing of call at ***/gefs/ (145)

File "gefs/", line 145:
def find_best_split(node, tree, random_state):
    <source elided>
    vars = np.random.choice(np.arange(tree.X.shape[1]), tree.max_features, replace=False)

During: resolving callee type: type(CPUDispatcher(<function find_best_split at 0x7f2b8ca93ee0>))
During: typing of call at ***/gefs/ (132)

During: resolving callee type: type(CPUDispatcher(<function find_best_split at 0x7f2b8ca93ee0>))
During: typing of call at ***/gefs/ (132)

During: resolving callee type: type(CPUDispatcher(<function find_best_split at 0x7f2b8ca93ee0>))
During: typing of call at ***/gefs/ (132)

File "gefs/", line 132:
def build_tree(tree, parent, counts, ordered_ids):
    <source elided>
        node = queue.pop(0)
        split = find_best_split(node, tree, np.random.randint(1e6))

During: resolving callee type: type(CPUDispatcher(<function build_tree at 0x7f2b8ca9f700>))
During: typing of call at ***/gefs/ (465)

During: resolving callee type: type(CPUDispatcher(<function build_tree at 0x7f2b8ca9f700>))
During: typing of call at ***/gefs/ (465)

File "gefs/", line 465:
    def fit(self, X, y):
        <source elided>
        ordered_ids = np.arange(X.shape[0], dtype=np.int64)
        self.root, self.n_nodes = build_tree(self, None, counts, ordered_ids)

- Resolution failure for non-literal arguments:

During: resolving callee type: BoundFunction((<class 'numba.core.types.misc.ClassInstanceType'>, 'fit') for instance.jitclass.Tree#7f2b8caad490<X:OptionalType(array(float64, 2d, A)),y:OptionalType(array(int64, 1d, A)),ncat:OptionalType(array(int64, 1d, A)),scope:OptionalType(array(int64, 1d, A)),imp_measure:unicode_type,min_samples_leaf:int64,min_samples_split:int64,n_classes:int64,max_features:OptionalType(int64),n_nodes:int64,root:instance.jitclass.TreeNode#7f2b8caa6b80<id:int64,counts:array(int64, 1d, A),idx:array(int64, 1d, A),split:OptionalType(instance.jitclass.Split#7f2b8ca89bb0<score:float64,var:int64,threshold:array(float64, 1d, A),surr_var:array(int64, 1d, A),surr_thr:array(float64, 1d, A),surr_go_left:array(bool, 1d, A),surr_blind:bool,left_ids:array(int64, 1d, A),right_ids:array(int64, 1d, A),left_counts:array(int64, 1d, A),right_counts:array(int64, 1d, A),type:unicode_type>),parent:OptionalType(DeferredType#139825020508336),left_child:OptionalType(DeferredType#139825020508336),right_child:OptionalType(DeferredType#139825020508336),isleaf:OptionalType(bool),depth:int16>,depth:int16,max_depth:int64,surrogate:bool,random_state:int64>)
During: typing of call at ***/gefs/ (179)

File "gefs/", line 179:
def build_forest(X, y, n_estimators, bootstrap, ncat, imp_measure,
    <source elided>
            estimators[i].fit(Xtree_, ytree_)

My guess is that this happens because some dependencies were updated since the code was written. I'm running the code in a conda environment with the following versions installed:

numba                     0.56.3
numpy                     1.22.3 
pandas                    1.4.2
scipy                     1.9.0
sklearn                   1.1.2
tqdm                      4.64.0 

Could you upload a solved environment or a freeze with the specific package versions that allow the code to run properly?
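For comparison against a known-good environment, the version list above can be reproduced programmatically. This is only a minimal sketch using the standard library's `importlib.metadata`; the helper names `installed_versions` and `format_pins` are hypothetical, and the distribution names are assumptions (in particular `scikit-learn` as the distribution behind `sklearn`).

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names; "scikit-learn" is the usual distribution
# name for the package imported as `sklearn`.
PACKAGES = ["numba", "numpy", "pandas", "scipy", "scikit-learn", "tqdm"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def format_pins(versions):
    """Render installed packages as sorted 'name==version' lines, skipping missing ones."""
    return "\n".join(
        f"{name}=={ver}" for name, ver in sorted(versions.items()) if ver is not None
    )


if __name__ == "__main__":
    print(format_pins(installed_versions(PACKAGES)))
```

The output uses pip-style `name==version` pins, so it can be pasted into a requirements file or compared line by line with a freeze from a working setup.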
