uxlfoundation/scikit-learn-intelex

Variability in results using sklearnex with ExtraTrees and RandomForest classifiers

YoochanMyung opened this issue · 2 comments

Describe the bug
Results differ depending on whether sklearnex patching is turned on or off when using the ExtraTrees and RandomForest classifiers.
This issue occurs starting with version 2024.1. I found it with my own dataset, and it is also reproducible with the breast_cancer dataset, but not with the iris dataset.

To Reproduce

  1. Install 'scikit-learn==1.5.1' (any version from 1.2.1 onward)
  2. Install 'scikit-learn-intelex==2024.1' (any version from 2024.1 onward)
  3. Run the following test code:
# Patch scikit-learn before importing any estimators.
# Comment out the next two lines to run with stock scikit-learn.
from sklearnex import patch_sklearn
patch_sklearn()

from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import confusion_matrix, matthews_corrcoef
from sklearn.model_selection import cross_val_predict, train_test_split

N_CORES = 16

# Toy data

from sklearn.datasets import load_iris, load_breast_cancer
data = load_breast_cancer()
X = data['data']
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# ExtraTrees
classifier_cv = ExtraTreesClassifier(n_estimators=300, random_state=1, n_jobs=N_CORES)
classifier_test = ExtraTreesClassifier(n_estimators=300, random_state=1, n_jobs=N_CORES)

cv_results = cross_val_predict(classifier_cv, X_train, y_train, cv=10)
classifier_test.fit(X_train, y_train)

test_results = classifier_test.predict(X_test)
# For binary labels, confusion_matrix(...).ravel() yields tn, fp, fn, tp.
print("###CV###")
print(matthews_corrcoef(y_train, cv_results))
print(confusion_matrix(y_train, cv_results).ravel())

print("###TEST###")
print(matthews_corrcoef(y_test, test_results))
print(confusion_matrix(y_test, test_results).ravel())

Expected behavior
Same results between using sklearnex and original sklearn.

Output/Screenshots

ExtraTrees without the sklearnex patch

###CV###
0.935861738490973
[144   5   7 242]
###TEST###
0.9247930594534806
[ 58   5   1 107]

ExtraTrees with the sklearnex patch

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
###CV###
0.9409328452526324
[143   6   5 244]
###TEST###
0.8992907835033845
[ 57   6   2 106]
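
As a sanity check on the numbers above, the MCC values can be recomputed from the printed confusion matrices alone. This is a minimal sketch that assumes the usual scikit-learn binary layout for `confusion_matrix(...).ravel()`, i.e. `tn, fp, fn, tp`. The helper name `mcc_from_counts` and the dictionary labels are hypothetical and not part of any library.

```python
import math


def mcc_from_counts(tn, fp, fn, tp):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when any marginal is empty, which is a common convention.
    return numerator / denominator if denominator else 0.0


# Counts copied from the outputs above, in ravel() order: tn, fp, fn, tp.
reported = {
    "unpatched CV": (144, 5, 7, 242),
    "unpatched TEST": (58, 5, 1, 107),
    "patched CV": (143, 6, 5, 244),
    "patched TEST": (57, 6, 2, 106),
}

for name, (tn, fp, fn, tp) in reported.items():
    mcc = mcc_from_counts(tn, fp, fn, tp)
    print(f"{name}: MCC={mcc:.6f}, misclassified={fp + fn}")
```

Read this way, the two runs differ by only a few samples: 12 versus 11 misclassified in cross-validation, and 6 versus 8 on the test split.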

Environment:

  • OS: Ubuntu 22.04.04 LTS
  • scikit-learn==1.5.1 (also tested on 1.2.1, 1.3.x, 1.4.x, etc.)

I am not sure whether it is related, but when I use Intelex I get the warning "UserWarning: X does not have valid feature names, but ExtraTreesClassifier was fitted with feature names". Maybe there is a glitch in how Intelex handles the feature names or their order?