Variability in results using sklearnex with ExtraTrees and RandomForest classifiers
YoochanMyung opened this issue
Describe the bug
Results differ depending on whether sklearnex is enabled, for both the ExtraTrees and RandomForest classifiers.
The issue first appears in scikit-learn-intelex 2024.1. I found it with my own dataset, and it also reproduces with the breast_cancer dataset, but not with the iris dataset.
To Reproduce
- Install scikit-learn==1.5.1 (any version from 1.2.1 onward reproduces the issue)
- Install scikit-learn-intelex==2024.1 (any version from 2024.1 onward reproduces the issue)
- Run the following test code:
from sklearnex import patch_sklearn
patch_sklearn()  # must run before the sklearn estimators are imported so they get patched

from sklearn.datasets import load_breast_cancer  # load_iris does not reproduce the issue
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import confusion_matrix, matthews_corrcoef
from sklearn.model_selection import cross_val_predict, train_test_split

N_CORES = 16

# Toy data
data = load_breast_cancer()
X = data['data']
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# ExtraTrees (replace with RandomForestClassifier to reproduce with RandomForest)
classifier_cv = ExtraTreesClassifier(n_estimators=300, random_state=1, n_jobs=N_CORES)
classifier_test = ExtraTreesClassifier(n_estimators=300, random_state=1, n_jobs=N_CORES)

# 10-fold cross-validated predictions on the training split
cv_results = cross_val_predict(classifier_cv, X_train, y_train, cv=10)

# Fit on the whole training split, then predict the held-out test split
classifier_test.fit(X_train, y_train)
test_results = classifier_test.predict(X_test)

print("###CV###")
print(matthews_corrcoef(y_train, cv_results))
print(confusion_matrix(y_train, cv_results).ravel())
print("###TEST###")
print(matthews_corrcoef(y_test, test_results))
print(confusion_matrix(y_test, test_results).ravel())
Expected behavior
The same results with and without sklearnex, i.e., the patched run should match stock scikit-learn.
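One way to narrow this down is to check two things separately: whether each implementation is repeatable on its own with a fixed random_state, and how many test predictions differ between the two implementations. The sketch below uses a hypothetical helper (`compare_estimators`, not part of scikit-learn or sklearnex) and, as a baseline, passes stock `ExtraTreesClassifier` for both arguments. To compare against sklearnex without global patching, the assumption is that your sklearnex version exposes `sklearnex.ensemble.ExtraTreesClassifier`, which you would pass as the second class; do not call `patch_sklearn()` in this script, or `sklearn.ensemble` itself gets replaced. `n_jobs=2` is an arbitrary choice for the sketch.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split


def fit_predict(estimator_cls, X_train, y_train, X_test, **params):
    """Fit a fresh estimator with the given parameters and return test predictions."""
    model = estimator_cls(**params)
    model.fit(X_train, y_train)
    return model.predict(X_test)


def compare_estimators(cls_a, cls_b, X_train, y_train, X_test, **params):
    """Hypothetical helper: (a_repeatable, b_repeatable, n_disagreements_a_vs_b)."""
    a1 = fit_predict(cls_a, X_train, y_train, X_test, **params)
    a2 = fit_predict(cls_a, X_train, y_train, X_test, **params)
    b1 = fit_predict(cls_b, X_train, y_train, X_test, **params)
    b2 = fit_predict(cls_b, X_train, y_train, X_test, **params)
    return (
        bool(np.array_equal(a1, a2)),  # does cls_a give identical predictions twice?
        bool(np.array_equal(b1, b2)),  # does cls_b give identical predictions twice?
        int(np.sum(a1 != b1)),         # how many test predictions differ between them?
    )


X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
params = dict(n_estimators=300, random_state=1, n_jobs=2)

# Stock vs. stock baseline; swap the second argument for the sklearnex class to compare.
result = compare_estimators(ExtraTreesClassifier, ExtraTreesClassifier,
                            X_train, y_train, X_test, **params)
print(result)
```

If the sklearnex class is repeatable on its own but disagrees with stock scikit-learn, that points to a different random-number or tree-building path rather than run-to-run nondeterminism.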
Output/Screenshots
ExtraTrees without sklearnex (stock scikit-learn):
###CV###
0.935861738490973
[144 5 7 242]
###TEST###
0.9247930594534806
[ 58 5 1 107]
ExtraTrees with sklearnex patched:
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
###CV###
0.9409328452526324
[143 6 5 244]
###TEST###
0.8992907835033845
[ 57 6 2 106]
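For reading these numbers: on binary labels, `confusion_matrix(...).ravel()` returns counts in the order tn, fp, fn, tp, and the MCC follows directly from them. The helper below (`mcc_from_counts`, a name chosen here for illustration) recomputes the reported MCCs from the printed counts. On the 171-sample test split, the patched run has 8 misclassifications instead of 6 (one more fp and one more fn), which accounts for the drop from about 0.925 to about 0.899.

```python
import math


def mcc_from_counts(tn, fp, fn, tp):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal total is zero
    return numerator / denominator if denominator else 0.0


# Counts as printed by confusion_matrix(...).ravel(): [tn, fp, fn, tp]
stock_test = (58, 5, 1, 107)
patched_test = (57, 6, 2, 106)

print(mcc_from_counts(*stock_test))
print(mcc_from_counts(*patched_test))
```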
Environment:
- OS: Ubuntu 22.04.4 LTS
- scikit-learn: 1.5.1 (also reproduced on 1.2.1, 1.3.x, and 1.4.x)
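To report the exact installed versions in a bug report like this one, the standard-library `importlib.metadata` module can look up distributions by their PyPI names. This is a small sketch; `installed_versions` is a hypothetical helper name, and the package list is just an example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(["scikit-learn", "scikit-learn-intelex", "numpy"])
for name, ver in report.items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```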
I'm not sure whether it's related, but with Intelex enabled I get this warning: UserWarning: X does not have valid feature names, but ExtraTreesClassifier was fitted with feature names. Could there be a glitch in how Intelex handles feature names or their order?