sktime/skbase

[BUG] `all_objects` redirects `sys.stdout` permanently

Closed this issue · 7 comments

Describe the bug
sktime's `all_estimators` replaces `sys.stdout` and never restores it, which breaks printing in a Jupyter notebook.

To Reproduce

from sktime.registry import all_estimators
import sys
orig_stdout = sys.stdout
all_estimators(estimator_types="clusterer", filter_tags="capability:unequal_length")
new_stdout = sys.stdout
print(f"sktime changes stdout from {orig_stdout} to {new_stdout}")

sktime changes stdout from <ipykernel.iostream.OutStream object at 0x7f1a0ea77f10> to <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>

Expected behavior
`sys.stdout` is left unchanged after the call.
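In the reproduction output, the new stream is a plain `TextIOWrapper`, which matches the usual repr of the interpreter's original `sys.__stdout__` rather than the notebook's `OutStream`. One mechanism consistent with this (an assumption, not confirmed from the skbase source here) is a suppression step that silences output and then "restores" `sys.__stdout__` instead of whatever `sys.stdout` was before the call. The stdlib-only sketch below reproduces that effect, using an `io.StringIO` as a stand-in for the notebook stream; `buggy_quiet_call` is a hypothetical name.

```python
import io
import sys


def buggy_quiet_call(func):
    """Hypothetical buggy suppression: silence output, then reset to sys.__stdout__."""
    sys.stdout = io.StringIO()  # swallow anything func prints
    try:
        return func()
    finally:
        # BUG: sys.__stdout__ is the process's original stream, not the stream
        # that was active before this call (e.g. a Jupyter OutStream).
        sys.stdout = sys.__stdout__


notebook_stream = io.StringIO()  # stand-in for ipykernel's OutStream
saved = sys.stdout
sys.stdout = notebook_stream
try:
    buggy_quiet_call(lambda: print("noise from an import"))
    changed = sys.stdout is not notebook_stream  # True: the stream was swapped
finally:
    sys.stdout = saved  # undo the demo's own redirection

print(f"stdout replaced: {changed}")
```

Running it prints `stdout replaced: True`: after the call, anything printed in the "notebook" would go to the process's console instead.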

Versions

Additional context

Please also note that none of the returned estimators actually work on unequal-length data; see sktime/sktime#6276.

import sys
from pydoc import locate
from sktime.registry import all_estimators
from sktime.datasets import load_acsf1

orig_stdout = sys.stdout  # capture the notebook's stdout before all_estimators replaces it

# create test data set
RANDOM_STATE = 2
no_of_unknown_clusters = 5
X, _ = load_acsf1(return_type='pd-multiindex')
# remove last 10 rows from the last appliance to simulate unequal time series
X_mod = X.iloc[:-10]


for model in all_estimators(estimator_types="clusterer", filter_tags="capability:unequal_length"):
    sys.stdout = orig_stdout  # to fix stdout bug
    obj = model[1]  # each entry is a (name, class) tuple
    unequal_clst = locate(".".join([obj.__module__, obj.__name__]))
    try:
        unequal_clst().fit(X=X_mod)
        print(f"{obj.__name__} did not fail")
    except TypeError:
        print(f"{obj.__name__} failed on typeerror")
    except ValueError as e:
        print(f"{obj.__name__} failed on {e}")

ClustererPipeline failed on typeerror
SklearnClustererPipeline failed on typeerror
TimeSeriesDBSCAN failed on typeerror
TimeSeriesKMeans failed on The data has unequal length series, this clusterer cannot handle unequal length series
TimeSeriesKMeansTslearn failed on The data has unequal length series, this clusterer cannot handle unequal length series
TimeSeriesKMedoids failed on The data has unequal length series, this clusterer cannot handle unequal length series
TimeSeriesKShapes failed on The data has unequal length series, this clusterer cannot handle unequal length series
TimeSeriesKernelKMeans failed on The data has unequal length series, this clusterer cannot handle unequal length series
TimeSeriesLloyds failed on typeerror

Thanks for reporting! This might be coming from `all_objects` in scikit-base.

Could you kindly report:

  • your sktime version (use `show_versions` from sktime)
  • whether this still happens if you set `suppress_import_stdout` to `False`
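For context on the `suppress_import_stdout` question: a common way to silence output during imports without this side effect is `contextlib.redirect_stdout`, which puts back whichever stream was active on entry rather than `sys.__stdout__`. This is a minimal sketch of that pattern, not the change made in skbase; `import_quietly` is a hypothetical helper.

```python
import contextlib
import importlib
import io
import sys


def import_quietly(module_name, suppress_stdout=True):
    """Hypothetical helper: import a module, optionally discarding what it prints."""
    if not suppress_stdout:
        return importlib.import_module(module_name)
    # redirect_stdout saves the current sys.stdout on entry and restores that
    # same object on exit, even if the import raises.
    with contextlib.redirect_stdout(io.StringIO()):
        return importlib.import_module(module_name)


before = sys.stdout
mod = import_quietly("json")
assert sys.stdout is before  # the caller's stream (e.g. a notebook's) is untouched
```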

Regarding the unequal-length estimators not working:

  • at the moment, you need to pass a dict as `filter_tags`, i.e., `{"capability:unequal_length": True}`. This may be unintuitive, so perhaps we want to allow a plain string to mean "this tag is True".
  • for some estimators, whether they can handle unequal-length data may depend on their parameter values; e.g., DBSCAN needs to be given a distance that can handle unequal-length time series.
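The suggestion to let a string mean "True" can be illustrated with a small normalization step. The sketch below is hypothetical (`normalize_filter_tags` and `matches` are not sktime or skbase functions, and the tag dicts are made up); it shows one way a bare tag name, or a list of tag names, could be read as "this tag must be True" while dicts pass through unchanged.

```python
def normalize_filter_tags(filter_tags):
    """Hypothetical: turn str / list-of-str tag filters into the dict form."""
    if filter_tags is None:
        return None
    if isinstance(filter_tags, str):
        # a bare tag name is read as "this tag must be True"
        return {filter_tags: True}
    if isinstance(filter_tags, (list, tuple)):
        return {tag: True for tag in filter_tags}
    if isinstance(filter_tags, dict):
        return dict(filter_tags)
    raise TypeError(f"filter_tags must be str, list, dict or None, got {type(filter_tags)}")


def matches(tags, filter_tags):
    """Hypothetical: True if every requested tag is present with the requested value."""
    wanted = normalize_filter_tags(filter_tags) or {}
    return all(tags.get(name) == value for name, value in wanted.items())


# Made-up tag dicts for two estimators
unequal_ok = {"capability:unequal_length": True}
unequal_no = {"capability:unequal_length": False}

# With current sktime, pass the dict form directly, e.g.:
# all_estimators(estimator_types="clusterer",
#                filter_tags={"capability:unequal_length": True})
```

Under this sketch, `matches(unequal_ok, "capability:unequal_length")` is `True` and the same check on `unequal_no` is `False`, exactly as with the explicit dict.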

I have the same issue: after calling `all_estimators`, `print` stops working in Jupyter notebooks.
Using `suppress_import_stdout=False` fixes the issue for me.
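Another user-side workaround that keeps the default suppression is to save `sys.stdout` before the call and put it back afterwards, which is what the `sys.stdout = orig_stdout` line in the reproduction loop does by hand. A minimal context-manager sketch using only the standard library; `preserve_stdout` is a hypothetical name, and the sktime call is shown only as a comment.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def preserve_stdout():
    """Hypothetical helper: restore whatever sys.stdout was on entry, even if the body replaces it."""
    saved = sys.stdout
    try:
        yield
    finally:
        sys.stdout = saved


# In a notebook (requires sktime):
#     from sktime.registry import all_estimators
#     with preserve_stdout():
#         ests = all_estimators(estimator_types="clusterer")

# Self-contained check: simulate a library that swaps the stream.
before = sys.stdout
with preserve_stdout():
    sys.stdout = io.StringIO()
restored = sys.stdout is before
```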

For reference, here is my output of show_versions:

System:
python: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:40:08) [MSC v.1938 64 bit (AMD64)]
executable: c:\Users\user\Miniconda3\envs\timeseries\python.exe
machine: Windows-10-10.0.19045-SP0

Python dependencies:
pip: 24.0
sktime: 0.28.0
sklearn: 1.4.1.post1
skbase: 0.7.5
numpy: 1.26.4
scipy: 1.13.0
pandas: 2.2.1
matplotlib: 3.8.3
joblib: 1.3.2
numba: 0.59.1
statsmodels: 0.14.1
pmdarima: 2.0.4
statsforecast: 1.7.3
tsfresh: None
tslearn: None
torch: 2.2.2
tensorflow: None
tensorflow_probability: None
c:\Users\user\Miniconda3\envs\timeseries\lib\site-packages\statsforecast\core.py:26: TqdmExperimentalWarning: Using tqdm.autonotebook.tqdm in notebook mode. Use tqdm.tqdm instead to force console mode (e.g. in jupyter console)
from tqdm.autonotebook import tqdm
c:\Users\user\Miniconda3\envs\timeseries\lib\site-packages\statsforecast\utils.py:237: FutureWarning: 'M' is deprecated and will be removed in a future version, please use 'ME' instead.
"ds": pd.date_range(start="1949-01-01", periods=len(AirPassengers), freq="M"),


Thanks, @AxelJanRousseau.

Just to check: are the last lines (the TqdmExperimentalWarning etc.) also something you get when running `show_versions`?


Those lines were part of the output from show_versions, and I included them just to make sure.

But I just tried again in a clean notebook and the warnings disappeared, so they were probably related to the boatload of other imports I had in my notebook.

This is an skbase bug; moving it to the skbase repository.

Attempted fix here:
#328

Review and testing would be appreciated.
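For testing the fix, one possible regression check is to swap in a sentinel stream, call the function under test, and assert that the same object is still `sys.stdout` afterwards. The sketch below uses only the standard library; `assert_preserves_stdout`, `well_behaved` and `misbehaving` are hypothetical, and in a real test the helper would wrap a call to skbase's `all_objects`.

```python
import io
import sys


def assert_preserves_stdout(func, *args, **kwargs):
    """Hypothetical test helper: fail if func leaves sys.stdout pointing elsewhere."""
    saved = sys.stdout
    sentinel = io.StringIO()  # stands in for a notebook's OutStream
    sys.stdout = sentinel
    try:
        result = func(*args, **kwargs)
        after = sys.stdout
    finally:
        sys.stdout = saved  # never leak the sentinel into the rest of the session
    assert after is sentinel, f"sys.stdout changed from {sentinel!r} to {after!r}"
    return result


def well_behaved():
    return 42


def misbehaving():
    sys.stdout = sys.__stdout__  # mimics the reported bug


# The well-behaved function passes; the misbehaving one is caught.
ok_result = assert_preserves_stdout(well_behaved)
try:
    assert_preserves_stdout(misbehaving)
    caught = False
except AssertionError:
    caught = True
```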