[BUG] Keras's incompatibility with `numpy>=2` breaks `cellfinder`'s model training

Question

alessandrofelder opened this issue 3 months ago · 4 comments

Describe the bug

When I try to train a model with the Training widget of cellfinder's napari plugin, I get a keras-related error:

AttributeError: `np.Inf` was removed in the NumPy 2.0 release. Use `np.inf` instead.

This is likely caused by a reported incompatibility between keras and numpy 2.

Full stack trace

File ~/mambaforge/envs/cellfinder-py311/lib/python3.11/site-packages/superqt/utils/_qthreading.py:613, in create_worker.<locals>.reraise(e=AttributeError('`np.Inf` was removed in the NumPy 2.0 release. Use `np.inf` instead.'))
    612 def reraise(e):
--> 613     raise e
        e = AttributeError('`np.Inf` was removed in the NumPy 2.0 release. Use `np.inf` instead.')

File ~/mambaforge/envs/cellfinder-py311/lib/python3.11/site-packages/superqt/utils/_qthreading.py:175, in WorkerBase.run(self=<napari._qt.qthreading.FunctionWorker object>)
    173     warnings.filterwarnings("always")
    174     warnings.showwarning = lambda *w: self.warned.emit(w)
--> 175     result = self.work()
        self = <napari._qt.qthreading.FunctionWorker object at 0x74b3e0e87f40>
    176 if isinstance(result, Exception):
    177     if isinstance(result, RuntimeError):
    178         # The Worker object has likely been deleted.
    179         # A deleted wrapped C/C++ object may result in a runtime
    180         # error that will cause segfault if we try to do much other
    181         # than simply notify the user.

File ~/mambaforge/envs/cellfinder-py311/lib/python3.11/site-packages/superqt/utils/_qthreading.py:354, in FunctionWorker.work(self=<napari._qt.qthreading.FunctionWorker object>)
    353 def work(self) -> _R:
--> 354     return self._func(*self._args, **self._kwargs)
        self._func = <function run_training at 0x74b4a889f740>
        self = <napari._qt.qthreading.FunctionWorker object at 0x74b3e0e87f40>
        self._args = (TrainingDataInputs(yaml_files=(PosixPath('/home/alessandro/dev/training.yml'),), output_directory=PosixPath('/home/alessandro')), OptionalNetworkInputs(trained_model=None, model_weights=None, model_depth='50', pretrained_model='resnet50_tv'), OptionalTrainingInputs(continue_training=False, augment=True, tensorboard=False, save_weights=False, save_checkpoints=True, save_progress=True, epochs=100, learning_rate=0.0001, batch_size=16, test_fraction=0.1), MiscTrainingInputs(number_of_free_cpus=2))
        self._kwargs = {}

File ~/dev/cellfinder/cellfinder/napari/train/train.py:29, in run_training(training_data_inputs=TrainingDataInputs(yaml_files=(PosixPath('/home/..., output_directory=PosixPath('/home/alessandro')), optional_network_inputs=OptionalNetworkInputs(trained_model=None, model_...model_depth='50', pretrained_model='resnet50_tv'), optional_training_inputs=OptionalTrainingInputs(continue_training=False, ...ng_rate=0.0001, batch_size=16, test_fraction=0.1), misc_training_inputs=MiscTrainingInputs(number_of_free_cpus=2))
     21 @thread_worker
     22 def run_training(
     23     training_data_inputs: TrainingDataInputs,
   (...)
     26     misc_training_inputs: MiscTrainingInputs,
     27 ):
     28     print("Running training")
---> 29     train_yml(
        train_yml = <function run at 0x74b517751800>
        training_data_inputs = TrainingDataInputs(yaml_files=(PosixPath('/home/alessandro/dev/training.yml'),), output_directory=PosixPath('/home/alessandro'))
        optional_network_inputs = OptionalNetworkInputs(trained_model=None, model_weights=None, model_depth='50', pretrained_model='resnet50_tv')
        optional_training_inputs = OptionalTrainingInputs(continue_training=False, augment=True, tensorboard=False, save_weights=False, save_checkpoints=True, save_progress=True, epochs=100, learning_rate=0.0001, batch_size=16, test_fraction=0.1)
        misc_training_inputs = MiscTrainingInputs(number_of_free_cpus=2)
     30         **training_data_inputs.as_core_arguments(),
     31         **optional_network_inputs.as_core_arguments(),
     32         **optional_training_inputs.as_core_arguments(),
     33         **misc_training_inputs.as_core_arguments(),
     34     )
     35     print("Finished!")

File ~/dev/cellfinder/cellfinder/core/train/train_yml.py:431, in run(output_dir=PosixPath('/home/alessandro'), yaml_file=(PosixPath('/home/alessandro/dev/training.yml'),), n_free_cpus=2, trained_model=None, model_weights=PosixPath('/home/alessandro/.brainglobe/cellfinder/models/resnet50_tv.h5'), install_path=PosixPath('/home/alessandro/.brainglobe/cellfinder/models'), model=<Functional name=functional, built=True>, network_depth='50', learning_rate=0.0001, continue_training=False, test_fraction=0.1, batch_size=16, no_augment=False, tensorboard=False, save_weights=False, no_save_checkpoints=False, save_progress=True, epochs=100)
    426     else:
    427         filepath = str(
    428             output_dir / ("model" + base_checkpoint_file_name + ".keras")
    429         )
--> 431     checkpoints = ModelCheckpoint(
        filepath = '/home/alessandro/model-epoch.{epoch:02d}-loss-{val_loss:.3f}.keras'
        save_weights = False
    432         filepath,
    433         save_weights_only=save_weights,
    434     )
    435     callbacks.append(checkpoints)
    437 if save_progress:

File ~/mambaforge/envs/cellfinder-py311/lib/python3.11/site-packages/keras/src/callbacks/model_checkpoint.py:173, in ModelCheckpoint.__init__(self=<keras.src.callbacks.model_checkpoint.ModelCheckpoint object>, filepath='/home/alessandro/model-epoch.{epoch:02d}-loss-{val_loss:.3f}.keras', monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', save_freq='epoch', initial_value_threshold=None)
    171         self.monitor_op = np.less
    172         if self.best is None:
--> 173             self.best = np.Inf
        self.best = None
        self = <keras.src.callbacks.model_checkpoint.ModelCheckpoint object at 0x74b3c00a95d0>
        np = <module 'numpy' from '/home/alessandro/mambaforge/envs/cellfinder-py311/lib/python3.11/site-packages/numpy/__init__.py'>
    175 if self.save_freq != "epoch" and not isinstance(self.save_freq, int):
    176     raise ValueError(
    177         f"Unrecognized save_freq: {self.save_freq}. "
    178         "Expected save_freq are 'epoch' or integer values"
    179     )

File ~/mambaforge/envs/cellfinder-py311/lib/python3.11/site-packages/numpy/__init__.py:397, in __getattr__(attr='Inf')
    394     raise AttributeError(__former_attrs__[attr])
    396 if attr in __expired_attributes__:
--> 397     raise AttributeError(
        attr = 'Inf'
        __expired_attributes__ = {'geterrobj': 'Use the np.errstate context manager instead.', 'seterrobj': 'Use the np.errstate context manager instead.', 'cast': 'Use `np.asarray(arr, dtype=dtype)` instead.', 'source': 'Use `inspect.getsource` instead.', 'lookfor': "Search NumPy's documentation directly.", 'who': 'Use an IDE variable explorer or `locals()` instead.', 'fastCopyAndTranspose': 'Use `arr.T.copy()` instead.', 'set_numeric_ops': 'For the general case, use `PyUFunc_ReplaceLoopBySignature`. For ndarray subclasses, define the ``__array_ufunc__`` method and override the relevant ufunc.', 'NINF': 'Use `-np.inf` instead.', 'PINF': 'Use `np.inf` instead.', 'NZERO': 'Use `-0.0` instead.', 'PZERO': 'Use `0.0` instead.', 'add_newdoc': "It's still available as `np.lib.add_newdoc`.", 'add_docstring': "It's still available as `np.lib.add_docstring`.", 'add_newdoc_ufunc': "It's an internal function and doesn't have a replacement.", 'compat': "There's no replacement, as Python 2 is no longer supported.", 'safe_eval': 'Use `ast.literal_eval` instead.', 'float_': 'Use `np.float64` instead.', 'complex_': 'Use `np.complex128` instead.', 'longfloat': 'Use `np.longdouble` instead.', 'singlecomplex': 'Use `np.complex64` instead.', 'cfloat': 'Use `np.complex128` instead.', 'longcomplex': 'Use `np.clongdouble` instead.', 'clongfloat': 'Use `np.clongdouble` instead.', 'string_': 'Use `np.bytes_` instead.', 'unicode_': 'Use `np.str_` instead.', 'Inf': 'Use `np.inf` instead.', 'Infinity': 'Use `np.inf` instead.', 'NaN': 'Use `np.nan` instead.', 'infty': 'Use `np.inf` instead.', 'issctype': 'Use `issubclass(rep, np.generic)` instead.', 'maximum_sctype': 'Use a specific dtype instead. 
You should avoid relying on any implicit mechanism and select the largest dtype of a kind explicitly in the code.', 'obj2sctype': 'Use `np.dtype(obj).type` instead.', 'sctype2char': 'Use `np.dtype(obj).char` instead.', 'sctypes': 'Access dtypes explicitly instead.', 'issubsctype': 'Use `np.issubdtype` instead.', 'set_string_function': 'Use `np.set_printoptions` instead with a formatter for custom printing of NumPy objects.', 'asfarray': 'Use `np.asarray` with a proper dtype instead.', 'issubclass_': 'Use `issubclass` builtin instead.', 'tracemalloc_domain': "It's now available from `np.lib`.", 'mat': 'Use `np.asmatrix` instead.', 'recfromcsv': 'Use `np.genfromtxt` with comma delimiter instead.', 'recfromtxt': 'Use `np.genfromtxt` instead.', 'deprecate': 'Emit `DeprecationWarning` with `warnings.warn` directly, or use `typing.deprecated`.', 'deprecate_with_doc': 'Emit `DeprecationWarning` with `warnings.warn` directly, or use `typing.deprecated`.', 'disp': 'Use your own printing function instead.', 'find_common_type': 'Use `numpy.promote_types` or `numpy.result_type` instead. To achieve semantics for the `scalar_types` argument, use `numpy.result_type` and pass the Python values `0`, `0.0`, or `0j`.', 'round_': 'Use `np.round` instead.', 'get_array_wrap': '', 'DataSource': "It's still available as `np.lib.npyio.DataSource`.", 'nbytes': 'Use `np.dtype(<dtype>).itemsize` instead.', 'byte_bounds': "Now it's available under `np.lib.array_utils.byte_bounds`", 'compare_chararrays': "It's still available as `np.char.compare_chararrays`.", 'format_parser': "It's still available as `np.rec.format_parser`."}
        __expired_attributes__[attr] = 'Use `np.inf` instead.'
    398         f"`np.{attr}` was removed in the NumPy 2.0 release. "
    399         f"{__expired_attributes__[attr]}"
    400     )
    402 if attr == "chararray":
    403     warnings.warn(
    404         "`np.chararray` is deprecated and will be removed from "
    405         "the main namespace in the future. Use an array with a string "
    406         "or bytes dtype instead.", DeprecationWarning, stacklevel=2)

AttributeError: `np.Inf` was removed in the NumPy 2.0 release. Use `np.inf` instead.
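
To make the last two frames easier to follow, here is a minimal sketch of the lookup pattern they show. `fake_np`, `_expired`, `initial_best` and `legacy_initial_best` are hypothetical names invented for this illustration; the stand-in module only mimics the module-level `__getattr__` behaviour visible in the traceback and is not NumPy's or keras's actual code.

```python
import types

# Hypothetical stand-in module; it only imitates the pattern in the traceback.
fake_np = types.ModuleType("fake_np")
fake_np.inf = float("inf")

# Assumed, shortened version of the table of removed aliases.
_expired = {"Inf": "Use `np.inf` instead."}


def _module_getattr(attr):
    # Python calls this only when normal attribute lookup on the module fails.
    if attr in _expired:
        raise AttributeError(
            f"`np.{attr}` was removed in the NumPy 2.0 release. "
            f"{_expired[attr]}"
        )
    raise AttributeError(f"module 'fake_np' has no attribute {attr!r}")


fake_np.__getattr__ = _module_getattr


def initial_best(np_module):
    # The lowercase spelling exists in both NumPy 1.x and 2.x.
    return np_module.inf


def legacy_initial_best(np_module):
    # The removed alias, as used at model_checkpoint.py:173 in the traceback.
    return np_module.Inf
```

Calling `initial_best(fake_np)` returns infinity, while `legacy_initial_best(fake_np)` raises an `AttributeError` with the same wording as the message above.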

To Reproduce

Create a clean conda env
Install cellfinder
Open napari and the cellfinder training widget
Pass it a YAML file with some training data
Hit the run button

Expected behaviour
I can train cellfinder through napari

Computer used (please complete the following information):

Ubuntu 22.04
Dell Desktop

Additional context

I can make this go away by running `pip install "numpy<2"`.

Answer 1 · 2024-08-07T10:45:33.000Z

Should we pin to NumPy < 2.0 for now?

Answer 2 · 2024-08-07T10:56:05.000Z

Yes, a PR is in progress. I will ask for your review shortly 😁

Answer 3 · 2024-08-16T14:07:52.000Z

This is now fixed in keras-team/keras#20049 and released as part of 3.5.0. I tested it locally and training proceeds without errors with numpy==2.0.1 and keras==3.5.0. We can now unpin numpy, but perhaps pin keras>=3.5.0?
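
The version combinations reported in this thread can be written down as a small check. This is only a sketch: `parse_version` and `training_should_work` are hypothetical helpers, the rule encodes nothing beyond what was observed here (`numpy<2` avoids the error, and keras 3.5.0 works with numpy 2.0.1), and it ignores any other constraints such as torch on Windows.

```python
import re


def parse_version(text):
    # Keep the leading numeric release parts, padded to three fields,
    # e.g. "2.0.1" -> (2, 0, 1) and "3.5" -> (3, 5, 0).
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    parts = [int(p) for p in match.group().split(".")]
    return tuple((parts + [0, 0, 0])[:3])


def training_should_work(numpy_version, keras_version):
    # Assumption from this thread: numpy<2 sidesteps the np.Inf error.
    if parse_version(numpy_version) < (2, 0, 0):
        return True
    # Assumption from this thread: with numpy>=2, keras 3.5.0 or later is needed.
    return parse_version(keras_version) >= (3, 5, 0)
```

The installed versions could be fed in via the standard library, for example `importlib.metadata.version("numpy")` and `importlib.metadata.version("keras")`.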

Answer 4 · 2024-08-22T10:08:53.000Z

We need to wait for torch 2.4.1 before we can unpin numpy on Windows.