[BUG] Keras's incompatibility with `numpy>=2` breaks `cellfinder`'s model training
alessandrofelder opened this issue · 4 comments
Describe the bug
When I try to train a model with the Training widget of the cellfinder napari plugin, I get a keras-related error:
AttributeError: `np.Inf` was removed in the NumPy 2.0 release. Use `np.inf` instead.
This is likely due to a reported incompatibility between keras and numpy 2.
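For context, this is a NumPy 2.0 change rather than anything cellfinder-specific: the capitalised aliases such as `np.Inf` were removed, while `np.inf` works on both major versions. A minimal illustration (none of this is from the cellfinder codebase):

```python
import numpy as np

print(np.inf)  # fine on NumPy 1.x and 2.x
print(np.Inf)  # AttributeError on numpy >= 2.0: "Use `np.inf` instead."
```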
Full stack trace
File ~/mambaforge/envs/cellfinder-py311/lib/python3.11/site-packages/superqt/utils/_qthreading.py:613, in create_worker.<locals>.reraise(e=AttributeError('`np.Inf` was removed in the NumPy 2.0 release. Use `np.inf` instead.'))
612 def reraise(e):
--> 613 raise e
e = AttributeError('`np.Inf` was removed in the NumPy 2.0 release. Use `np.inf` instead.')
File ~/mambaforge/envs/cellfinder-py311/lib/python3.11/site-packages/superqt/utils/_qthreading.py:175, in WorkerBase.run(self=<napari._qt.qthreading.FunctionWorker object>)
173 warnings.filterwarnings("always")
174 warnings.showwarning = lambda *w: self.warned.emit(w)
--> 175 result = self.work()
self = <napari._qt.qthreading.FunctionWorker object at 0x74b3e0e87f40>
176 if isinstance(result, Exception):
177 if isinstance(result, RuntimeError):
178 # The Worker object has likely been deleted.
179 # A deleted wrapped C/C++ object may result in a runtime
180 # error that will cause segfault if we try to do much other
181 # than simply notify the user.
File ~/mambaforge/envs/cellfinder-py311/lib/python3.11/site-packages/superqt/utils/_qthreading.py:354, in FunctionWorker.work(self=<napari._qt.qthreading.FunctionWorker object>)
353 def work(self) -> _R:
--> 354 return self._func(*self._args, **self._kwargs)
self._func = <function run_training at 0x74b4a889f740>
self = <napari._qt.qthreading.FunctionWorker object at 0x74b3e0e87f40>
self._args = (TrainingDataInputs(yaml_files=(PosixPath('/home/alessandro/dev/training.yml'),), output_directory=PosixPath('/home/alessandro')), OptionalNetworkInputs(trained_model=None, model_weights=None, model_depth='50', pretrained_model='resnet50_tv'), OptionalTrainingInputs(continue_training=False, augment=True, tensorboard=False, save_weights=False, save_checkpoints=True, save_progress=True, epochs=100, learning_rate=0.0001, batch_size=16, test_fraction=0.1), MiscTrainingInputs(number_of_free_cpus=2))
self._kwargs = {}
File ~/dev/cellfinder/cellfinder/napari/train/train.py:29, in run_training(training_data_inputs=TrainingDataInputs(yaml_files=(PosixPath('/home/..., output_directory=PosixPath('/home/alessandro')), optional_network_inputs=OptionalNetworkInputs(trained_model=None, model_...model_depth='50', pretrained_model='resnet50_tv'), optional_training_inputs=OptionalTrainingInputs(continue_training=False, ...ng_rate=0.0001, batch_size=16, test_fraction=0.1), misc_training_inputs=MiscTrainingInputs(number_of_free_cpus=2))
21 @thread_worker
22 def run_training(
23 training_data_inputs: TrainingDataInputs,
(...)
26 misc_training_inputs: MiscTrainingInputs,
27 ):
28 print("Running training")
---> 29 train_yml(
train_yml = <function run at 0x74b517751800>
training_data_inputs = TrainingDataInputs(yaml_files=(PosixPath('/home/alessandro/dev/training.yml'),), output_directory=PosixPath('/home/alessandro'))
optional_network_inputs = OptionalNetworkInputs(trained_model=None, model_weights=None, model_depth='50', pretrained_model='resnet50_tv')
optional_training_inputs = OptionalTrainingInputs(continue_training=False, augment=True, tensorboard=False, save_weights=False, save_checkpoints=True, save_progress=True, epochs=100, learning_rate=0.0001, batch_size=16, test_fraction=0.1)
misc_training_inputs = MiscTrainingInputs(number_of_free_cpus=2)
30 **training_data_inputs.as_core_arguments(),
31 **optional_network_inputs.as_core_arguments(),
32 **optional_training_inputs.as_core_arguments(),
33 **misc_training_inputs.as_core_arguments(),
34 )
35 print("Finished!")
File ~/dev/cellfinder/cellfinder/core/train/train_yml.py:431, in run(output_dir=PosixPath('/home/alessandro'), yaml_file=(PosixPath('/home/alessandro/dev/training.yml'),), n_free_cpus=2, trained_model=None, model_weights=PosixPath('/home/alessandro/.brainglobe/cellfinder/models/resnet50_tv.h5'), install_path=PosixPath('/home/alessandro/.brainglobe/cellfinder/models'), model=<Functional name=functional, built=True>, network_depth='50', learning_rate=0.0001, continue_training=False, test_fraction=0.1, batch_size=16, no_augment=False, tensorboard=False, save_weights=False, no_save_checkpoints=False, save_progress=True, epochs=100)
426 else:
427 filepath = str(
428 output_dir / ("model" + base_checkpoint_file_name + ".keras")
429 )
--> 431 checkpoints = ModelCheckpoint(
filepath = '/home/alessandro/model-epoch.{epoch:02d}-loss-{val_loss:.3f}.keras'
save_weights = False
432 filepath,
433 save_weights_only=save_weights,
434 )
435 callbacks.append(checkpoints)
437 if save_progress:
File ~/mambaforge/envs/cellfinder-py311/lib/python3.11/site-packages/keras/src/callbacks/model_checkpoint.py:173, in ModelCheckpoint.__init__(self=<keras.src.callbacks.model_checkpoint.ModelCheckpoint object>, filepath='/home/alessandro/model-epoch.{epoch:02d}-loss-{val_loss:.3f}.keras', monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', save_freq='epoch', initial_value_threshold=None)
171 self.monitor_op = np.less
172 if self.best is None:
--> 173 self.best = np.Inf
self.best = None
self = <keras.src.callbacks.model_checkpoint.ModelCheckpoint object at 0x74b3c00a95d0>
np = <module 'numpy' from '/home/alessandro/mambaforge/envs/cellfinder-py311/lib/python3.11/site-packages/numpy/__init__.py'>
175 if self.save_freq != "epoch" and not isinstance(self.save_freq, int):
176 raise ValueError(
177 f"Unrecognized save_freq: {self.save_freq}. "
178 "Expected save_freq are 'epoch' or integer values"
179 )
File ~/mambaforge/envs/cellfinder-py311/lib/python3.11/site-packages/numpy/__init__.py:397, in __getattr__(attr='Inf')
394 raise AttributeError(__former_attrs__[attr])
396 if attr in __expired_attributes__:
--> 397 raise AttributeError(
attr = 'Inf'
__expired_attributes__ = {'geterrobj': 'Use the np.errstate context manager instead.', 'seterrobj': 'Use the np.errstate context manager instead.', 'cast': 'Use `np.asarray(arr, dtype=dtype)` instead.', 'source': 'Use `inspect.getsource` instead.', 'lookfor': "Search NumPy's documentation directly.", 'who': 'Use an IDE variable explorer or `locals()` instead.', 'fastCopyAndTranspose': 'Use `arr.T.copy()` instead.', 'set_numeric_ops': 'For the general case, use `PyUFunc_ReplaceLoopBySignature`. For ndarray subclasses, define the ``__array_ufunc__`` method and override the relevant ufunc.', 'NINF': 'Use `-np.inf` instead.', 'PINF': 'Use `np.inf` instead.', 'NZERO': 'Use `-0.0` instead.', 'PZERO': 'Use `0.0` instead.', 'add_newdoc': "It's still available as `np.lib.add_newdoc`.", 'add_docstring': "It's still available as `np.lib.add_docstring`.", 'add_newdoc_ufunc': "It's an internal function and doesn't have a replacement.", 'compat': "There's no replacement, as Python 2 is no longer supported.", 'safe_eval': 'Use `ast.literal_eval` instead.', 'float_': 'Use `np.float64` instead.', 'complex_': 'Use `np.complex128` instead.', 'longfloat': 'Use `np.longdouble` instead.', 'singlecomplex': 'Use `np.complex64` instead.', 'cfloat': 'Use `np.complex128` instead.', 'longcomplex': 'Use `np.clongdouble` instead.', 'clongfloat': 'Use `np.clongdouble` instead.', 'string_': 'Use `np.bytes_` instead.', 'unicode_': 'Use `np.str_` instead.', 'Inf': 'Use `np.inf` instead.', 'Infinity': 'Use `np.inf` instead.', 'NaN': 'Use `np.nan` instead.', 'infty': 'Use `np.inf` instead.', 'issctype': 'Use `issubclass(rep, np.generic)` instead.', 'maximum_sctype': 'Use a specific dtype instead. You should avoid relying on any implicit mechanism and select the largest dtype of a kind explicitly in the code.', 'obj2sctype': 'Use `np.dtype(obj).type` instead.', 'sctype2char': 'Use `np.dtype(obj).char` instead.', 'sctypes': 'Access dtypes explicitly instead.', 'issubsctype': 'Use `np.issubdtype` instead.', 'set_string_function': 'Use `np.set_printoptions` instead with a formatter for custom printing of NumPy objects.', 'asfarray': 'Use `np.asarray` with a proper dtype instead.', 'issubclass_': 'Use `issubclass` builtin instead.', 'tracemalloc_domain': "It's now available from `np.lib`.", 'mat': 'Use `np.asmatrix` instead.', 'recfromcsv': 'Use `np.genfromtxt` with comma delimiter instead.', 'recfromtxt': 'Use `np.genfromtxt` instead.', 'deprecate': 'Emit `DeprecationWarning` with `warnings.warn` directly, or use `typing.deprecated`.', 'deprecate_with_doc': 'Emit `DeprecationWarning` with `warnings.warn` directly, or use `typing.deprecated`.', 'disp': 'Use your own printing function instead.', 'find_common_type': 'Use `numpy.promote_types` or `numpy.result_type` instead. To achieve semantics for the `scalar_types` argument, use `numpy.result_type` and pass the Python values `0`, `0.0`, or `0j`.', 'round_': 'Use `np.round` instead.', 'get_array_wrap': '', 'DataSource': "It's still available as `np.lib.npyio.DataSource`.", 'nbytes': 'Use `np.dtype(<dtype>).itemsize` instead.', 'byte_bounds': "Now it's available under `np.lib.array_utils.byte_bounds`", 'compare_chararrays': "It's still available as `np.char.compare_chararrays`.", 'format_parser': "It's still available as `np.rec.format_parser`."}
__expired_attributes__[attr] = 'Use `np.inf` instead.'
398 f"`np.{attr}` was removed in the NumPy 2.0 release. "
399 f"{__expired_attributes__[attr]}"
400 )
402 if attr == "chararray":
403 warnings.warn(
404 "`np.chararray` is deprecated and will be removed from "
405 "the main namespace in the future. Use an array with a string "
406 "or bytes dtype instead.", DeprecationWarning, stacklevel=2)
AttributeError: `np.Inf` was removed in the NumPy 2.0 release. Use `np.inf` instead.
To Reproduce
- Clean conda env
- Install cellfinder
- Open napari and the cellfinder training widget
- Pass it a YAML file with some training data
- Hit the run button (see the minimal reproduction sketch below).
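For anyone who wants to reproduce this without napari: the failure happens as soon as the `ModelCheckpoint` callback is constructed, so a two-line sketch should be enough (assuming `keras<3.5.0` and `numpy>=2.0` are installed; the filepath is just a placeholder):

```python
from keras.callbacks import ModelCheckpoint

# With the default monitor="val_loss" and mode="auto", the constructor
# initialises its running "best" value with the removed alias `np.Inf`,
# so this single call already raises the AttributeError from the trace above.
ModelCheckpoint("model-epoch.{epoch:02d}-loss-{val_loss:.3f}.keras")
```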
Expected behaviour
I can train cellfinder through napari
Log file
N/A
Screenshots
N/A
Computer used (please complete the following information):
- Ubuntu 22.04
- Dell Desktop
Additional context
I can make this go away with `pip install "numpy<2"`.
Should we pin to NumPy < 2.0 for now?
Yes, a PR is in progress; I will ask for your review shortly 😁
This is now fixed in keras-team/keras#20049 and released as part of 3.5.0. I tested it locally and training proceeds without errors with `numpy==2.0.1` and `keras==3.5.0`. We can now unpin `numpy`, but perhaps pin `keras>=3.5.0`?
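For reviewers, here is a rough sketch of the kind of check one could run locally to confirm the upgrade path. It is hypothetical (not part of cellfinder's test suite) and assumes `numpy>=2.0`, `keras>=3.5.0`, and a working Keras backend are installed:

```python
import numpy as np
import keras
from keras.callbacks import ModelCheckpoint

print("numpy", np.__version__, "keras", keras.__version__)

# A tiny model and dataset: just enough to exercise ModelCheckpoint end to end.
model = keras.Sequential([keras.Input(shape=(4,)), keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")

# On keras < 3.5.0 with numpy >= 2.0 this fails while building the callback;
# on keras >= 3.5.0 the one-epoch fit completes and writes the checkpoint file.
model.fit(
    x,
    y,
    epochs=1,
    validation_split=0.25,
    callbacks=[ModelCheckpoint("smoke-test.keras")],
)
```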
Need to wait for `torch` 2.4.1 to unpin `numpy` on Windows.