securefederatedai/openfl

[Windows] `torchvision=0.18.1` incompatible with `NumPy 2.x` leading to error initializing torch workspaces


A while back, NumPy released v2.x, which caused issues when using packages compiled against NumPy 1.x.
One area where OpenFL was specifically affected was the torch workspaces [Ref Issue #999].

Updating the workspaces to torch==2.3.1 and torchvision==0.18.1 seemed to work on Ubuntu, but on Windows torchvision==0.18.1 still appears to be incompatible with NumPy v2.x, which results in errors when initializing torch workspaces.

We should look into updating torch and torchvision to later versions that are compatible across Ubuntu and Windows.
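To confirm which combination is actually installed in an affected environment, something like the following could be run inside the virtual environment. It is only a diagnostic sketch: the helper names (`installed_version`, `major_version`, `numpy_2_installed`) are made up for illustration and are not part of OpenFL, and it only reports versions; it does not prove that a given wheel was built against NumPy 1.x.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version):
    """Extract the leading integer of a version string such as '2.1.1'."""
    return int(version.split(".")[0])


def numpy_2_installed(numpy_version):
    """True when the given NumPy version string is in the 2.x series or later."""
    return numpy_version is not None and major_version(numpy_version) >= 2


if __name__ == "__main__":
    # Print the three packages named in this issue.
    for name in ("numpy", "torch", "torchvision"):
        print(f"{name}: {installed_version(name)}")
    if numpy_2_installed(installed_version("numpy")):
        print("NumPy 2.x detected; modules compiled against NumPy 1.x may fail.")
```

Comparing the output on Ubuntu and Windows should show whether both platforms resolve to the same NumPy version.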

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.1.1 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Documents\openfl\venv\lib\site-packages\torchvision\__init__.py", line 6, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
  File "C:\Documents\openfl\venv\lib\site-packages\torchvision\models\__init__.py", line 2, in <module>
    from .convnext import *
  File "C:\Documents\openfl\venv\lib\site-packages\torchvision\models\convnext.py", line 8, in <module>
    from ..ops.misc import Conv2dNormActivation, Permute
  File "C:\Documents\openfl\venv\lib\site-packages\torchvision\ops\__init__.py", line 23, in <module>
    from .poolers import MultiScaleRoIAlign
  File "C:\Documents\openfl\venv\lib\site-packages\torchvision\ops\poolers.py", line 10, in <module>
    from .roi_align import roi_align
  File "C:\Documents\openfl\venv\lib\site-packages\torchvision\ops\roi_align.py", line 4, in <module>
    import torch._dynamo
  File "C:\Documents\openfl\venv\lib\site-packages\torch\_dynamo\__init__.py", line 64, in <module>
    torch.manual_seed = disable(torch.manual_seed)
  File "C:\Documents\openfl\venv\lib\site-packages\torch\_dynamo\decorators.py", line 50, in disable
    return DisableContext()(fn)
  File "C:\Documents\openfl\venv\lib\site-packages\torch\_dynamo\eval_frame.py", line 410, in __call__
    (filename is None or trace_rules.check(fn))
  File "C:\Documents\openfl\venv\lib\site-packages\torch\_dynamo\trace_rules.py", line 3378, in check
    return check_verbose(obj, is_inlined_call).skipped
  File "C:\Documents\openfl\venv\lib\site-packages\torch\_dynamo\trace_rules.py", line 3361, in check_verbose
    rule = torch._dynamo.trace_rules.lookup_inner(
  File "C:\Documents\openfl\venv\lib\site-packages\torch\_dynamo\trace_rules.py", line 3442, in lookup_inner
    rule = get_torch_obj_rule_map().get(obj, None)
  File "C:\Documents\openfl\venv\lib\site-packages\torch\_dynamo\trace_rules.py", line 2782, in get_torch_obj_rule_map
    obj = load_object(k)
  File "C:\Documents\openfl\venv\lib\site-packages\torch\_dynamo\trace_rules.py", line 2811, in load_object
    val = _load_obj_from_str(x[0])
  File "C:\Documents\openfl\venv\lib\site-packages\torch\_dynamo\trace_rules.py", line 2795, in _load_obj_from_str
    return getattr(importlib.import_module(module), obj_name)
  File "C:\AppData\Local\Programs\Python\Python310\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "C:\Documents\openfl\venv\lib\site-packages\torch\nested\_internal\nested_tensor.py", line 417, in <module>
    values=torch.randn(3, 3, device="meta"),
C:\Documents\openfl\venv\lib\site-packages\torch\nested\_internal\nested_tensor.py:417: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ..\torch\csrc\utils\tensor_numpy.cpp:84.)
  values=torch.randn(3, 3, device="meta"),
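The failure above surfaces as a `UserWarning` rather than an exception, so it is easy to miss in scripts. A rough way to detect it programmatically is to record warnings while running the import and look for the message text from the log. This is a sketch under assumptions: `numpy_bridge_failed` is a hypothetical helper, it assumes the warning is routed through Python's `warnings` module as the log suggests, and it only works in a fresh interpreter where torch has not already been imported.

```python
import warnings

# Message fragment copied from the warning in the log above.
MARKER = "Failed to initialize NumPy"


def numpy_bridge_failed(action):
    """Run `action()` and report whether it emitted the NumPy init warning."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones that would normally be shown once.
        warnings.simplefilter("always")
        action()
    return any(MARKER in str(w.message) for w in caught)


# Example use in a fresh interpreter (not executed here):
#   import importlib
#   failed = numpy_bridge_failed(lambda: importlib.import_module("torchvision"))
```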

To Reproduce
Steps to reproduce the behavior:

  1. Install OpenFL
  2. fx workspace create --template torch_cnn_mnist --prefix my_workspace
  3. fx plan initialize
  4. See error
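For anyone who wants to script the reproduction, steps 2 and 3 could be wrapped roughly as follows. The two commands are copied verbatim from the steps above; the `run_repro` helper is hypothetical, and running `fx plan initialize` from inside the `my_workspace` directory is an assumption, since the steps do not say which directory to use.

```python
import subprocess

# Commands taken from the reproduction steps; nothing else about the fx CLI is assumed.
REPRO_COMMANDS = [
    ["fx", "workspace", "create", "--template", "torch_cnn_mnist",
     "--prefix", "my_workspace"],
    ["fx", "plan", "initialize"],
]


def run_repro(workspace_dir="my_workspace", runner=subprocess.run):
    """Run both commands and return their exit codes.

    Assumption: the second command is run from inside the new workspace.
    `runner` is injectable so the function can be exercised without OpenFL.
    """
    create = runner(REPRO_COMMANDS[0], check=False)
    init = runner(REPRO_COMMANDS[1], cwd=workspace_dir, check=False)
    return [create.returncode, init.returncode]
```

Calling `run_repro()` in an environment with OpenFL installed should print the same output as running the steps by hand.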

Expected behavior
Workspaces should initialize without error.

The suggested fix is to upgrade OpenFL to use the latest PyTorch 2.x version across the board (TaskRunner hierarchy, FL workspaces, workflow API examples). This will also likely address the issue observed on Windows.