
Random crashes during training

Whenever I run training, the process crashes at an unpredictable point with the following error message:
File "/home/xky/RVT-master/", line 141, in main, ckpt_path=ckpt_path, datamodule=data_module)
File "/home/xky/.local/lib/python3.9/site-packages/pytorch_lightning/trainer/", line 603, in fit
File "/home/xky/.local/lib/python3.9/site-packages/pytorch_lightning/trainer/", line 38, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/xky/.local/lib/python3.9/site-packages/pytorch_lightning/trainer/", line 645, in _fit_impl
self._run(model, ckpt_path=self.ckpt_path)
File "/home/xky/.local/lib/python3.9/site-packages/pytorch_lightning/trainer/", line 1098, in _run
results = self._run_stage()
File "/home/xky/.local/lib/python3.9/site-packages/pytorch_lightning/trainer/", line 1177, in _run_stage
File "/home/xky/.local/lib/python3.9/site-packages/pytorch_lightning/trainer/", line 1200, in _run_train
File "/home/xky/.local/lib/python3.9/site-packages/pytorch_lightning/loops/", line 199, in run
self.advance(*args, **kwargs)
File "/home/xky/.local/lib/python3.9/site-packages/pytorch_lightning/loops/", line 267, in advance
self._outputs =
File "/home/xky/.local/lib/python3.9/site-packages/pytorch_lightning/loops/", line 200, in run
File "/home/xky/.local/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/", line 251, in on_advance_end
File "/home/xky/.local/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/", line 310, in _run_validation
File "/home/xky/.local/lib/python3.9/site-packages/pytorch_lightning/loops/", line 199, in run
self.advance(*args, **kwargs)
File "/home/xky/.local/lib/python3.9/site-packages/pytorch_lightning/loops/dataloader/", line 152, in advance
dl_outputs =, dl_max_batches, kwargs)
File "/home/xky/.local/lib/python3.9/site-packages/pytorch_lightning/loops/", line 199, in run
self.advance(*args, **kwargs)
File "/home/xky/.local/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/", line 121, in advance
batch = next(data_fetcher)
File "/home/xky/.local/lib/python3.9/site-packages/pytorch_lightning/utilities/", line 184, in next
return self.fetching_function()
File "/home/xky/.local/lib/python3.9/site-packages/pytorch_lightning/utilities/", line 258, in fetching_function
File "/home/xky/.local/lib/python3.9/site-packages/pytorch_lightning/utilities/", line 280, in _fetch_next_batch
batch = next(iterator)
File "/home/xky/anaconda3/envs/rvt/lib/python3.9/site-packages/torch/utils/data/", line 634, in next
data = self._next_data()
File "/home/xky/anaconda3/envs/rvt/lib/python3.9/site-packages/torch/utils/data/", line 678, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/xky/anaconda3/envs/rvt/lib/python3.9/site-packages/torch/utils/data/_utils/", line 41, in fetch
data = next(self.dataset_iter)
File "/home/xky/anaconda3/envs/rvt/lib/python3.9/site-packages/torch/utils/data/datapipes/", line 144, in next
return self._get_next()
File "/home/xky/anaconda3/envs/rvt/lib/python3.9/site-packages/torch/utils/data/datapipes/", line 132, in _get_next
result = next(self.iterator)
File "/home/xky/anaconda3/envs/rvt/lib/python3.9/site-packages/torch/utils/data/datapipes/", line 215, in wrap_next
result = next_func(*args, **kwargs)
File "/home/xky/anaconda3/envs/rvt/lib/python3.9/site-packages/torch/utils/data/datapipes/", line 369, in next
return next(self._datapipe_iter)
File "/home/xky/anaconda3/envs/rvt/lib/python3.9/site-packages/torch/utils/data/datapipes/", line 144, in next
return self._get_next()
File "/home/xky/anaconda3/envs/rvt/lib/python3.9/site-packages/torch/utils/data/datapipes/", line 132, in _get_next
result = next(self.iterator)
File "/home/xky/anaconda3/envs/rvt/lib/python3.9/site-packages/torch/utils/data/datapipes/", line 185, in wrap_generator
response = gen.send(request)
File "/home/xky/anaconda3/envs/rvt/lib/python3.9/site-packages/torch/utils/data/datapipes/iter/", line 589, in iter
yield from zip(*iterators)
File "/home/xky/anaconda3/envs/rvt/lib/python3.9/site-packages/torch/utils/data/datapipes/", line 185, in wrap_generator
response = gen.send(request)
File "/home/xky/anaconda3/envs/rvt/lib/python3.9/site-packages/torchdata/datapipes/iter/util/", line 56, in iter
value = next(iterators[i])
File "/home/xky/anaconda3/envs/rvt/lib/python3.9/site-packages/torch/utils/data/datapipes/", line 185, in wrap_generator
response = gen.send(request)
File "/home/xky/anaconda3/envs/rvt/lib/python3.9/site-packages/torch/utils/data/datapipes/iter/", line 52, in iter
for data in dp:
File "/home/xky/anaconda3/envs/rvt/lib/python3.9/site-packages/torch/utils/data/datapipes/", line 185, in wrap_generator
response = gen.send(request)
File "/home/xky/anaconda3/envs/rvt/lib/python3.9/site-packages/torchdata/datapipes/map/util/", line 47, in iter
yield self.datapipe[idx]
File "/home/xky/RVT-master/data/genx_utils/", line 152, in getitem
ev_repr = self._get_event_repr_torch(start_idx=start_idx, end_idx=end_idx)
File "/home/xky/RVT-master/data/genx_utils/", line 91, in _get_event_repr_torch
ev_repr = h5f['data'][start_idx:end_idx]
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/home/xky/anaconda3/envs/rvt/lib/python3.9/site-packages/h5py/_hl/", line 768, in getitem
File "h5py/_selector.pyx", line 376, in
OSError: Can't read data (filter returned failure during read)
This exception is thrown by iter of MapToIterConverterIterDataPipe(datapipe=SequenceForIter, indices=range(0, 41))

There is a similar issue, #10, but none of the methods mentioned there solved the problem for me.
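One way to narrow this down is to read the affected file in fixed-size slices outside the training loop and record which slices raise the same OSError. The following is only a sketch: the function names and the step size are made up for illustration, and the dataset key 'data' is taken from the traceback above.

```python
def find_unreadable_ranges(read_slice, length, step=1000):
    """Return (start, end, message) for every slice whose read raises OSError.

    read_slice(start, end) is any callable that reads rows [start, end).
    """
    failures = []
    for start in range(0, length, step):
        end = min(start + step, length)
        try:
            read_slice(start, end)
        except OSError as err:
            failures.append((start, end, str(err)))
    return failures


def scan_h5_file(path, key="data", step=1000):
    """Scan one HDF5 dataset; the default key 'data' matches the traceback."""
    import h5py  # imported lazily so the helper above works without h5py

    with h5py.File(path, "r") as h5f:
        dataset = h5f[key]
        return find_unreadable_ranges(
            lambda start, end: dataset[start:end], dataset.shape[0], step
        )
```

Calling scan_h5_file on each event representation file (the path is whatever your dataset uses) gives a list of failing ranges. If the same ranges fail on every run, a damaged file is a plausible cause; if the failures move around or disappear, the environment (HDF5 library, compression plugin) is the more likely suspect.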

Hi @CocoYi-Claire

I have not encountered that issue myself, but it might help if you could post the output of:

  • conda list
  • echo $HDF5_PLUGIN_PATH
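The same information can also be collected from inside Python, which shows what the training process itself sees. This is a rough sketch, not part of the repository: the function name is made up, and the Blosc check assumes the dataset is Blosc-compressed, which the blosc-hdf5-plugin package in the install command suggests but which is not confirmed here.

```python
import os


def hdf5_environment_report():
    """Collect HDF5-related facts that are useful when debugging filter errors."""
    report = {"HDF5_PLUGIN_PATH": os.environ.get("HDF5_PLUGIN_PATH")}
    try:
        import h5py
    except ImportError:
        report["h5py"] = None  # h5py is not importable in this interpreter
        return report
    report["h5py"] = h5py.version.version
    report["hdf5"] = h5py.version.hdf5_version
    # 32001 is the registered HDF5 filter ID for Blosc.
    report["blosc_filter_available"] = bool(h5py.h5z.filter_avail(32001))
    return report


for key, value in hdf5_environment_report().items():
    print(f"{key}: {value}")
```

Note that the HDF5 version printed here is the one h5py was built against, which can differ from the hdf5 package shown by conda list when h5py was installed with pip.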

Hi @magehrig

Thank you for your reply. Here are the outputs.

  • conda list

packages in environment at /home/xky/anaconda3/envs/rvt:

Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
absl-py 1.4.0 pypi_0 pypi
aiohttp 3.8.5 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
ale-py 0.7.4 pypi_0 pypi
antlr-python-runtime 4.9.3 pyhd8ed1ab_1 conda-forge
appdirs 1.4.4 pypi_0 pypi
async-timeout 4.0.3 pypi_0 pypi
attrs 23.1.0 pypi_0 pypi
autorom 0.4.2 pypi_0 pypi
autorom-accept-rom-license 0.6.1 pypi_0 pypi
aws-c-auth 0.7.0 hbbaa140_3 conda-forge
aws-c-cal 0.6.0 h93469e0_0 conda-forge
aws-c-common 0.8.23 hd590300_0 conda-forge
aws-c-compression 0.2.17 h862ab75_1 conda-forge
aws-c-event-stream 0.3.1 h9599702_1 conda-forge
aws-c-http 0.7.11 hbe98c3e_0 conda-forge
aws-c-io 0.13.28 h3870b5a_0 conda-forge
aws-c-mqtt 0.9.0 h2e270ba_0 conda-forge
aws-c-s3 0.3.13 heb0bb06_2 conda-forge
aws-c-sdkutils 0.1.12 h862ab75_0 conda-forge
aws-checksums 0.1.16 h862ab75_1 conda-forge
aws-crt-cpp 0.21.0 h87b6960_2 conda-forge
aws-sdk-cpp 1.10.57 h7062fed_18 conda-forge
awscli 1.29.26 py39hf3d152e_0 conda-forge
bbox-visualizer 0.1.0 pypi_0 pypi
black 23.7.0 pypi_0 pypi
blas 1.0 mkl
blosc 1.21.3 h6a678d5_0
blosc-hdf5-plugin 1.0.0 h91a81c6_5 conda-forge
botocore 1.31.26 pyhd8ed1ab_0 conda-forge
bottleneck 1.3.5 py39h7deecbd_0
brotli 1.0.9 h5eee18b_7
brotli-bin 1.0.9 h5eee18b_7
brotlipy 0.7.0 py39h27cfd23_1003
bzip2 1.0.8 h7b6447c_0
c-ares 1.19.1 h5eee18b_0
ca-certificates 2023.08.22 h06a4308_0
cachetools 5.3.1 pypi_0 pypi
certifi 2023.7.22 py39h06a4308_0
cffi 1.15.1 py39h5eee18b_3
charset-normalizer 2.0.4 pyhd3eb1b0_0
click 8.0.4 pypi_0 pypi
cloudpickle 2.2.1 pypi_0 pypi
colorama 0.4.4 pyhd3eb1b0_0
contourpy 1.1.0 pypi_0 pypi
cryptography 41.0.3 py39hdda0065_0
cuda-cudart 11.8.89 0 nvidia
cuda-cupti 11.8.87 0 nvidia
cuda-libraries 11.8.0 0 nvidia
cuda-nvrtc 11.8.89 0 nvidia
cuda-nvtx 11.8.86 0 nvidia
cuda-runtime 11.8.0 0 nvidia
cycler 0.11.0 pyhd3eb1b0_0
detectron2 0.6 pypi_0 pypi
distlib 0.3.7 pypi_0 pypi
dm-tree 0.1.8 pypi_0 pypi
docker-pycreds 0.4.0 pypi_0 pypi
docutils 0.16 py39h06a4308_2
einops 0.6.0 pyhd8ed1ab_0 conda-forge
ffmpeg 4.3 hf484d3e_0 pytorch
filelock 3.12.4 pypi_0 pypi
fonttools 4.42.0 pypi_0 pypi
freetype 2.12.1 h4a9f257_0
frozenlist 1.4.0 pypi_0 pypi
fsspec 2023.6.0 pypi_0 pypi
future 0.18.3 pypi_0 pypi
fvcore 0.1.5.post20221221 pypi_0 pypi
giflib 5.2.1 h5eee18b_3
gitdb 4.0.10 pypi_0 pypi
gitpython 3.1.32 pypi_0 pypi
gmp 6.2.1 h295c915_3
gmpy2 2.1.2 py39heeb90bb_0
gnutls 3.6.15 he1e5248_0
google-auth 2.22.0 pypi_0 pypi
google-auth-oauthlib 1.0.0 pypi_0 pypi
grpcio 1.57.0 pypi_0 pypi
gym 0.23.1 pypi_0 pypi
gym-notices 0.0.8 pypi_0 pypi
h5py 3.8.0 pypi_0 pypi
hdf5 1.14.0 nompi_hb72d44e_103 conda-forge
hdf5plugin 4.1.3 pypi_0 pypi
hydra-core 1.3.2 pyhd8ed1ab_0 conda-forge
idna 3.4 py39h06a4308_0
imageio 2.31.5 pypi_0 pypi
importlib-metadata 6.8.0 pypi_0 pypi
importlib_resources 5.2.0 pyhd3eb1b0_1
intel-openmp 2023.1.0 hdb19cb5_46305
iopath 0.1.9 pypi_0 pypi
jinja2 3.1.2 py39h06a4308_0
jmespath 0.10.0 pyhd3eb1b0_0
jpeg 9e h5eee18b_1
kiwisolver 1.4.4 py39h6a678d5_0
krb5 1.20.1 h143b758_1
lame 3.100 h7b6447c_0
lazy-loader 0.3 pypi_0 pypi
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.38 h1181459_1
lerc 3.0 h295c915_0
libaec 1.0.6 hcb278e6_1 conda-forge
libbrotlicommon 1.0.9 h5eee18b_7
libbrotlidec 1.0.9 h5eee18b_7
libbrotlienc 1.0.9 h5eee18b_7
libcublas 0 nvidia
libcufft 0 nvidia
libcufile 0 nvidia
libcurand 0 nvidia
libcurl 8.2.1 h251f7ec_0
libcusolver 0 nvidia
libcusparse 0 nvidia
libdeflate 1.17 h5eee18b_1
libedit 3.1.20221030 h5eee18b_0
libev 4.33 h7f8727e_1
libffi 3.4.4 h6a678d5_0
libgcc-ng 13.1.0 he5830b7_0 conda-forge
libgfortran-ng 13.1.0 h69a702a_0 conda-forge
libgfortran5 13.1.0 h15d22d2_0 conda-forge
libgomp 13.1.0 he5830b7_0 conda-forge
libiconv 1.16 h7f8727e_2
libidn2 2.3.4 h5eee18b_0
libllvm11 11.1.0 h9e868ea_6
libnghttp2 1.52.0 h2d74bed_1
libnpp 0 nvidia
libnsl 2.0.0 h5eee18b_0
libnvjpeg 0 nvidia
libpng 1.6.39 h5eee18b_0
libsqlite 3.42.0 h2797004_0 conda-forge
libssh2 1.10.0 hdbd6064_2
libstdcxx-ng 13.1.0 hfd8a6a1_0 conda-forge
libtasn1 4.19.0 h5eee18b_0
libtiff 4.5.1 h6a678d5_0
libunistring 0.9.10 h27cfd23_0
libuuid 2.38.1 h0b41bf4_0 conda-forge
libwebp 1.3.2 h11a3e52_0
libwebp-base 1.3.2 h5eee18b_0
libzlib 1.2.13 hd590300_5 conda-forge
lightning-utilities 0.9.0 pypi_0 pypi
llvmlite 0.38.0 py39h4ff587b_0
lz4 4.3.2 pypi_0 pypi
lz4-c 1.9.4 h6a678d5_0
markdown 3.4.4 pypi_0 pypi
markupsafe 2.1.1 py39h7f8727e_0
matplotlib-base 3.7.2 py39h1128e8f_0
memory-profiler 0.61.0 pypi_0 pypi
mkl 2023.1.0 h213fc3f_46343
mkl-service 2.4.0 py39h5eee18b_1
mkl_fft 1.3.8 py39h5eee18b_0
mkl_random 1.2.4 py39hdb19cb5_0
mpc 1.1.0 h10f8cd9_1
mpfr 4.0.2 hb69a4c5_1
mpmath 1.3.0 py39h06a4308_0
msgpack 1.0.7 pypi_0 pypi
multidict 6.0.4 pypi_0 pypi
munkres 1.1.4 py_0
mypy-extensions 1.0.0 pypi_0 pypi
ncps 0.0.7 pypi_0 pypi
ncurses 6.4 h6a678d5_0
nettle 3.7.3 hbbd107a_1
networkx 3.1 py39h06a4308_0
numba 0.55.1 py39h51133e4_0
numexpr 2.8.7 py39h85018f9_0
numpy 1.26.1 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
omegaconf 2.3.0 pyhd8ed1ab_0 conda-forge
opencv-python pypi_0 pypi
openh264 2.1.1 h4ff587b_0
openjpeg 2.4.0 h3ad879b_0
openssl 3.1.2 hd590300_0 conda-forge
packaging 23.1 py39h06a4308_0
pandas 1.5.3 pypi_0 pypi
pathspec 0.11.2 pypi_0 pypi
pathtools 0.1.2 pypi_0 pypi
pillow 10.0.1 py39ha6cbd5a_0
pip 23.2.1 pyhd8ed1ab_0 conda-forge
platformdirs 3.10.0 pypi_0 pypi
plotly 5.13.1 pypi_0 pypi
portalocker 2.3.0 py39h06a4308_1
protobuf 4.24.0 pypi_0 pypi
psutil 5.9.5 pypi_0 pypi
pyasn1 0.4.8 pyhd3eb1b0_0
pyasn1-modules 0.3.0 pypi_0 pypi
pycocotools 2.0.6 pypi_0 pypi
pycparser 2.21 pyhd3eb1b0_0
pyopenssl 23.2.0 py39h06a4308_0
pyparsing 3.0.9 py39h06a4308_0
pysocks 1.7.1 py39h06a4308_0
python 3.9.17 h0755675_0_cpython conda-forge
python-dateutil 2.8.2 pyhd3eb1b0_0
python-tzdata 2023.3 pyhd3eb1b0_0
python_abi 3.9 3_cp39 conda-forge
pytorch 2.0.0 py3.9_cuda11.8_cudnn8.7.0_0 pytorch
pytorch-cuda 11.8 h7e8668a_5 pytorch
pytorch-lightning 1.8.6 pypi_0 pypi
pytorch-mutex 1.0 cuda pytorch
pytz 2023.3 pypi_0 pypi
pyyaml 6.0 py39h5eee18b_1
ray 2.1.0 pypi_0 pypi
readline 8.2 h5eee18b_0
requests 2.31.0 py39h06a4308_0
requests-oauthlib 1.3.1 pypi_0 pypi
rpds-py 0.10.6 pypi_0 pypi
rsa 4.7.2 pyhd3eb1b0_1
s2n 1.3.46 h06160fa_0 conda-forge
s3transfer 0.6.0 py39h06a4308_0
scikit-image 0.22.0 pypi_0 pypi
scikit-video 1.1.11 pypi_0 pypi
scipy 1.11.2 pypi_0 pypi
seaborn 0.12.2 py39h06a4308_0
sentry-sdk 1.29.2 pypi_0 pypi
setproctitle 1.3.2 pypi_0 pypi
setuptools 68.0.0 py39h06a4308_0
six 1.16.0 pyhd3eb1b0_1
smmap 5.0.0 pypi_0 pypi
strenum 0.4.10 pypi_0 pypi
sympy 1.12 pypyh9d50eac_103 conda-forge
tabulate 0.9.0 pypi_0 pypi
tbb 2021.8.0 hdb19cb5_0
tenacity 8.2.3 pypi_0 pypi
tensorboard 2.14.0 pypi_0 pypi
tensorboard-data-server 0.7.1 pypi_0 pypi
tensorboardx 2.6.2 pypi_0 pypi
termcolor 2.3.0 pypi_0 pypi
tifffile 2023.9.26 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
tomli 2.0.1 pypi_0 pypi
torchdata 0.6.0 py39h6782a12_1 conda-forge
torchmetrics 1.0.3 pypi_0 pypi
torchtriton 2.0.0 py39 pytorch
torchvision 0.15.0 py39_cu118 pytorch
tqdm 4.66.1 pyhd8ed1ab_0 conda-forge
typing_extensions 4.7.1 py39h06a4308_0
tzdata 2023c h04d1e81_0
urllib3 1.26.16 py39h06a4308_0
virtualenv 20.24.5 pypi_0 pypi
wandb 0.14.0 pypi_0 pypi
werkzeug 2.3.7 pypi_0 pypi
wheel 0.41.2 py39h06a4308_0
xz 5.4.2 h5eee18b_0
yacs 0.1.8 pypi_0 pypi
yaml 0.2.5 h7b6447c_0
yarl 1.9.2 pypi_0 pypi
zipp 3.16.2 pypi_0 pypi
zlib 1.2.13 hd590300_5 conda-forge
zstd 1.5.5 hc292b87_0


  • echo $HDF5_PLUGIN_PATH
    (In 'rvt_metavision', the hdf5 version is 1.14.0 and the hdf5plugin version is 4.1.3.)

Your conda list is different from the one I get when I execute the readme install instructions.

  • you have h5py from pypi but I have h5py from conda-forge
  • you have hdf5plugin installed but I have not
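To see at a glance which packages came from pip rather than from a conda channel, the conda list output can be filtered on its last column. A small sketch (the function name is made up, and the sample rows are copied from the listing above):

```python
def pip_installed_packages(conda_list_output):
    """Return the names of packages that `conda list` reports as coming from PyPI."""
    names = []
    for line in conda_list_output.splitlines():
        fields = line.split()
        # Rows look like: name version build channel; pip installs end in 'pypi'.
        if len(fields) >= 3 and not fields[0].startswith("#") and fields[-1] == "pypi":
            names.append(fields[0])
    return names


sample = """\
h5py 3.8.0 pypi_0 pypi
hdf5 1.14.0 nompi_hb72d44e_103 conda-forge
hdf5plugin 4.1.3 pypi_0 pypi
blosc-hdf5-plugin 1.0.0 h91a81c6_5 conda-forge
"""
print(pip_installed_packages(sample))  # ['h5py', 'hdf5plugin']
```

This matters because pip wheels of h5py typically bundle their own HDF5 build, so a pip-installed h5py may not be using the conda-forge hdf5 and blosc-hdf5-plugin packages at all.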

Can you check whether you get the same errors when you set up a fresh conda env using the instructions in the readme without any additional installations?

Initially I set up my conda env exactly as described in the readme, but some packages failed to install. For example, I couldn't install h5py from conda-forge, which is why I used pip install instead.
I have now tried to set up a new env with mamba. Running the command mamba install -y h5py=3.8.0 blosc-hdf5-plugin=1.0.0 hydra-core=1.3.2 einops=0.6.0 torchdata=0.6.0 tqdm numba pytorch=2.0.0 torchvision=0.15.0 pytorch-cuda=$CUDA_VERSION -c pytorch -c nvidia gives:

Looking for: ['h5py=3.8.0', 'blosc-hdf5-plugin=1.0.0', 'hydra-core=1.3.2', 'einops=0.6.0', 'torchdata=0.6.0', 'tqdm', 'numba', 'pytorch=2.0.0', 'torchvision=0.15.0', 'pytorch-cuda=11.8']

pytorch/linux-64 Using cache
pytorch/noarch Using cache
nvidia/linux-64 Using cache
nvidia/noarch Using cache
conda-forge/linux-64 Using cache
conda-forge/noarch Using cache

Pinned packages:

  • python 3.9.*

Could not solve for environment specs
The following packages are incompatible
├─ einops 0.6.0** does not exist (perhaps a typo or a missing channel);
├─ h5py 3.8.0** does not exist (perhaps a typo or a missing channel);
├─ hydra-core 1.3.2** does not exist (perhaps a typo or a missing channel);
└─ torchdata 0.6.0** is uninstallable because it requires
└─ openssl >=1.1.1t,<1.1.2a , which does not exist (perhaps a missing channel).

I'm not sure whether my conda channels are correct, or whether I've set something wrong in the config. Here is my conda config:

add_anaconda_token: True
add_pip_as_python_dependency: True
allow_conda_downgrades: False
allow_cycles: True
allow_non_channel_urls: False
allow_softlinks: False
allowlist_channels: []
always_copy: False
always_softlink: False
always_yes: None
anaconda_upload: None
auto_activate_base: True
auto_stack: 0
auto_update_conda: True
changeps1: True
channel_priority: flexible
channel_settings: []
client_ssl_cert: None
client_ssl_cert_key: None
clobber: False
conda_build: {}
create_default_packages: []
croot: /home/xky/mambaforge/conda-bld
debug: False
default_python: 3.10
default_threads: None
deps_modifier: not_set
dev: False
disallowed_packages: []
download_only: False
dry_run: False
enable_private_envs: False
env_prompt: ({default_env})
execute_threads: 1
experimental: []
extra_safety_checks: False
fetch_threads: 5
force: False
force_32bit: False
force_reinstall: False
force_remove: False
ignore_pinned: False
json: False
local_repodata_ttl: 1
migrated_channel_aliases: []
migrated_custom_channels: {}
non_admin_enabled: True
notify_outdated_conda: True
number_channel_notices: 5
offline: False
override_channels_enabled: True
path_conflict: clobber
pinned_packages: []
pip_interop_enabled: False
proxy_servers: {}
quiet: False
remote_backoff_factor: 1
remote_connect_timeout_secs: 9.15
remote_max_retries: 3
remote_read_timeout_secs: 60.0
repodata_threads: None
report_errors: None
restore_free_channel: False
rollback_enabled: True
root_prefix: /home/xky/mambaforge
safety_checks: warn
sat_solver: pycosat
separate_format_cache: False
shortcuts: True
show_channel_urls: None
signing_metadata_url_base: None
solver: classic
solver_ignore_timestamps: False
ssl_verify: True
subdir: linux-64

  • linux-64

track_features: []
unsatisfiable_hints: True
unsatisfiable_hints_check_depth: 2
update_modifier: update_specs
use_index_cache: False
use_local: False
use_only_tar_bz2: True
verbosity: 0
verify_threads: 1
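One setting in this dump that may be worth a second look is use_only_tar_bz2: True. When it is on, packages published only in the newer .conda format are hidden from the solver, which could plausibly explain "does not exist" messages for versions that are on conda-forge; this is a guess, not a confirmed diagnosis. The sketch below assumes the dump is plain key: value text (for example from conda config --show), and the function names are made up.

```python
def parse_config_dump(text):
    """Parse simple `key: value` lines from a conda config dump into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Skip list items and anything that is not a plain `key: value` line.
        if sep and key and " " not in key:
            settings[key] = value.strip()
    return settings


def restricts_package_formats(settings):
    """True if the dump says only .tar.bz2 packages may be used."""
    return settings.get("use_only_tar_bz2") == "True"


dump = """\
channel_priority: flexible
use_only_tar_bz2: True
verbosity: 0
"""
print(restricts_package_formats(parse_config_dump(dump)))  # True
```

If the flag turns out to be set explicitly in a .condarc file, switching it off and retrying the install in a fresh env would be a cheap experiment.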

Thank you for your patience!

I also ran into problems with h5py. It was a while ago, so I can't remember the details very clearly.
I installed hdf5plugin, which I set up in and to solve this problem.