Scratch/tmp pod5 problem
Macdot3 opened this issue · 21 comments
Hi everyone,
I installed pod5 from the conda channel by @JannesSP because I don't have permission to use pip in our corporate environment. After installing and loading all the plugins, when I launch the convert command with --one-to-one I receive this error:
Converting 22 Fast5s: 0%| | 0/88000 [00:00<?, ?Reads/s]
Can't read data (can't open directory: /scratch/tmp/tmpjccgpwm2/lib)
I can't find any folder with this name. Do I need to install something? They are temp files.
I hope you can help me. Thank you
Hi @Macdot3,
Can you share the full command that you run please?
Kind regards,
Rich
`#!/bin/bash
#SBATCH --job-name=fast5_conv
#SBATCH --mem=64GB # amount of RAM in MB required (and max ram available).
##SBATCH --mem-per-cpu=5000 # amount of ram per Core (see ntasks, if you ask for ntasks
#SBATCH --time=INFINITE ## OR #SBATCH --time=10:00 means 10 minutes OR --time=01:00:00 means 1 hour
#SBATCH --cpus-per-task=10 # number of required cores
#SBATCH --nodes=1 # not really useful for not mpi jobs
##SBATCH --partition=work ##work is the default and unique queue, you do not need to specify.
#SBATCH --error="/home/barresi.m/Nanopore/Dorado/Dorado_ERR/fast5_conv.err"
#SBATCH --output="/home/barresi.m/Nanopore/Dorado/Dorado_OUT/fast5_conv.out"
source /opt/common/tools/besta/miniconda3/bin/activate
conda activate /home/barresi.m/Nanopore/Dorado/POD5_ENV
pod5 convert fast5 /home/barresi.m/Nanopore/Dorado/FAST5/barcode5/*.fast5 \
--output /home/barresi.m/Nanopore/Dorado/POD5/POD5_barcode5/ \
--one-to-one /home/barresi.m/Nanopore/Dorado/FAST5/barcode5/ -t 10`
From your command I don't see any issues or any reason why it would need to access /tmp, as any temp files created by pod5 are written locally.
Could you please add debugging to the pod5 command with POD5_DEBUG=1 pod5 convert ... and share the log files that are generated?
I added
export POD5_DEBUG=1
before the previous pod5 convert command and saved the output as logfile.txt. This is the result:
`Converting 22 Fast5s: 0%| | 0/88000 [00:07<?, ?Reads/s]
Can't read data (can't open directory: /scratch/tmp/tmp2w9ypgxb/lib)
Converting 22 Fast5s: 0%| | 0/88000 [00:07<?, ?Reads/s]
Can't read data (can't open directory: /scratch/tmp/tmprh5stj_2/lib)
Converting 22 Fast5s: 0%| | 0/88000 [00:07<?, ?Reads/s]
Can't read data (can't open directory: /scratch/tmp/tmp_picg_t3/lib)
Converting 22 Fast5s: 0%| | 0/88000 [00:07<?, ?Reads/s]
Can't read data (can't open directory: /scratch/tmp/tmp0799lli3/lib)
Converting 22 Fast5s: 0%| | 0/88000 [00:07<?, ?Reads/s]
Can't read data (can't open directory: /scratch/tmp/tmp8t2d7mn7/lib)
Converting 22 Fast5s: 0%| | 0/88000 [00:07<?, ?Reads/s]
Can't read data (can't open directory: /scratch/tmp/tmp1at9xm89/lib)
Converting 22 Fast5s: 0%| | 0/88000 [00:08<?, ?Reads/s]
Can't read data (can't open directory: /scratch/tmp/tmp2w9ypgxb/lib)
Converting 22 Fast5s: 0%| | 0/88000 [00:08<?, ?Reads/s]
Can't read data (can't open directory: /scratch/tmp/tmp_dmao20f/lib)
Converting 22 Fast5s: 0%| | 0/88000 [00:08<?, ?Reads/s]
Can't read data (can't open directory: /scratch/tmp/tmprh5stj_2/lib)
Converting 22 Fast5s: 0%| | 0/88000 [00:08<?, ?Reads/s]
Can't read data (can't open directory: /scratch/tmp/tmprm70eucb/lib)
Converting 22 Fast5s: 0%| | 0/88000 [00:08<?, ?Reads/s]
Can't read data (can't open directory: /scratch/tmp/tmpc9nszjth/lib)
Converting 22 Fast5s: 0%| | 0/88000 [00:08<?, ?Reads/s]
Can't read data (can't open directory: /scratch/tmp/tmpzz07dc_1/lib)
Converting 22 Fast5s: 0%| | 0/88000 [00:09<?, ?Reads/s]
Converting 22 Fast5s: 0%| | 0/88000 [00:09<?, ?Reads/s]
`
POD5_DEBUG will have generated .log files in the working directory. Could you share those please?
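A minimal sketch of how the debug logs could be collected in one step. It assumes only what is stated above: that setting POD5_DEBUG=1 makes pod5 write .log files into the working directory. The helper name run_with_debug is hypothetical and is not part of pod5.

```python
import os
import subprocess
from pathlib import Path


def run_with_debug(cmd, workdir="."):
    """Run cmd with POD5_DEBUG=1 set and return (exit code, new *.log files).

    Only files that were not present before the run are reported, so old
    logs in the same directory are not mistaken for fresh ones.
    """
    workdir = Path(workdir)
    before = set(workdir.glob("*.log"))
    # Copy the current environment and add the debug switch to it.
    env = dict(os.environ, POD5_DEBUG="1")
    result = subprocess.run(cmd, cwd=workdir, env=env, check=False)
    new_logs = sorted(set(workdir.glob("*.log")) - before)
    return result.returncode, new_logs
```

It could be called as run_with_debug(["pod5", "convert", "fast5", ...]) with the same arguments as in the job script, and the returned paths are the files to attach to the issue.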
This error can occur when HDF5 cannot find the plugin needed to open the fast5 files.
Can you please ensure that you have vbz_h5py_plugin installed in the python environment?
Kind regards,
Rich
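One way to check this from inside the environment, as a rough sketch using only the standard library. The function name describe_package is hypothetical; the package name vbz_h5py_plugin is the one mentioned above.

```python
import importlib.metadata
import importlib.util


def describe_package(dist_name, module_name=None):
    """Report whether a distribution is installed and where its module lives.

    find_spec only locates a top-level module, it does not import it, so
    this check does not trigger the plugin's import-time registration.
    """
    module_name = module_name or dist_name
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    spec = importlib.util.find_spec(module_name)
    return {
        "distribution": dist_name,
        "version": version,
        "importable": spec is not None,
        "origin": getattr(spec, "origin", None),
    }
```

Running print(describe_package("vbz_h5py_plugin")) with the environment's own python shows both the version that the package metadata reports and the file the module would be imported from.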
# packages in environment at /home/barresi.m/Nanopore/Dorado/POD5_ENV:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
aws-c-auth 0.7.14 h70caa3e_3 conda-forge
aws-c-cal 0.6.9 h14ec70c_3 conda-forge
aws-c-common 0.9.12 hd590300_0 conda-forge
aws-c-compression 0.2.17 h572eabf_8 conda-forge
aws-c-event-stream 0.4.1 h17cd1f3_5 conda-forge
aws-c-http 0.8.0 hc6da83f_5 conda-forge
aws-c-io 0.14.3 h3c8c088_1 conda-forge
aws-c-mqtt 0.10.1 h0ef3971_3 conda-forge
aws-c-s3 0.5.0 hb337f33_1 conda-forge
aws-c-sdkutils 0.1.14 h572eabf_0 conda-forge
aws-checksums 0.1.17 h572eabf_7 conda-forge
aws-crt-cpp 0.26.1 h0637f07_8 conda-forge
aws-sdk-cpp 1.11.242 h65f022c_0 conda-forge
bzip2 1.0.8 hd590300_5 conda-forge
c-ares 1.26.0 hd590300_0 conda-forge
ca-certificates 2024.2.2 hbcca054_0 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
gflags 2.2.2 he1b5a44_1004 conda-forge
glog 0.6.0 h6f12383_0 conda-forge
h5py 3.9.0 py312h34c39bb_0
hdf5 1.12.1 nompi_h4df4325_104 conda-forge
icu 73.2 h59595ed_0 conda-forge
iso8601 2.1.0 pyhd8ed1ab_0 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
krb5 1.21.2 h659d440_0 conda-forge
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
lib-pod5 0.3.6 py312_0 jannessp
libabseil 20230802.1 cxx17_h59595ed_0 conda-forge
libarrow 15.0.0 he2c5238_2_cpu conda-forge
libarrow-acero 15.0.0 h59595ed_2_cpu conda-forge
libarrow-dataset 15.0.0 h59595ed_2_cpu conda-forge
libarrow-flight 15.0.0 hdc44a87_2_cpu conda-forge
libarrow-flight-sql 15.0.0 hfbc7f12_2_cpu conda-forge
libarrow-gandiva 15.0.0 hacb8726_2_cpu conda-forge
libarrow-substrait 15.0.0 hfbc7f12_2_cpu conda-forge
libblas 3.9.0 21_linux64_openblas conda-forge
libbrotlicommon 1.1.0 hd590300_1 conda-forge
libbrotlidec 1.1.0 hd590300_1 conda-forge
libbrotlienc 1.1.0 hd590300_1 conda-forge
libcblas 3.9.0 21_linux64_openblas conda-forge
libcrc32c 1.1.2 h9c3ff4c_0 conda-forge
libcurl 8.5.0 hca28451_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libevent 2.1.12 hf998b51_1 conda-forge
libexpat 2.5.0 hcb278e6_1 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 13.2.0 h807b86a_5 conda-forge
libgfortran-ng 13.2.0 h69a702a_5 conda-forge
libgfortran5 13.2.0 ha4646dd_5 conda-forge
libgomp 13.2.0 h807b86a_5 conda-forge
libgoogle-cloud 2.12.0 hef10d8f_5 conda-forge
libgrpc 1.60.0 h74775cd_1 conda-forge
libiconv 1.17 hd590300_2 conda-forge
liblapack 3.9.0 21_linux64_openblas conda-forge
libllvm15 15.0.7 hb3ce162_4 conda-forge
libnghttp2 1.58.0 h47da74e_1 conda-forge
libnl 3.9.0 hd590300_0 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libnuma 2.0.16 h0b41bf4_1 conda-forge
libopenblas 0.3.26 pthreads_h413a1c8_0 conda-forge
libparquet 15.0.0 h352af49_2_cpu conda-forge
libprotobuf 4.25.1 hf27288f_1 conda-forge
libre2-11 2023.06.02 h7a70373_0 conda-forge
libsqlite 3.44.2 h2797004_0 conda-forge
libssh2 1.11.0 h0841786_0 conda-forge
libstdcxx-ng 13.2.0 h7e041cc_5 conda-forge
libthrift 0.19.0 hb90f79a_1 conda-forge
libutf8proc 2.8.0 h166bdaf_0 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libxml2 2.12.5 h232c23b_0 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
lz4-c 1.9.4 hcb278e6_0 conda-forge
more-itertools 10.2.0 pyhd8ed1ab_0 conda-forge
ncurses 6.4 h59595ed_2 conda-forge
numpy 1.26.3 py312heda63a1_0 conda-forge
ont_vbz_hdf_plugin 1.0.1 hb6da537_4 bioconda
openssl 3.2.1 hd590300_0 conda-forge
orc 1.9.2 h7829240_1 conda-forge
packaging 23.2 pyhd8ed1ab_0 conda-forge
pip 24.0 pyhd8ed1ab_0 conda-forge
pod5 0.3.6 py312_0 jannessp
polars 0.20.7 py312hfa2e56e_0 conda-forge
pyarrow 15.0.0 py312h176e3d2_2_cpu conda-forge
python 3.12.1 hab00c5b_1_cpython conda-forge
python_abi 3.12 4_cp312 conda-forge
pytz 2024.1 pyhd8ed1ab_0 conda-forge
rdma-core 50.0 hd3aeb46_0 conda-forge
re2 2023.06.02 h2873b5e_0 conda-forge
readline 8.2 h8228510_1 conda-forge
s2n 1.4.3 h06160fa_0 conda-forge
setuptools 69.0.3 pyhd8ed1ab_0 conda-forge
snappy 1.1.10 h9fff704_0 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
tqdm 4.66.1 pyhd8ed1ab_0 conda-forge
tzdata 2024a h0c530f3_0 conda-forge
ucx 1.15.0 h75e419f_3 conda-forge
wheel 0.42.0 pyhd8ed1ab_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
zlib 1.2.13 hd590300_5 conda-forge
zstd 1.5.5 hfc55251_0 conda-forge
The log files show that this could be a problem with HDF5
File "/home/barresi.m/Nanopore/Dorado/POD5_ENV/lib/python3.12/site-packages/pod5/tools/pod5_convert_from_fast5.py", line 540, in convert_fast5_read
signal = raw["Signal"][()]
~~~~~~~~~~~~~^^^^
As for the plugin, the conda package
ont_vbz_hdf_plugin 1.0.1 hb6da537_4 bioconda
is not the same as the vbz_h5py_plugin in the pod5 dependencies.
Can you run pip list please?
(/home/barresi.m/Nanopore/Dorado/POD5_ENV) barresi.m@login01:~$ pip list
DEPRECATION: Loading egg at /home/barresi.m/Nanopore/Dorado/POD5_ENV/lib/python3.12/site-packages/vbz_h5py_plugin-1.0.1-py3.12.egg is deprecated. pip 24.3 will enforce this behaviour change. A possible replacement is to use pip for package installation.. Discussion can be found at https://github.com/pypa/pip/issues/12330
Package Version
--------------- -------
colorama 0.4.6
h5py 3.9.0
iso8601 2.1.0
lib_pod5 0.3.6
more-itertools 10.2.0
numpy 1.26.3
packaging 23.2
pip 24.0
pod5 0.3.6
polars 0.20.7
pyarrow 15.0.0
pytz 2024.1
setuptools 69.0.3
tqdm 4.66.1
vbz_h5py_plugin 1.0.1
vbz_h5py_plugin 1.0.1
vbz_h5py_plugin 1.0.1
wheel 0.42.0
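The listing above shows vbz_h5py_plugin three times. A small sketch for finding repeated entries and where each copy lives, using importlib.metadata from the standard library; the function names are hypothetical.

```python
import importlib.metadata
from collections import Counter


def _normalise(name):
    """Compare names the way pip does for display: case and -/_ insensitive."""
    return name.lower().replace("_", "-")


def find_duplicates(names):
    """Return {name: count} for every name that occurs more than once."""
    counts = Counter(_normalise(n) for n in names)
    return {name: n for name, n in counts.items() if n > 1}


def installed_locations(dist_name):
    """List the directory that holds each installed copy of dist_name."""
    wanted = _normalise(dist_name)
    locations = []
    for dist in importlib.metadata.distributions():
        name = dist.metadata.get("Name", "")
        if name and _normalise(name) == wanted:
            locations.append(str(dist.locate_file("")))
    return locations
```

Calling installed_locations("vbz_h5py_plugin") in the affected environment would show whether the copies come from the .egg named in the deprecation warning, from a build directory, or from site-packages.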
Hi @Macdot3,
This could be an issue with the vbz_h5py_plugin. We're seeing the deprecation warning about .egg files at the top, and the 3 duplicate lines are also an issue, but it does seem to be installed.
Could you try this script in your environment? It's the __init__.py from the vbz_h5py_plugin module with some print statements added.
I suspect that print(f"{lib_path=}") will show lib_path=/scratch/tmp/<tmp>/lib
import sys


def get_vbz_resource_path() -> str:
    """Get the path to the vbz plugin (lib) resource"""
    vbz_package = "vbz_h5py_plugin"
    vbz_target = "lib"
    # importlib.resources superseded pkg_resources from python3.9+
    if sys.version_info.major == 3 and sys.version_info.minor > 8:
        import importlib.resources

        vbz_lib = importlib.resources.files(vbz_package) / vbz_target
        with importlib.resources.as_file(vbz_lib) as path:
            return str(path.absolute())
    else:
        import pkg_resources

        return pkg_resources.resource_filename(vbz_package, vbz_target)


def register_plugin() -> str:
    """Register the vbz hdf plugins with h5py"""
    lib_path = get_vbz_resource_path()
    try:
        # Add the vbz library path to the h5 plugin search paths
        from h5py import h5pl

        h5pl.prepend(bytes(lib_path, "UTF-8"))
        print(f"{lib_path=}")
    except (ImportError, AttributeError):
        # We don't have the plugin library in h5py<2.10 so we fall
        # back on an environment variable
        import os

        os.environ["HDF5_PLUGIN_PATH"] = lib_path
        # Single quotes inside the f-string keep this valid before Python 3.12
        print(f"{os.environ['HDF5_PLUGIN_PATH']=}")
    return lib_path


register_plugin()
I opened the /POD5_ENV/vbz_h5py_plugin-1.0.1/build/lib/vbz_h5py_plugin/__init__.py script, and the line near the bottom,
print(f"{os.environ["HDF5_PLUGIN_PATH"]=}")
is missing.
What should I replace or add?
Here is the script:
`""" vbz_hdf_plugin imported at module import-time"""
# pylint: disable=E1101,C0415
import sys


def get_vbz_resource_path() -> str:
    """Get the path to the vbz plugin (lib) resource"""
    vbz_package = "vbz_h5py_plugin"
    vbz_target = "lib"
    # importlib.resources superseded pkg_resources from python3.9+
    if sys.version_info.major == 3 and sys.version_info.minor > 8:
        import importlib.resources

        vbz_lib = importlib.resources.files(vbz_package) / vbz_target
        with importlib.resources.as_file(vbz_lib) as path:
            return str(path.absolute())
    else:
        import pkg_resources

        return pkg_resources.resource_filename(vbz_package, vbz_target)


def register_plugin() -> str:
    """Register the vbz hdf plugins with h5py"""
    lib_path = get_vbz_resource_path()
    try:
        # Add the vbz library path to the h5 plugin search paths
        from h5py import h5pl

        h5pl.prepend(bytes(lib_path, "UTF-8"))
    except (ImportError, AttributeError):
        # We don't have the plugin library in h5py<2.10 so we fall
        # back on an environment variable
        import os

        os.environ["HDF5_PLUGIN_PATH"] = lib_path
    return lib_path


register_plugin()
`
I wrote a copy of it above to run as a script. That copy should be good to go if you write it to a file, e.g. test_h5py.py, and run it.
You're right, I ran your script and this is the result:
(/home/barresi.m/Nanopore/Dorado/POD5_ENV) barresi.m@login01:~$ python /home/barresi.m/Nanopore/Dorado/POD5_ENV/vbz_h5py_plugin-1.0.1/test_h5py.py install
lib_path='/scratch/tmp/tmpg90ofdww/lib'
Now, should I replace the last path with the one the script gives me?
This is telling us that the HDF5 library can't load the vbz decode library in your Slurm cluster environment.
I'm intrigued by this. Maybe it's my lack of experience with conda but why is the lib being loaded from a temporary directory instead of where the site-packages are installed in the conda environment?
For example, when I run the script from my python venv I get:
/.../venv/lib/python3.10/site-packages/vbz_h5py_plugin/lib
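One possible explanation, which this thread does not confirm: the pip list warning earlier mentions the plugin being loaded from a .egg. If that egg is a zip archive rather than a directory, importlib.resources.as_file has to extract the resource to a temporary location, and it removes that location again when the with block exits. The sketch below demonstrates this behaviour on a throwaway zipped package. The package name demo_zipped_pkg and both helper names are hypothetical, and a single file is used instead of a directory because extracting whole directories through as_file needs Python 3.12 or newer.

```python
import importlib
import importlib.resources
import os
import sys
import zipfile


def build_zipped_demo_package(directory):
    """Create demo_zipped_pkg inside a zip archive and put it on sys.path."""
    archive = os.path.join(directory, "demo.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demo_zipped_pkg/__init__.py", "")
        zf.writestr("demo_zipped_pkg/lib/plugin.bin", b"placeholder")
    sys.path.insert(0, archive)
    importlib.invalidate_caches()
    return archive


def extracted_paths(package, *parts, calls=2):
    """Resolve a packaged resource through as_file several times.

    Returns one (path, existed_inside_with, exists_after_with) tuple per call.
    """
    results = []
    for _ in range(calls):
        ref = importlib.resources.files(package)
        for part in parts:
            ref = ref / part
        with importlib.resources.as_file(ref) as path:
            inside = path.exists()
        # The with block has ended, so any temporary copy is gone by now.
        results.append((str(path), inside, path.exists()))
    return results
```

For the zipped demo package, each call yields a different path under the system temp directory that exists only inside the with block. A package installed as ordinary files in site-packages would instead return its real, stable location.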
Does your conda environment work locally, i.e. not in the Slurm cluster?
You might be able to run the script locally to get the real path and set it with:
export HDF5_PLUGIN_PATH=</path/site-packages/vbz_h5py_plugin/lib>
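A rough sketch for finding a stable on-disk copy of the plugin's lib directory to use in that export line. It assumes the plugin is unpacked somewhere under the environment's site-packages, either directly or inside an unpacked .egg directory. The function names are hypothetical; the package and directory names are the ones used in the script above.

```python
import sysconfig
from pathlib import Path


def find_plugin_lib_dirs(roots, package="vbz_h5py_plugin", target="lib"):
    """Return existing <package>/<target> directories found under roots.

    Looks directly inside each root and one level down, which covers
    unpacked *.egg directories as well as a plain site-packages install.
    """
    found = set()
    for root in map(Path, roots):
        if not root.is_dir():
            continue
        candidates = [root / package / target]
        candidates.extend(root.glob(f"*/{package}/{target}"))
        found.update(str(c.resolve()) for c in candidates if c.is_dir())
    return sorted(found)


def default_roots():
    """site-packages directories of the interpreter running this script."""
    paths = sysconfig.get_paths()
    return sorted({paths["purelib"], paths["platlib"]})


def export_line(lib_dir):
    """Shell line to place in the job script before the pod5 command."""
    return f"export HDF5_PLUGIN_PATH={lib_dir}"
```

Printing export_line(d) for each d in find_plugin_lib_dirs(default_roots()) gives candidate lines for the job script. An empty result would suggest that no unpacked copy exists in the environment.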
The cluster network administrators have created a python environment in which I cannot install any packages for business reasons. When I tried they denied me access. Therefore I chose to create a conda environment from my cluster folder to install pod5. However, once I did this, it initially didn't let me run the package commands because the vbz_h5py_plugin-1.0.1 plugin was missing. For the same reason as above, I downloaded the .tar.gz file locally and installed it in my environment and ran the script.
Now my script has this path: /POD5_ENV/vbz_h5py_plugin-1.0.1/build/lib/vbz_h5py_plugin/__init__.py.
To this I added print(f"{os.environ["HDF5_PLUGIN_PATH"]=}") but I get no output. Maybe I need to run test_h5py.py instead, and then export the path?
Hi @Macdot3, you don't need to edit vbz_h5py_plugin/__init__.py. Just run the test_h5py.py script that I sent, which should print the lib location. This should tell you where to set the path.
Alternatively, you could ask the administrators to download and install the vbz plugin manually, which should add the plugin to the correct location: https://github.com/nanoporetech/vbz_compression?tab=readme-ov-file
Thanks @HalfPhoton, I followed what you wrote to me. The only problem is that I noticed that it gives me a different path every time. I think the only way is to ask the administrators, if they will reply to me.
> The only problem is that I noticed that it gives me a different path every time.
Are you running the script locally, i.e. not on the Slurm cluster?
> Does your conda environment work locally i.e. not in the slurm cluster?
> You might be able to run the script locally get the real path and set with: ...
Of course
Ok, I suggest asking an administrator to install the vbz plugin. Let us know if that helps.