leylabmpi/resmico

Resmico tests fail and more


(1) The tests in ./resmico/tests/ from the GitHub repo fail with "cannot import name 'reader' from 'resmico'":

(resmico-1.2.2) [appbuild@ai-submit2](master|…)» pytest -s --hide-run-results --script-launch-mode=subprocess /nethome/appbuild/eb_files/ResMiCo/resmico/resmico/tests
============================================================== test session starts ==============================================================
platform linux -- Python 3.8.16, pytest-7.3.1, pluggy-1.0.0
rootdir: /nethome/appbuild/eb_files/ResMiCo/resmico/resmico
plugins: console-scripts-1.4.0
collected 10 items / 2 errors

==================================================================== ERRORS =====================================================================
_________________________________________________ ERROR collecting tests/test_contig_reader.py __________________________________________________
ImportError while importing test module '/nethome/appbuild/eb_files/ResMiCo/resmico/resmico/tests/test_contig_reader.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
../resmico/tests/test_contig_reader.py:5: in <module>
    from resmico import contig_reader
/nethome/appbuild/eb_files/ResMiCo/resmico/resmico/contig_reader.py:16: in <module>
    from resmico import reader
E   ImportError: cannot import name 'reader' from 'resmico' (/nethome/appbuild/eb_files/ResMiCo/resmico/resmico/__init__.py)
___________________________________________________ ERROR collecting tests/test_models_fl.py ____________________________________________________
ImportError while importing test module '/nethome/appbuild/eb_files/ResMiCo/resmico/resmico/tests/test_models_fl.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
../resmico/tests/test_models_fl.py:7: in <module>
    from resmico import models_fl
/nethome/appbuild/eb_files/ResMiCo/resmico/resmico/models_fl.py:21: in <module>
    from resmico.contig_reader import ContigReader
/nethome/appbuild/eb_files/ResMiCo/resmico/resmico/contig_reader.py:16: in <module>
    from resmico import reader
E   ImportError: cannot import name 'reader' from 'resmico' (/nethome/appbuild/eb_files/ResMiCo/resmico/resmico/__init__.py)
=============================================================== warnings summary ================================================================
../../../../software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:205
  /sysapps/cluster/software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:205: DeprecationWarning: `np.bool8` is a deprecated alias for `np.bool_`.  (Deprecated NumPy 1.24)
    np.bool8: (False, True),

../../../../software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/site-packages/flatbuffers/compat.py:19
  /sysapps/cluster/software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/site-packages/flatbuffers/compat.py:19: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

../../../../software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:326
  /sysapps/cluster/software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:326: DeprecationWarning: `np.bool8` is a deprecated alias for `np.bool_`.  (Deprecated NumPy 1.24)
    np.bool8: (False, True),

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================ short test summary info ============================================================
ERROR ../../../../../../nethome/appbuild/eb_files/ResMiCo/resmico/resmico/tests/test_contig_reader.py
ERROR ../../../../../../nethome/appbuild/eb_files/ResMiCo/resmico/resmico/tests/test_models_fl.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
========================================================= 3 warnings, 2 errors in 3.40s =========================================================

(2) AttributeError: module 'numpy' has no attribute 'bool'.

File "/sysapps/cluster/software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/site-packages/resmico/models_fl.py", line 896, in __getitem__
    return (x, mask), np.zeros(batch_size, dtype=np.bool)

  File "/sysapps/cluster/software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])

AttributeError: module 'numpy' has no attribute 'bool'.
`np.bool` was a deprecated alias for the builtin `bool`. To avoid this error in existing code, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

Solution: I changed the three lines that use np.bool to the builtin bool in models_fl.py:

vi /sysapps/cluster/software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/site-packages/resmico/models_fl.py

mask = np.ones((self.batch_size, self.convoluted_size(max_len, True)), dtype=bool)
#mask = np.ones((self.batch_size, self.convoluted_size(max_len, True)), dtype=np.bool)

mask = np.zeros((batch_size, self.convoluted_size(max_len, pad=True)), dtype=bool)
#mask = np.zeros((batch_size, self.convoluted_size(max_len, pad=True)), dtype=np.bool)

return (x, mask), np.zeros(batch_size, dtype=bool)
#return (x, mask), np.zeros(batch_size, dtype=np.bool)
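For reference, a minimal sketch of why this replacement is safe: with `dtype=bool`, NumPy creates the same boolean arrays that `dtype=np.bool_` would. The sizes below are made-up placeholders, not the values used in models_fl.py.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
batch_size, padded_len = 4, 6

# The builtin bool is accepted wherever the removed np.bool alias was used.
mask = np.ones((batch_size, padded_len), dtype=bool)
labels = np.zeros(batch_size, dtype=bool)

# Both arrays carry NumPy's boolean dtype.
print(mask.dtype, labels.dtype)
```

Because `np.bool` was only an alias for the builtin `bool`, the arrays produced before and after the change are identical on NumPy versions that still accepted the alias.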

(3) Bad file descriptor error:

#28

(resmico-1.2.2) [appbuild@ai-submit2](master|…)» resmico evaluate \                                         ~/eb_files/ResMiCo/resmico/example1
  --min-avg-coverage 0.01 \
  --save-path predictions \
  --save-name default-model \
  --feature-files-path features

Namespace(batch_size=300, emb_ind=0, emb_num=10000, embeddings=False, feature_file_match='', feature_files_path='features', features=['num_query_A', 'num_query_C', 'num_query_G', 'num_query_T', 'mean_mapq_Match', 'stdev_al_score_Match', 'mean_al_score_Match', 'mean_insert_size_Match', 'coverage', 'min_al_score_Match', 'num_SNPs', 'min_insert_size_Match', 'num_proper_Match', 'num_orphans_Match'], func=<function main at 0x2b87f1b7c700>, gpu_eval_mem_gb=1.0, log_level='INFO', max_len=20000, min_avg_coverage=0.01, min_contig_len=1000, model='/sysapps/cluster/software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/site-packages/resmico/model/resmico.h5', n_procs=1, no_cython=False, save_name='default-model', save_path='predictions', seed=12, stats_file='/sysapps/cluster/software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/site-packages/resmico/model/stats_cov.json', val_ind_f=None, verify_insert_size=False)

2023-06-01 21:12:13.973162: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-01 21:12:13.977225: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
WARNING:tensorflow:There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
2023-06-01 21:12:13,977 - There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
2023-06-01 21:12:13,983 - Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
2023-06-01 21:12:13,983 - Number of devices: 1
2023-06-01 21:12:13,984 - Loading model: /sysapps/cluster/software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/site-packages/resmico/model/resmico.h5
2023-06-01 21:12:15,691 - Model loaded
2023-06-01 21:12:15,691 - Loading contig data...
2023-06-01 21:12:15,691 - Looking for stats/toc files...
2023-06-01 21:12:15,692 - Processing 3 stats/toc files found in features ...
2023-06-01 21:12:15,692 - Loading feature means and standard deviations from /sysapps/cluster/software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/site-packages/resmico/model/stats_cov.json
2023-06-01 21:12:15,694 - Found 350 contigs, 0 misassembled, 89 excluded, 5840494 total length, 12513 median length, memory needed (assuming fraq-neg=1)   0.36GB
2023-06-01 21:12:15,694 - Breakpoint location histogram: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023-06-01 21:12:15,694 - Breakpoint relative position histogram: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2023-06-01 21:12:15,694 - Using all indices for prediction
2023-06-01 21:12:15,695 - Creating evaluation data generator. Window: 20000, Step: 19500, Caching: False
2023-06-01 21:12:15.802013: W tensorflow/core/framework/dataset.cc:768] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
Evaluating: [####################################################################################################] 1/1 Done...
/sysapps/cluster/software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/site-packages/sklearn/metrics/_ranking.py:891: UserWarning: No positive class found in y_true, recall is set to one for all thresholds.
  warnings.warn(
/sysapps/cluster/software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 due to no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
2023-06-01 21:12:31,486 - Prediction scores: aucPR: -0.0 - recall1: 0.0 - recall0: 0.0
2023-06-01 21:12:31,486 - Prediction done in 16s.
2023-06-01 21:12:31,489 - Predictions saved to: predictions/default-model.csv
Exception ignored in: <function Pool.__del__ at 0x2b87d6daeca0>
Traceback (most recent call last):
  File "/sysapps/cluster/software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/multiprocessing/pool.py", line 268, in __del__
    self._change_notifier.put(None)
  File "/sysapps/cluster/software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/sysapps/cluster/software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/sysapps/cluster/software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/sysapps/cluster/software/Anaconda3/2022.05/envs/resmico-1.2.2/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
OSError: [Errno 9] Bad file descriptor

Solution: "this error should not affect the predictions" #28

Please let me know if you have a solution for (1), and how critical (2) and (3) are. Thank you!

Hi @ponomarevsy,

Thank you for your interest in using resmico and reporting issues!

(1) Can you please give details on your machine and on how you created the environment and installed resmico? This error happens because the Cython extension was not compiled. Please have a look at the different installation options: https://github.com/leylabmpi/resmico#installation

(2) This error was fixed in one of the latest commits, exactly in the way you described:
46ca984

(3) This is not critical: it affects neither the model predictions nor the saving of the results.
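One way to check point (1) on a given machine is to ask Python which file each module is loaded from. The sketch below uses only the standard library. The helper name `describe_module` is made up for this example, and `resmico.reader` is taken from the traceback above on the assumption that it is the compiled Cython module. A compiled extension carries a platform suffix such as `.so` or `.pyd`.

```python
import importlib
import importlib.machinery


def describe_module(name):
    """Return (file, is_compiled) for a module, or (None, False) if the import fails."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        # Covers ModuleNotFoundError as well as "cannot import name ..." errors.
        return None, False
    path = getattr(module, "__file__", None)
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    is_compiled = path is not None and path.endswith(suffixes)
    return path, is_compiled


if __name__ == "__main__":
    # Module names come from the traceback in this issue.
    for name in ("resmico", "resmico.reader"):
        print(name, describe_module(name))
```

In the traceback above, `resmico` resolves to the source checkout under /nethome/appbuild/eb_files/ResMiCo rather than to the environment's site-packages, so running this from the directory where pytest is started may show whether the checkout shadows the installed package.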

Hi Olga! Thanks for getting back to me. I used mamba to install resmico, since conda could not resolve all the dependencies (I should perhaps have used pip instead):

I installed ResMiCo 1.2.2 with mamba in an environment named resmico-1.2.2:
 
conda create -n resmico-1.2.2 python=3.8
source activate resmico-1.2.2
conda install mamba
mamba install resmico=1.2.2 (should I use 'pip install resmico==1.2.2' instead?)
mamba install pytest pytest-console-scripts
 
Installed Samtools:
conda install samtools

(resmico-1.2.2) [appbuild@ai-submit2](master|…)» conda list resmico
# packages in environment at /sysapps/cluster/software/Anaconda3/2022.05/envs/resmico-1.2.2:
#
# Name                    Version                   Build  Channel
resmico                   1.2.2            py38h5cf8b27_1    bioconda

What would you suggest I do now: remove the mamba installation and install via pip, or start from scratch? Please let me know.