dkoslicki/CMash

Python3 hanging


See this issue with Metalign. In short,
So I figured out the probable source of the program hanging: python3 vs python2 for CMash.
After setting up the data, the following setup leads to a hang:

python3 -m venv VE3
source VE3/bin/activate
./setup_libraries.sh

This hangs:

python3 metalign.py test/RL_S001__insert_270_1M_subset.fq data/ --output test/RL_S001__insert_270_1M_subset_results.tsv

This is the command causing the hang:

python3 StreamingQueryDNADatabase.py ../../data/r7aqo9zw/60mers_intersection_dump.fa ../../data/cmash_db_n1000_k60.h5 ../../test/CMash_out.csv 30-60-10 -c 0 -r 10000 -v -f ../../data/cmash_filter_n1000_k60_30-60-10.bf --sensitive
So instead, try python2, and it doesn't hang:

virtualenv VE2
source VE2/bin/activate
cd CMash
pip install -r requirements.txt

this runs just fine and does not hang:

python StreamingQueryDNADatabase.py ../../data/r7aqo9zw/60mers_intersection_dump.fa ../../data/cmash_db_n1000_k60.h5 ../../test/CMash_out.csv 30-60-10 -c 0 -r 10000 -v -f ../../data/cmash_filter_n1000_k60_30-60-10.bf --sensitive

This works too (oddly enough, since it's being called with python3, so it looks like only installing CMash with python2 is required):

python3 metalign.py test/RL_S001__insert_270_1M_subset.fq data/ --output test/RL_S001__insert_270_1M_subset_results.tsv

This also works (as it appears metalign.py is python2/3 compatible):

python metalign.py test/RL_S001__insert_270_1M_subset.fq data/ --output test/RL_S001__insert_270_1M_subset_results.tsv
So possible solutions (with my assessment of ease of implementation):

Make setup_libraries.sh use python2 when installing CMash (probably via a virtualenv) (easy)
See why marisa-trie isn't working with python3 (their repo says it's python3 compatible) (medium)
Refactor CMash so it's python3 compatible (hard)

This appeared to be caused by chunksize not being an integer, and by file names being plain strings in python2 but byte literals (bytes) in python3.
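
For anyone hitting something similar, here is a minimal sketch of the two kinds of python2/python3 differences described above. It is not the actual CMash fix: safe_chunksize and to_text are hypothetical helper names made up for illustration, and the issue does not say where in CMash the chunksize is computed or where the file names come from. The only facts relied on are that `/` on two ints returns a float in python3 (it returned an int in python2), and that bytes and str are distinct types in python3.

```python
def safe_chunksize(num_items, num_workers):
    """Return an integer chunk size of at least 1 (hypothetical helper).

    In python3, num_items / num_workers is a float; floor division (//)
    gives an int in both python2 and python3.
    """
    return max(1, num_items // num_workers)


def to_text(name, encoding="utf-8"):
    """Return name as str, decoding it first if it is bytes (hypothetical helper).

    In python3 a value read back as bytes (e.g. b"genome.fa") is not equal
    to the str "genome.fa", so comparisons and path lookups can silently fail.
    """
    if isinstance(name, bytes):
        return name.decode(encoding)
    return name


if __name__ == "__main__":
    # Chunk size stays an int even when the division is not exact.
    print(safe_chunksize(10, 4))
    # Both spellings of the same file name normalize to the same str.
    print(to_text(b"genome.fa") == to_text("genome.fa"))
```

Code that passes a chunk size to something expecting an int, or compares file names read from disk against str values, would go through helpers like these first.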

Appears to be fixed now