brentp/combined-pvalues

--anno option not usable (Module compatibility issue)

Lei-Guo opened this issue · 5 comments

The --anno option is unusable because the cruzdb module is not compatible with comb-p.

cruzdb was designed for Python 2.7, while comb-p requires Python 3.5 or greater. (I tried to re-install comb-p under Python 2.7 but was warned that Python 3.5 or greater is needed.)

Below are my code and the error message:

comb-p pipeline -c 4 --seed 1e-3 --dist 200 -p fake --region-filter-p 0.1 --anno mm9 file.bed

from cruzdb import Genome
ModuleNotFoundError: No module named 'cruzdb'
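One way to tell in advance whether --anno can work in a given environment is to check whether cruzdb is importable at all. The sketch below uses only the standard library; the helper name module_available is made up for illustration and is not part of comb-p or cruzdb. Note that it only checks that the module can be found, not that it runs correctly under Python 3.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be found."""
    # find_spec returns None when no such top-level module is installed.
    return importlib.util.find_spec(name) is not None


# Hypothetical usage: decide whether passing --anno is worth trying.
if module_available("cruzdb"):
    print("cruzdb found; --anno may be usable")
else:
    print("cruzdb not found; run comb-p without --anno")
```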

I was able to force cruzdb to install by downloading it and manually running its "setup.py" file under Python 3.8, but it doesn't actually work, and instead gives the error:

  File "~/.conda/envs/comb-p/lib/python3.8/site-packages/cruzdb/sqlsoup.py", line 458
    except KeyError, ke:

(error message edited to replace homedir with ~)

The "except KeyError, ke:" syntax is one of the many things that changed in Python 3, which only accepts "except KeyError as ke:".
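For reference, a minimal sketch of the handler form that Python 3 accepts for the kind of line shown in the traceback above. The function and the dictionary are invented for illustration and are not taken from cruzdb.

```python
def lookup(table, key):
    """Return table[key], or a short message if the key is missing."""
    try:
        return table[key]
    # Python 2 also allowed `except KeyError, ke:`; Python 3 requires `as`.
    except KeyError as ke:
        return "missing key: {}".format(ke)


print(lookup({"chrom": "chr1"}, "chrom"))
print(lookup({"chrom": "chr1"}, "start"))
```

The "as" form also works on Python 2.6 and 2.7, so a change like this does not break older interpreters.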

cruzdb is only required if you use --db, so just omit that and combined-pvalues should work for you.

Thanks @brentp !

When you say --db do you mean --anno ? I don't see a separate --db flag.

I also discovered you had a repo at https://github.com/brentp/cruzdb/ where you'd been working on making it work with Python 3.8 (the version I had forced to install was older).

I was able to install that version of cruzdb (after a minor edit to annotate.py to change "print args" to "print(args)"), after which the error when running comb-p changed to:

  File "~/.conda/envs/comb-p/lib/python3.8/site-packages/cpv-0.50.3-py3.8.egg/cpv/_common.py", line 56, in bediter
    for i, l in enumerate(ts.reader(fname, header=False)):
  File "~/.conda/envs/comb-p/lib/python3.8/site-packages/toolshed-0.4.6-py3.8.egg/toolshed/files.py", line 281, in reader
    for toks in line_gen:
  File "~/.conda/envs/comb-p/lib/python3.8/gzip.py", line 305, in read1
    return self._buffer.read1(size)
  File "~/.conda/envs/comb-p/lib/python3.8/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "~/.conda/envs/comb-p/lib/python3.8/gzip.py", line 479, in read
    if not self._read_gzip_header():
  File "~/.conda/envs/comb-p/lib/python3.8/gzip.py", line 427, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'\xe8:')

Thank you very much for continuing to work on these tools so people can continue to use comb-p! Wish I knew enough Python to help/contribute, but I'm not at that level yet!

hi, yes, I mean to run without --anno. It looks like you might have it working, but you have a bad gzipped file. Perhaps you can run on uncompressed input to verify.
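A quick way to check a suspect input: the traceback reports the first two bytes of the file as b'\xe8:', whereas every gzip stream starts with b'\x1f\x8b'. The sketch below tests a file for that magic number using only the standard library. The helper name and the demo file names are hypothetical, and which file comb-p was actually reading when it failed is not known from the thread.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


# Self-contained demo on two throwaway files (hypothetical names).
tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, "plain.bed")
gz_path = os.path.join(tmpdir, "real.bed.gz")

with open(plain_path, "w") as fh:
    fh.write("chr1\t100\t200\t0.01\n")
with gzip.open(gz_path, "wt") as fh:
    fh.write("chr1\t100\t200\t0.01\n")

print(looks_gzipped(plain_path))
print(looks_gzipped(gz_path))
```

A file that carries a .gz extension but fails this check is not gzip-compressed, whatever its name says.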

Bit of an older thread, but commenting in case anyone else has been trying to run pipeline --anno on Python 3 as well:

Following up on @epikris's thoughts: building cruzdb from the current Git repo as of today, with the print args -> print(args) edit, still seems to result in the Python 2 -> 3 compatibility issues that Brent and the cruzdb contributors have been trying to work through. So the bad gzip file may not be the only issue from above. Namely, I think I'm hitting the cruzdb basestring issue mentioned in #27, on Python 3.8.3.

wrote: testcombp.bed.regions-t.bed, (regions with region-p < 1.000 and n-probes >= 0: 52)
Traceback (most recent call last):
  File "/home/liucu/miniconda3/envs/py38/bin/comb-p", line 4, in <module>
    __import__('pkg_resources').run_script('cpv==0.50.4', 'comb-p')
  File "/home/liucu/miniconda3/envs/py38/lib/python3.8/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/liucu/miniconda3/envs/py38/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1463, in run_script
    exec(code, namespace, namespace)
  File "/home/liucu/miniconda3/envs/py38/lib/python3.8/site-packages/cpv-0.50.4-py3.8.egg/EGG-INFO/scripts/comb-p", line 39, in <module>
    main()
  File "/home/liucu/miniconda3/envs/py38/lib/python3.8/site-packages/cpv-0.50.4-py3.8.egg/EGG-INFO/scripts/comb-p", line 36, in main
    module.main()
  File "/home/liucu/miniconda3/envs/py38/lib/python3.8/site-packages/cpv-0.50.4-py3.8.egg/cpv/pipeline.py", line 69, in main
    return pipeline(col_num, args.step,
  File "/home/liucu/miniconda3/envs/py38/lib/python3.8/site-packages/cpv-0.50.4-py3.8.egg/cpv/pipeline.py", line 213, in pipeline
    g = Genome(db)
  File "/home/liucu/miniconda3/envs/py38/lib/python3.8/site-packages/cruzdb/__init__.py", line 66, in __init__
    soup.Genome.__init__(self, self.dburl)
  File "/home/liucu/miniconda3/envs/py38/lib/python3.8/site-packages/cruzdb/sqlsoup.py", line 208, in __init__
    elif isinstance(engine_or_metadata, (basestring, Engine)):
NameError: name 'basestring' is not defined

But besides --anno, everything else with comb-p pipeline seems to be working well on my setup!
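For anyone hitting the same NameError: basestring exists only in Python 2, where it is the common base class of str and unicode. A widely used compatibility shim is sketched below. This is a general pattern, not the fix adopted in cruzdb, and it is an assumption that adding it near the top of sqlsoup.py would be enough to make --anno work; other Python 2 constructs may remain. The is_text helper is made up for illustration.

```python
try:
    basestring  # defined on Python 2 only
except NameError:
    basestring = str  # Python 3: str is the only text type


def is_text(value):
    """Return True if `value` is a text string on either Python version."""
    return isinstance(value, basestring)


print(is_text("mm9"))
print(is_text(42))
```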