EI-CoreBioinformatics/portcullis

problem w/ python missing dependency in source installation

rdauria opened this issue · 3 comments

Hello,
I have built portcullis from source using Python 3.7.2 (installed in a non-default location and made available via "module load python/3.7.2"). When rule_filter.py is run, either at the "make check" step or after installation, I get the following error:

Executing python script with this command: portcullis/rule_filter.py portcullis/rule_filter.py --pos_json //home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_pos.layer1.json //home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_pos.layer2.json //home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_pos.layer3.json --neg_json //home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_neg.layer1.json //home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_neg.layer2.json //home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_neg.layer3.json //home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_neg.layer4.json //home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_neg.layer5.json //home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_neg.layer6.json //home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_neg.layer7.json --prefix=LIVER_CONV_Male_ZT02_1/junctions.selftrain.initialset LIVER_CONV_Male_ZT02_1/junctions.junctions.tab
Traceback (most recent call last):
  File "//home/tst/portcullis/1.1.2/share/portcullis/scripts/portcullis/rule_filter.py", line 8, in <module>
    from pandas import DataFrame
  File "/u/local/apps/python/3.7.2/lib/python3.7/site-packages/pandas/__init__.py", line 19, in <module>
    "Missing required dependencies {0}".format(missing_dependencies))
ImportError: Missing required dependencies ['numpy']

However, numpy is installed, and rule_filter.py runs without problems if I call it directly:

/home/tst/portcullis/1.1.2/share/portcullis/scripts/portcullis/rule_filter.py --pos_json //home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_pos.layer1.json //home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_pos.layer2.json //home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_pos.layer3.json --neg_json //home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_neg.layer1.json //home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_neg.layer2.json //home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_neg.layer3.json //home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_neg.layer4.json //home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_neg.layer5.json //home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_neg.layer6.json //home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_neg.layer7.json --prefix=LIVER_CONV_Male_ZT02_1/junctions.selftrain.initialset LIVER_CONV_Male_ZT02_1/junctions.junctions.tab

in which case I get:

Loading input junctions ... /home/tst/portcullis/1.1.2/share/portcullis/scripts/portcullis/rule_filter.py:127: FutureWarning: from_csv is deprecated. Please use read_csv(...) instead. Note that some of the default arguments are different, so please refer to the documentation for from_csv when changing your function calls
  original = DataFrame.from_csv(args.input, sep='\t', header=0)
done. 219330 junctions loaded.

Creating initial positive set for training
------------------------------------------

Applying the following set of rule-based filters to create initial positive set.
1	//home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_pos.layer1.json
2	//home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_pos.layer2.json
3	//home/tst/portcullis/1.1.2/share/portcullis/balanced/selftrain_initial_pos.layer3.json

LAYER	PASS	FAIL
1	151689	67641
2	113720	105610
3	99949	119381
Intron size at L95 = 14921  positive set maximum intron size limit set to L95 x 1.2: 17905
4	95852	123478

Positive set contains: 95852 junctions

Saving positive set to disk ... done. File saved to: LIVER_CONV_Male_ZT02_1/junctions.selftrain.initialset.pos.junctions.tab
[...]

I have tried setting PYTHONPATH, without success.
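Since the same script fails when portcullis launches it but works when called directly, one way to narrow this down is to compare the interpreter settings in both cases. The following is only a minimal sketch: the helper name `describe_interpreter` is made up for illustration, and it assumes (without confirmation from the portcullis code) that the two runs differ in interpreter prefix or module search path.

```python
import os
import sys


def describe_interpreter():
    """Collect the settings that decide where 'import numpy' is looked up.

    Hypothetical helper, not part of portcullis.
    """
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "prefix": sys.prefix,
        # Environment variables that can redirect the standard library
        # and site-packages lookup.
        "PYTHONHOME": os.environ.get("PYTHONHOME"),
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
        "sys.path": list(sys.path),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running this once directly and once with the body pasted temporarily at the top of rule_filter.py (before `from pandas import DataFrame`) should show whether the failing run uses a different prefix or search path.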

Any help would be greatly appreciated,

Thanks,

RD

Dear @rdauria, thank you for reporting the bug.
I have just tried to reproduce it in my build environment (module system: LMOD, which in the case of Python loads a conda environment; Python: 3.7.3), and I was able to launch the rule filtering and run make check correctly.
I presume you had loaded the same Python you used for building when you ran these tests; is that correct?

Hi @rdauria, do you have Docker on your system? If so, can you try building the container locally from the develop branch? (I'll push the pre-built container to Docker Hub once the new version is ready.) Then try running it using Docker. To build, type:

docker build --tag local/portcullis:test .

Then to run type:

docker run -it --rm -v /path/to/data/:/data --name portcullis local/portcullis:test portcullis <your portcullis options and arguments go here>

bw2 commented

I ran into a similar error with rule_filter.py (in my case it was ImportError: Missing required dependencies ['pandas']).
For me, the order of the installation steps turned out to matter.
The fix was to install pandas and all other Python dependencies before installing libboost and portcullis (not just before running portcullis).
This is the Dockerfile in which I install portcullis on a minideb base image, and which I now use to run portcullis:
https://github.com/macarthur-lab/rnaseq-methods/blob/master/pipelines/portcullis/docker/Dockerfile#L68-L69
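If the installation order matters, a quick pre-flight check before building can confirm that the Python packages are visible to the interpreter that will be used. This is a rough sketch, not something taken from portcullis or the Dockerfile above: `missing_modules` is a hypothetical helper, and the module list contains only the two packages named in this thread, so it may be incomplete.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: only the packages mentioned in this issue are listed here.
    required = ["numpy", "pandas"]
    missing = missing_modules(required)
    if missing:
        print("Install these before building portcullis:", ", ".join(missing))
    else:
        print("All listed Python dependencies were found.")
```

Run it with the same Python that the build will pick up (for example, after the relevant "module load"), so the result reflects the environment that portcullis is compiled against.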