problem with STRING file
adelomana opened this issue · 5 comments
I seem to have a problem reading the STRING file. I've checked that the file is not empty...
Thanks!
Gates 2016-08-15 14:37:14 ~/gDrive2/projects/TREES-C/PfuEGRIN/src/cmonkeyRuns : cmonkey2 --organism pfu --verbose /Users/alomana/gDrive2/projects/TREES-C/PfuEGRIN/data/expression/ratios/test.txt
2016-08-15 14:37:21 INFO checking MEME...
/bin/uname: Command not found.
/bin/awk: Command not found.
2016-08-15 14:37:21 INFO Input matrix has # rows: 3, # columns: 3
2016-08-15 14:37:21 INFO # clusters/row: 2
2016-08-15 14:37:21 INFO # clusters/column: 0
2016-08-15 14:37:21 INFO # CLUSTERS: 0
2016-08-15 14:37:21 INFO use operons: 1
2016-08-15 14:37:21 INFO using MEME version 4.3.0
2016-08-15 14:37:21 DEBUG creating aux directories
2016-08-15 14:37:21 DEBUG created output database schema
2016-08-15 14:37:21 DEBUG added row and column names to output database
2016-08-15 14:37:21 DEBUG creating aux directories
2016-08-15 14:37:21 INFO attempting automatic download of operons from Microbes Online
2016-08-15 14:37:21 INFO NCBI CODE IS: 186497
2016-08-15 14:37:21 INFO Automatically using STRING file in 'cache/186497.gz' (URL: http://networks.systemsbiology.net/string9/186497.gz)
2016-08-15 14:37:21 DEBUG adding operon network factory
2016-08-15 14:37:21 DEBUG Creating Microbe object for 'pfu'
2016-08-15 14:37:21 DEBUG RSAT - get_directory()
2016-08-15 14:37:34 WARNING can't find the correct RSAT mapping !
2016-08-15 14:37:34 INFO KEGG = 'Pyrococcus furiosus DSM 3638' -> RSAT = 'Pyrococcus_furiosus'
2016-08-15 14:37:34 INFO Creating networks...
2016-08-15 14:37:34 INFO stringdb.read_edges2()
2016-08-15 14:37:35 INFO Finished loading cache/186497.gz
2016-08-15 14:37:36 WARNING 2057 (out of 375528) nodes not found in synonyms
2016-08-15 14:37:36 INFO stringdb.read_edges2(), 0 edges read, 187764 edges ignored
2016-08-15 14:37:36 DEBUG Network.create() called with 0 edges
2016-08-15 14:37:36 DEBUG # nodes in network 'STRING': 0 (of 0)
Traceback (most recent call last):
File "/Users/alomana/anaconda/bin/cmonkey2", line 36, in <module>
cmonkey_run.run()
File "/Users/alomana/anaconda/lib/python3.5/site-packages/cmonkey/cmonkey_run.py", line 512, in run
self.prepare_run()
File "/Users/alomana/anaconda/lib/python3.5/site-packages/cmonkey/cmonkey_run.py", line 474, in prepare_run
thesaurus = self.organism().thesaurus()
File "/Users/alomana/anaconda/lib/python3.5/site-packages/cmonkey/cmonkey_run.py", line 231, in organism
self.organism = self.make_organism()
File "/Users/alomana/anaconda/lib/python3.5/site-packages/cmonkey/cmonkey_run.py", line 341, in make_organism
self['fasta_file'])
File "/Users/alomana/anaconda/lib/python3.5/site-packages/cmonkey/organism.py", line 244, in __init__
fasta_file)
File "/Users/alomana/anaconda/lib/python3.5/site-packages/cmonkey/organism.py", line 117, in __init__
OrganismBase.__init__(self, code, network_factories, ratios=ratios)
File "/Users/alomana/anaconda/lib/python3.5/site-packages/cmonkey/organism.py", line 72, in __init__
self.__networks.append(make_network(self, ratios))
File "/Users/alomana/anaconda/lib/python3.5/site-packages/cmonkey/stringdb.py", line 135, in make_network
organism, ratios)
File "/Users/alomana/anaconda/lib/python3.5/site-packages/cmonkey/network.py", line 150, in create
raise Exception("Error: only %d edges in network '%s'" % (len(network_edges), name))
Exception: Error: only 0 edges in network 'STRING'
Gates 2016-08-15 14:37:36 ~/gDrive2/projects/TREES-C/PfuEGRIN/src/cmonkeyRuns :
Could you please check whether the names match up? This error occurs when the gene names in your ratios file don't match the STRING names, either directly or indirectly (through the synonyms).
For example, it happens when the case differs (mixed case versus upper or lower case) or anything of that sort.
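A quick way to test the direct-match part of this is sketched below. It is only an illustration, not cmonkey2's own matching logic: it assumes the STRING file has whitespace-separated lines of the form `node1 node2 score`, that node names may carry a numeric prefix such as `186497.`, and it does not consult the synonyms at all. The function names are made up for this example.

```python
import gzip


def string_nodes(lines):
    """Collect node names from lines assumed to look like 'node1 node2 score'."""
    nodes = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            nodes.update(fields[:2])
    return nodes


def strip_taxon_prefix(name):
    """Drop a leading '<digits>.' prefix (e.g. '186497.') if one is present."""
    head, sep, tail = name.partition(".")
    return tail if sep and head.isdigit() else name


def compare_ids(ratio_genes, nodes):
    """Sort ratios gene IDs into exact matches, case-only matches, and misses."""
    plain = {strip_taxon_prefix(n) for n in nodes}
    lowered = {n.lower() for n in plain}
    report = {"exact": [], "case_only": [], "missing": []}
    for gene in ratio_genes:
        if gene in plain:
            report["exact"].append(gene)
        elif gene.lower() in lowered:
            report["case_only"].append(gene)
        else:
            report["missing"].append(gene)
    return report


def load_string_nodes(path):
    """Read node names from a gzipped edge file (format assumed as above)."""
    with gzip.open(path, "rt") as handle:
        return string_nodes(handle)
```

Calling `compare_ids(genes, load_string_nodes("cache/186497.gz"))` with the row names of the ratios file would then show which IDs differ only by case and which are absent altogether.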
My example ratios file has 6 genes. I made sure all of them are in the STRING file, with the same IDs.
The .tab and feature names files are empty; that could be the cause. P. furiosus is not in RSAT...
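For reference, a minimal sketch of how one could check whether a cached file carries any content beyond comments. It assumes that comment lines start with `--`, and it counts a plain header row as content; the helper names are hypothetical and not part of cmonkey2.

```python
def count_data_rows(lines):
    """Count lines that are neither blank nor '--' comment lines."""
    return sum(1 for line in lines if line.strip() and not line.startswith("--"))


def is_effectively_empty(path):
    """Return True when the file holds no lines other than blanks and comments."""
    with open(path) as handle:
        return count_data_rows(handle) == 0
```

Running `is_effectively_empty` on `cache/Pyrococcus_furiosus.tab` and `cache/Pyrococcus_furiosus_feature_names` would flag files that exist but hold nothing usable.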
Nope. I manually created the files, but I still get the same error...
Gates 2016-08-15 18:10:52 ~/gDrive2/projects/TREES-C/PfuEGRIN/src/cmonkeyRuns : head -n 20 cache/Pyrococcus_furiosus*
==> cache/Pyrococcus_furiosus.tab <==
-- dump date 20160815
-- class Genbank::Organism
-- table organism
-- table main
-- field 1 id
-- field 2 taxonomy
-- field 3 source
-- header
-- id taxonomy source
2261 Archaea; Euryarchaeota; Thermococci; Thermococcales; Thermococcaceae; Pyrococcus Genbank
==> cache/Pyrococcus_furiosus_feature_names <==
id type contig start_pos end_pos strand description
PF0001 CDS chr1 94 927 R Function Code: 16.1 Conserved Hypothetical
PF0002 CDS chr1 939 2243 R Function Code: 16.1 Conserved Hypothetical
PF0003 CDS chr1 2240 3340 R hypothetical protein
PF0004 CDS chr1 3353 4126 R Function Code: 16.1 Conserved Hypothetical
PF0005 CDS chr1 4123 4926 R ABC transporter, ATP-binding protein
PF0006 CDS chr1 5308 6300 D ABC transporter
PF0007 CDS chr1 6464 7951 D Function Code: 16.1 Conserved Hypothetical
PF0008 CDS chr1 8054 8548 D FIG049476: HIT family protein
PF0009 CDS chr1 8538 9230 D Molybdopterin-synthase adenylyltransferase (EC 2.7.7.80)
PF0010 CDS chr1 9916 10905 D pyridoxal phosphate-dependent deaminase, putative
PF0011 CDS chr1 10956 11750 R Uncharacterized protein MJ0440
PF0012 CDS chr1 11810 12613 D FIG00996186: hypothetical protein
PF0013 CDS chr1 12573 13322 R Mobile element protein
PF0014 CDS chr1 13350 13736 R Uncharacterized protein MJ0105
PF0015 CDS chr1 13751 14479 R COG2047: Uncharacterized protein (ATP-grasp superfamily)
PF0016 CDS chr1 14581 15354 D 5'-methylthioadenosine phosphorylase (EC 2.4.2.28)
PF0017 CDS chr1 16236 17498 D Origin of replication recognition protein @ Cell division control protein 6
PF0018 CDS chr1 17498 19339 D Archaeal DNA polymerase II small subunit (EC 2.7.7.7)
PF0019 CDS chr1 19339 23130 D Archaeal DNA polymerase II large subunit (EC 2.7.7.7)
Gates 2016-08-15 18:11:13 ~/gDrive2/projects/TREES-C/PfuEGRIN/src/cmonkeyRuns : cmonkey2 --organism pfu --verbose --num_cores 4 --checkratios /Users/alomana/gDrive2/projects/TREES-C/PfuEGRIN/data/expression/ratios/test.txt
2016-08-15 18:11:34 INFO checking MEME...
/bin/uname: Command not found.
/bin/awk: Command not found.
2016-08-15 18:11:34 INFO Input matrix has # rows: 5, # columns: 3
2016-08-15 18:11:34 INFO # clusters/row: 2
2016-08-15 18:11:34 INFO # clusters/column: 0
2016-08-15 18:11:34 INFO # CLUSTERS: 0
2016-08-15 18:11:34 INFO use operons: 1
2016-08-15 18:11:34 INFO using MEME version 4.3.0
2016-08-15 18:11:34 DEBUG creating aux directories
2016-08-15 18:11:34 INFO attempting automatic download of operons from Microbes Online
2016-08-15 18:11:34 INFO NCBI CODE IS: 186497
2016-08-15 18:11:34 INFO Automatically using STRING file in 'cache/186497.gz' (URL: http://networks.systemsbiology.net/string9/186497.gz)
2016-08-15 18:11:34 DEBUG adding operon network factory
2016-08-15 18:11:34 DEBUG Creating Microbe object for 'pfu'
2016-08-15 18:11:34 DEBUG RSAT - get_directory()
2016-08-15 18:11:47 WARNING can't find the correct RSAT mapping !
2016-08-15 18:11:47 INFO KEGG = 'Pyrococcus furiosus DSM 3638' -> RSAT = 'Pyrococcus_furiosus'
2016-08-15 18:11:47 INFO Creating networks...
2016-08-15 18:11:47 INFO stringdb.read_edges2()
2016-08-15 18:11:48 INFO Finished loading cache/186497.gz
2016-08-15 18:11:48 WARNING 2057 (out of 375528) nodes not found in synonyms
2016-08-15 18:11:48 INFO stringdb.read_edges2(), 0 edges read, 187764 edges ignored
2016-08-15 18:11:48 DEBUG Network.create() called with 0 edges
2016-08-15 18:11:48 DEBUG # nodes in network 'STRING': 0 (of 0)
Traceback (most recent call last):
File "/Users/alomana/anaconda/bin/cmonkey2", line 24, in <module>
thesaurus = cmonkey_run.organism().thesaurus()
File "/Users/alomana/anaconda/lib/python3.5/site-packages/cmonkey/cmonkey_run.py", line 231, in organism
self.organism = self.make_organism()
File "/Users/alomana/anaconda/lib/python3.5/site-packages/cmonkey/cmonkey_run.py", line 341, in make_organism
self['fasta_file'])
File "/Users/alomana/anaconda/lib/python3.5/site-packages/cmonkey/organism.py", line 244, in __init__
fasta_file)
File "/Users/alomana/anaconda/lib/python3.5/site-packages/cmonkey/organism.py", line 117, in __init__
OrganismBase.__init__(self, code, network_factories, ratios=ratios)
File "/Users/alomana/anaconda/lib/python3.5/site-packages/cmonkey/organism.py", line 72, in __init__
self.__networks.append(make_network(self, ratios))
File "/Users/alomana/anaconda/lib/python3.5/site-packages/cmonkey/stringdb.py", line 135, in make_network
organism, ratios)
File "/Users/alomana/anaconda/lib/python3.5/site-packages/cmonkey/network.py", line 150, in create
raise Exception("Error: only %d edges in network '%s'" % (len(network_edges), name))
Exception: Error: only 0 edges in network 'STRING'
Gates 2016-08-15 18:11:49 ~/gDrive2/projects/TREES-C/PfuEGRIN/src/cmonkeyRuns : head -n 20 cache/Pyrococcus_furiosus*
==> cache/Pyrococcus_furiosus.tab <==
-- dump date 20160815
-- class Genbank::Organism
-- table organism
-- table main
-- field 1 id
-- field 2 taxonomy
-- field 3 source
-- header
-- id taxonomy source
2261 Archaea; Euryarchaeota; Thermococci; Thermococcales; Thermococcaceae; Pyrococcus Genbank
==> cache/Pyrococcus_furiosus_feature_names <==
id type contig start_pos end_pos strand description
PF0001 CDS chr1 94 927 R Function Code: 16.1 Conserved Hypothetical
PF0002 CDS chr1 939 2243 R Function Code: 16.1 Conserved Hypothetical
PF0003 CDS chr1 2240 3340 R hypothetical protein
PF0004 CDS chr1 3353 4126 R Function Code: 16.1 Conserved Hypothetical
PF0005 CDS chr1 4123 4926 R ABC transporter, ATP-binding protein
PF0006 CDS chr1 5308 6300 D ABC transporter
PF0007 CDS chr1 6464 7951 D Function Code: 16.1 Conserved Hypothetical
PF0008 CDS chr1 8054 8548 D FIG049476: HIT family protein
PF0009 CDS chr1 8538 9230 D Molybdopterin-synthase adenylyltransferase (EC 2.7.7.80)
PF0010 CDS chr1 9916 10905 D pyridoxal phosphate-dependent deaminase, putative
PF0011 CDS chr1 10956 11750 R Uncharacterized protein MJ0440
PF0012 CDS chr1 11810 12613 D FIG00996186: hypothetical protein
PF0013 CDS chr1 12573 13322 R Mobile element protein
PF0014 CDS chr1 13350 13736 R Uncharacterized protein MJ0105
PF0015 CDS chr1 13751 14479 R COG2047: Uncharacterized protein (ATP-grasp superfamily)
PF0016 CDS chr1 14581 15354 D 5'-methylthioadenosine phosphorylase (EC 2.4.2.28)
PF0017 CDS chr1 16236 17498 D Origin of replication recognition protein @ Cell division control protein 6
PF0018 CDS chr1 17498 19339 D Archaeal DNA polymerase II small subunit (EC 2.7.7.7)
PF0019 CDS chr1 19339 23130 D Archaeal DNA polymerase II large subunit (EC 2.7.7.7)
Gates 2016-08-15 18:13:01 ~/gDrive2/projects/TREES-C/PfuEGRIN/src/cmonkeyRuns :
Now the files have the right content. All good, thanks Wei-Ju!