SchulzLab/TEPIC

Testcases with "-f example_annotation.gtf" fail

JulianRein opened this issue · 6 comments

It seems all test cases with that option fail with the following (first) error:
Error: Type checker found wrong number of fields while tokenizing data line.
Perhaps you have extra TAB at the end of your line? Check with "cat -t"

I could not fix that by editing example_annotation.gtf, but maybe I don't understand the error. I get the same result if I use Toy_Annotation.gtf instead.
Any idea why this happens?
I tried Python 2.7 and 3.7.
testOutput.txt

The error seems to originate in bedtools. According to the output you provided (thanks!), all the test cases with the "reduced peak set" option fail.

Which bedtools version are you using? Could you manually check whether the bedtools intersect operation works on your system? Which Linux version are you using? Or is it a Mac?

I have just tested again on my system as well as on the server; there are no issues on either.
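For the manual check asked about above, a minimal sketch along these lines could be used. It assumes bedtools is on the PATH; the helper names `intersect_command` and `run_intersect` and the file names in the usage part are placeholders for illustration, not part of TEPIC.

```python
import shutil
import subprocess


def intersect_command(a_path, b_path):
    """Build the argument list for a plain `bedtools intersect` call."""
    return ["bedtools", "intersect", "-a", a_path, "-b", b_path]


def run_intersect(a_path, b_path):
    """Run bedtools intersect and return (exit code, stderr text).

    Returns None when no bedtools executable is found on the PATH.
    """
    if shutil.which("bedtools") is None:
        return None
    result = subprocess.run(
        intersect_command(a_path, b_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    # Placeholder file names: substitute the two files you want to compare.
    print(run_intersect("first.bed", "second.bed"))
```

A non-zero exit code together with the "wrong number of fields" message on stderr would point at the input files rather than at TEPIC itself.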

Thanks for the fast reply!
bedtoolsIntersectOutput.txt

I tested bedtools intersect on the test files. Toy-bed against itself works, but Toy-bed against example-bed throws the error (see the attached file).
I am using bedtools v2.27.1 on Debian 10.

A colleague of mine tried with his installation and got basically the same error (without the additional hint, see below).
His versions:

bedtools v2.26.0
Python 3.7.3
Ubuntu 18.04.3 LTS

Message:
TestV15: Windows 3kb - Annotation - Decay - Length Normalised - Peak Features - reduced peak set
Preprocessing region file: Removing chr prefix, sorting regions and removing duplicats
Filter total peak set
Error: Type checker found wrong number of fields while tokenizing data line.
Runnig bedtools
Converting invalid characters
Starting TRAP
Filter regions that could not be annotated
Generating gene scores
No TF affinities provided in Test_V15_TEPIC_09_13_19_11_28_06_909012172_Affinity.txt.
Filter genes that could not be annotated
Traceback (most recent call last):
File "/Downloads/TEPIC-master/Code/filterGeneView.py", line 27, in <module>
main()
File "/Downloads/TEPIC-master/Code/filterGeneView.py", line 16, in main
infile=open(sys.argv[1],"r")
FileNotFoundError: [Errno 2] No such file or directory: 'Test_V15_TEPIC_09_13_19_11_28_06_909012172_Decay_Peak_Features_Affinity_Gene_View.txt'
rm: cannot remove 'Test_V15_TEPIC_09_13_19_11_28_06_909012172_Decay_Peak_Features_Affinity_Gene_View.txt': No such file or directory

Thanks for trying bedtools. I checked example_regions.bed, and the file did indeed have an extra tab character in the first row. I don't know, though, why that is an issue for your bedtools version; here (v2.25.0) it works fine. Please pull and try again!
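For anyone hitting the same bedtools message, a small check like the following might help locate the offending row. It is only a sketch: the function name `find_irregular_lines` is made up for illustration, and it assumes a tab-separated file in which the first data line defines the expected number of fields.

```python
import io


def find_irregular_lines(handle):
    """Return (line number, field count, has trailing tab) for suspicious lines."""
    problems = []
    expected = None
    for number, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        # Skip empty lines and the usual header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        trailing_tab = line.endswith("\t")
        if expected is None:
            # The first data line sets the expected count; a trailing tab
            # produces one empty extra field, which is not counted here.
            expected = len(fields) - (1 if trailing_tab else 0)
        if trailing_tab or len(fields) != expected:
            problems.append((number, len(fields), trailing_tab))
    return problems


# Tiny in-memory example: the first row ends with an extra tab.
sample = io.StringIO("1\t100\t200\t\n1\t300\t400\n2\t500\t600\n")
print(find_irregular_lines(sample))  # → [(1, 4, True)]
```

To check a real file, pass an open file handle instead of the `io.StringIO` sample.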

Works now, thanks.
Just FYI (I don't know whether they are supposed to be): the test runs, like the integrate script of INVOKE, are not compatible with Python 3 (at least 3.7). These tests fail under Python 3:

TestV22: Windows 3kb - Annotation -Decay - Length Normalised - Peak Features - Compute discrete scoring using provided background regions
Preprocessing region file: Removing chr prefix, sorting regions and removing duplicats
Preprocessing background file
Runnig bedtools
Converting invalid characters
Starting TRAP
Discretising TF affinities
Filter regions that could not be annotated
Generating gene scores
Traceback (most recent call last):
File ".../TEPIC-master/Code/annotateTSS.py", line 1051, in <module>
main()
File ".../TEPIC-master/Code/annotateTSS.py", line 984, in main
createSparseFile(affinities,tfNames,args.geneViewAffinity.replace("_Affinity_Gene_View.txt","_Decay_Sparse_Affinity_Gene_View.txt"),tss)
File ".../tepic/TEPIC-master/Code/annotateTSS.py", line 633, in createSparseFile
if (float(temp[i]) > 0):
TypeError: 'map' object is not subscriptable
Filter genes that could not be annotated
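
The TypeError above is the usual Python 3 behaviour: `map` returns a lazy iterator there, not a list, so it cannot be indexed. The sketch below reproduces the situation and shows one possible fix. The input line and the way `temp` is built are assumptions for illustration; only the `float(temp[i]) > 0` check is taken from the traceback.

```python
# Hypothetical affinity line; the real input format of annotateTSS.py may differ.
line = "GENE1\t0.0\t1.5\t0.25"
columns = line.split("\t")[1:]

# Python 2 returned a list here; Python 3 returns a lazy map object.
temp = map(float, columns)
try:
    temp[0]
except TypeError as err:
    print(err)

# Wrapping the call in list() restores indexing on both Python 2 and 3.
temp = list(map(float, columns))
nonzero = [i for i in range(len(temp)) if float(temp[i]) > 0]
print(nonzero)  # → [1, 2]
```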

TestV26: Windows 3kb - Annotation - Decay - Length Normalised - Peak Features only - Chromatin conformation capture data
Preprocessing region file: Removing chr prefix, sorting regions and removing duplicats
Runnig bedtools
Converting invalid characters
Starting TRAP
Filter regions that could not be annotated
Generating gene scores
Traceback (most recent call last):
File ".../TEPIC-master/Code/annotateTSS.py", line 1051, in <module>
main()
File "/.../TEPIC-master/Code/annotateTSS.py", line 960, in main
leftGenes=getGenesInLongRangeWindows(tss,regions_left_collection,float(args.lwindows)/2.0)
File ".../TEPIC-master/Code/annotateTSS.py", line 718, in getGenesInLongRangeWindows
for i in xrange(left_index, right_index + 1):
NameError: name 'xrange' is not defined
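
The NameError above comes from `xrange`, which no longer exists in Python 3, where `range` is already lazy. One common way to keep a script running on both versions is a small alias at the top of the file, sketched below. The function `indices_in_window` is a made-up stand-in for the loop in the traceback, not the actual TEPIC code.

```python
# Compatibility alias: use the built-in xrange on Python 2, range on Python 3.
try:
    xrange
except NameError:
    xrange = range


def indices_in_window(left_index, right_index):
    """Collect all indices from left_index to right_index, both inclusive."""
    collected = []
    for i in xrange(left_index, right_index + 1):
        collected.append(i)
    return collected


print(indices_in_window(2, 4))  # → [2, 3, 4]
```

If Python 2 support is not needed, replacing `xrange` with `range` directly would be the simpler change.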

Great! I am glad it works now.

Yes, you are right. I just haven't found the time to make it Python 3 compatible. It is on my list :-)