ExtractFeature and Output outputting very few features
ekinda opened this issue · 2 comments
Hello,
I want to extract multiple alignments of features from a BedGraph file. I am using a MAF file that only has blocks from human chr1, and the BedGraph file has 290 features on chr1 (most of them fall within alignment blocks, as confirmed by another method).
But MafFilter only returns and outputs the first 45 features, then abruptly stops. Here is the options file:
input.file=chr1.maf
input.file.compression=none
input.format=Maf
input.dots=as_unresolved
input.check_sequence_size=0
output.log=maffilter.log
maf.filter= \
ExtractFeature( \
ref_species=homo_sapiens, \
feature.file=test.BedGraph, \
feature.file.compression=none, \
feature.format=BedGraph, \
feature.type=all, \
compression=none, \
complete=no, \
ignore_strand=no), \
Output( \
file=maffilter_test.maf, \
compression=none, \
mask=no), \
I'm using the debian package maffilter.
maffilter/unstable,now 1.3.1+dfsg-4 amd64 [installed]
After 44 similar "Extracting annotations" messages, the last console messages are:
Extracting annotations.................:
Done.
45 blocks kept, totalizing 1176bp.
MafFilter's done. Bye.
Is there a hidden memory or block-number limit that is causing this? The chr1.maf file is 8 GB. Also, when testing with only Output as a filter, maffilter stops after 200 blocks, so that seems to be a limit?
Is it possible to extract all features somehow?
Thanks for the work, it seems to run pretty fast and I will use this if I can solve this issue.
Best
Ekin
Dear Ekin,
There is no block limit in maffilter and no memory issue since the program completes without error.
The run with only the Output filter tells you that there are 200 blocks in your original MAF file. Let's see why it outputs only 45 when the feature filter is on...
First, are you sure these are really the first 45? Or are they spread among the 290 original features?
One important thing is that your MAF file should be projected on the reference genome (the one for which the BedGraph file was made). Is that the case? If not, can you try running maf_project, from the TBA package, on homo_sapiens before using maffilter?
All the best,
Julien.
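
For cross-checking the block count independently of MafFilter, a small Python sketch along the following lines may help. It assumes the usual MAF layout, in which the file starts with a "##maf" header line and every alignment block starts with a line whose first field is "a". The function name summarize_maf and the sample data are made up for illustration; they are not part of MafFilter.

```python
import io


def summarize_maf(lines):
    """Count alignment blocks and record the line numbers of '##maf' headers.

    `lines` is any iterable of text lines, e.g. an open file handle,
    so a large file is read one line at a time.
    """
    n_blocks = 0
    header_lines = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("##maf"):
            header_lines.append(lineno)
            continue
        fields = line.split()
        # Assumption: a block starts with a line whose first field is "a".
        if fields and fields[0] == "a":
            n_blocks += 1
    return n_blocks, header_lines


# Tiny illustrative input (hypothetical data, not taken from chr1.maf).
sample = io.StringIO(
    "##maf version=1\n"
    "a score=10\n"
    "s hg.chr1 0 4 + 100 ACGT\n"
    "\n"
    "##maf version=1\n"
    "a score=20\n"
    "s hg.chr1 10 4 + 100 ACGT\n"
)
print(summarize_maf(sample))  # (2, [1, 5])
```

For a real file, the same function can be called on an open handle, for example `with open("chr1.maf") as fh: print(summarize_maf(fh))`. More than one entry in the header list would indicate a header somewhere other than the top of the file.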
Dear Julien,
Thanks for your quick response. I figured it out: the MAF file had another header line in the middle of the file, and the program thought it was the end of the file. Removing the extra header, and then also removing the double newline with cat -s, solved both issues simultaneously. Now it is working as intended.
All the best,
Ekin
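
For anyone hitting the same problem, the fix described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the exact commands used in the thread: it assumes the stray header is a line starting with "##maf", it keeps the first such line and drops later ones, and it squeezes runs of blank lines into one, similar in spirit to cat -s. The names clean_maf and clean_maf_file and the sample data are hypothetical.

```python
def clean_maf(lines):
    """Yield lines with repeated '##maf' headers dropped and blank runs squeezed."""
    seen_header = False
    previous_blank = False
    for line in lines:
        if line.startswith("##maf"):
            if seen_header:
                # Extra header in the middle of the file: skip it.
                continue
            seen_header = True
        # Whitespace-only lines count as blank here, which is slightly
        # broader than cat -s (that only squeezes truly empty lines).
        is_blank = line.strip() == ""
        if is_blank and previous_blank:
            continue
        previous_blank = is_blank
        yield line


def clean_maf_file(src_path, dst_path):
    """Stream src_path to dst_path line by line, so a large file is not loaded whole."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(clean_maf(src))


# Tiny illustrative input (hypothetical data, not taken from chr1.maf).
sample = [
    "##maf version=1\n",
    "a score=10\n",
    "s hg.chr1 0 4 + 100 ACGT\n",
    "\n",
    "##maf version=1\n",
    "\n",
    "a score=20\n",
    "s hg.chr1 10 4 + 100 ACGT\n",
]
print("".join(clean_maf(sample)), end="")
```

On a real file this would be called as, for example, `clean_maf_file("chr1.maf", "chr1.clean.maf")`, where the output name is a placeholder. Writing to a new file rather than editing in place keeps the original available for comparison.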