cancerit/VAGrENT

Admin_EnsemblReferenceFileGenerator.pl not producing any result for a list of transcripts

Hello,
I am trying to generate a vagrent.cache file for a pre-specified list of Ensembl transcripts. The list is a plain file of transcript IDs, one per line:

ENST00000389048
ENST00000343823
ENST00000447712
ENST00000409261
...

Then I try running this script:
perl VAGrENT/bin/Admin_EnsemblReferenceFileGenerator.pl \
    --species human \
    --assembly GRCh38 \
    --database homo_sapiens_core_99_38 \
    --ccds CCDS2Sequence.current.txt \
    --features Homo_sapiens.GRCh38.99.gtf \
    --cdna_fa Homo_sapiens.GRCh38.cdna.all.fa \
    --ncrna_fa Homo_sapiens.GRCh38.ncrna.fa \
    -output vagrent_cache/ \
    --trans_list List_of_transcripts.tsv \
    --fai GRCh38.fa.fai
But it produces empty files, even though the transcripts in the list are grep-able in the cDNA and GTF files. Am I doing something wrong? The GTF, cDNA and ncRNA files come from Ensembl release 99.

Hi Andy,
Unfortunately, using Homo_sapiens.GRCh38.99.gff3 did not help; the script still reports that everything worked fine:

Downloading Files -------- Skipped, files locally supplied
Obtaining Filtered Transcript List ----- Skipped, files locally supplied
Building Cache Files ----- Done

But it returns a set of empty files:

-rwxrwxrwx 28 Jun 12 14:55 vagrent.human.GRCh38.homo_sapiens_core_99_38.cache.gz
-rwxrwxrwx 75 Jun 12 14:55 vagrent.human.GRCh38.homo_sapiens_core_99_38.cache.gz.tbi
-rwxrwxrwx 0 Jun 12 14:55 vagrent.human.GRCh38.homo_sapiens_core_99_38.fa

Connecting via FTP fails with a timeout, which is why I tried using local files.
I am running it using the cgpwgs Docker image; could that be the issue? Does it need the full set of references listed in the cgpwgs Docker wiki for the full pipeline run in order to create the cache for VAGrENT?
Update: after installing all dependencies and using a local copy, I keep getting the same empty cache with local Ensembl files.

Hi Andy,
This sounds like a mystery!

I get a cache file and a FASTA file with content. A question I should probably have asked at the start: which version of VAGrENT are you using?

I tried both version 3.6.1 (cloned from this GitHub repository) and version 3.5.0 (latest release). All my files either come from the same links as yours or look the same.

This update somehow disappeared from my previous comment, but I wanted to add that I am able to generate a full cache (without the list of transcripts)! Do you think some Perl library is silently misbehaving on my side?

Possibly. Can you try running it via the Docker container? That should eliminate dependency shenanigans.

https://quay.io/repository/wtsicgp/vagrent

I used the 3.6.1 container (released on May 6th) to generate my results.

Hello Andy,
I have solved the issue! It turned out that the list of transcripts had been saved with the wrong end-of-line character, which prevented the parsing script from finding the relevant transcripts in the database. Would you consider adding a message that the cache builder prints when it does not find a listed transcript in the reference?
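To illustrate the kind of message I mean, here is a rough standalone sketch in Python. It is not VAGrENT code and does not reflect how the cache builder parses its inputs; the function names and the GTF rows are made up for illustration. It compares the requested IDs against the `transcript_id` attributes in a GTF and warns about any it cannot find, printing each ID with `repr` so that a stray carriage return becomes visible.

```python
import re
import sys

# Matches the transcript_id attribute in a GTF attribute column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def read_requested_ids(lines):
    """Return the requested IDs, removing only the trailing newline.

    A carriage return left over from Windows line endings is kept on
    purpose, so that it shows up in the warning below.
    """
    return [line.rstrip("\n") for line in lines if line.strip()]


def ids_in_gtf(gtf_lines):
    """Collect every transcript_id that occurs in the GTF lines."""
    found = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        match = TRANSCRIPT_ID.search(line)
        if match:
            found.add(match.group(1))
    return found


def report_missing(requested, available):
    """Warn on stderr about each requested ID absent from the reference."""
    missing = [tid for tid in requested if tid not in available]
    for tid in missing:
        # repr() makes invisible characters such as '\r' visible.
        print(f"WARNING: transcript {tid!r} not found in reference", file=sys.stderr)
    return missing


# Illustrative in-memory data; the coordinates and gene IDs are invented.
example_list = ["ENST00000389048\r\n", "ENST00000343823\n"]
example_gtf = [
    "#!genome-build GRCh38",
    '1\tdemo\ttranscript\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "ENST00000389048";',
    '1\tdemo\ttranscript\t200\t300\t.\t+\t.\tgene_id "G2"; transcript_id "ENST00000343823";',
]
example_missing = report_missing(read_requested_ids(example_list), ids_in_gtf(example_gtf))
```

When reading a real list file in Python, it would need to be opened with `newline=""` (or in binary mode), since the default text mode translates `\r\n` to `\n` and would hide the problem.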
Thanks for your help and sorry for the fuss!
All the best,
Nadezda

No problem, glad you solved it. I've been bitten by that in the past, but not in this context. I'll add a note to the wiki page.
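For anyone landing here with the same symptom, a minimal sketch of how the transcript list could be checked and cleaned before passing it to --trans_list. It is plain Python, independent of VAGrENT, and the function names are hypothetical. The thread does not say which line ending was involved, so the sketch handles both Windows-style CRLF and bare CR.

```python
from collections import Counter


def count_line_endings(data: bytes) -> Counter:
    """Count CRLF, bare LF and bare CR line endings in raw file content."""
    counts = Counter()
    counts["CRLF"] = data.count(b"\r\n")
    # Every CRLF pair contains one LF and one CR, so subtract the pairs.
    counts["LF"] = data.count(b"\n") - counts["CRLF"]
    counts["CR"] = data.count(b"\r") - counts["CRLF"]
    return counts


def normalise_line_endings(data: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to Unix-style LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Typical use on a real file (the path is the one from this thread):
#   from pathlib import Path
#   path = Path("List_of_transcripts.tsv")
#   raw = path.read_bytes()
#   print(count_line_endings(raw))
#   path.write_bytes(normalise_line_endings(raw))
```

The file is read as bytes on purpose: Python's default text mode translates line endings while reading, which would mask exactly the thing being checked.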