basilkhuder/Seurat-to-RNA-Velocity

Index splitting and specifying index for kb count

retropc66 opened this issue · 0 comments

Hi Basil,

I'm trying to generate a loom file for RNA velocity with kb-python, following the method you describe in your tutorial.

While writing up a detailed description of the problems I was having with kb count after generating a new reference from GENCODE vM25, I found a new troubleshooting avenue and, with it, the solution to my problem.

I ran the following command to build the reference; fasta.fa and genes.gtf are gunzipped copies of the GENCODE vM25 reference fa and gtf files:

kb ref -i indeces/index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno -n 4 /lscratch/slurm-job-5124079/fasta.fa /lscratch/slurm-job-5124079/genes.gtf

Running this command generates a warning that the index-splitting (-n) flag will be deprecated in the next major release. This led me to check the kb-python release notes back to v0.25.0, where index splitting was introduced.

Using -n 4 in the kb ref command generates four index files in the indeces directory:

  • index.idx_cdna
  • index.idx_intron.0
  • index.idx_intron.1
  • index.idx_intron.2

The v0.25.0 release notes state:

When -n is used the built indices must be passed in as a comma-delimited list to kb count

I made that change in my kb count command, which seems to have done the trick - although I don't see any loom files, so I may have to do some further tweaking.
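For anyone hitting the same problem, here is a minimal sketch of how the comma-delimited list could be assembled from the four files listed above. The exact kb count command used isn't shown in this issue, so the technology (-x 10xv2), the output directory, and the FASTQ file names are placeholders; the helper name join_indices is also just illustrative. The script prints the command rather than running it.

```shell
#!/bin/sh
# Join all arguments with commas (the subshell keeps the IFS change local).
join_indices() {
  (IFS=,; printf '%s\n' "$*")
}

# Index file names as listed above for -n 4.
index_list=$(join_indices \
  indeces/index.idx_cdna \
  indeces/index.idx_intron.0 \
  indeces/index.idx_intron.1 \
  indeces/index.idx_intron.2)

# Print the command instead of running it. -x 10xv2, -o output, and the
# FASTQ names are placeholders (assumptions), not values from this issue.
echo "kb count -i $index_list -g t2g.txt -x 10xv2 -o output" \
  "-c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno --loom" \
  "R1.fastq.gz R2.fastq.gz"
```

The sketch includes --loom because that is the kb count option that requests loom output; if no loom files appear, checking whether it was passed may be a reasonable first step.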

I'd suggest a couple of updates to your tutorial: (1) remove index splitting from kb ref (or note that it will be deprecated), and/or (2) clarify how the index file(s) should be specified in the kb count command. Your command has -i transcriptome.idx, which doesn't match the indices generated by the kb ref command two lines above.

Thanks,

Chris