zktuong/dandelion

Singularity pre-processing mouse

guillemsanchezsanchez1996 opened this issue · 9 comments

Hello,
I was trying to analyze some mouse γδ single-cell RNA/TCR-seq data following your nice and detailed preprocessing tutorial with Singularity, but the re-annotated data used human alleles as the reference. Is there a way to specify that the organism whose data I am trying to analyze is "mouse"?

Thanks in advance for your help,

Guillem

Hi, yes, I'm currently trying to fix this in #237 and tracking it in #236.

It should be merged soon, and the Singularity image will be uploaded by the end of the week.

Wow! That is fantastic, Zewen :)

Thanks a lot and Happy New Year!

Guillem

Hi @guillemsanchezsanchez1996,

You can re-pull the image. Let me know if there are any issues with it.

Hello again @zktuong :) In the end, the "end of the week" turned out to be the "end of the day" ;) Thanks a lot for your help, it works for me. Now I can re-annotate the data with the mouse genes!!

However, I have noticed that, unlike with human data, the results do not include the sequences containing TRAV-DV gene segments (delta chains with a TRAV/DV gene segment + (TRDD)-TRDJ-TRDC segments). I know these "dual-usage" V segments (used in either TRA or TRD chains) are in my dataset, because I can find them in the Cell Ranger all_contig output. Do you know what the issue could be? I have also analyzed some human gamma delta data, and the software reported sequences with these V segments, such as TRAV29-DV5 or TRAV38-2DV8, without any problem.
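To confirm that the dual-usage segments are present before annotation, one option is to count them directly in the Cell Ranger contig table. The sketch below is a hypothetical helper, not part of dandelion; it assumes the table is Cell Ranger's `all_contig_annotations.csv` with `chain` and `v_gene` columns, and it treats any V gene name that starts with `TRAV` and also contains `DV` (e.g. `TRAV29DV5` or `TRAV29/DV5*01`) as dual-usage.

```python
import csv
import re
from collections import Counter
from typing import Iterable

# Assumption: a dual-usage V gene is any name starting with "TRAV" that also
# contains a "DV" part later on, e.g. "TRAV29DV5" or "TRAV29/DV5*01".
DUAL_USAGE = re.compile(r"^TRAV\S*DV")


def count_trav_dv_by_chain(lines: Iterable[str]) -> Counter:
    """Count contigs whose v_gene is a TRAV/DV gene, grouped by reported chain.

    `lines` is any iterable of CSV text lines (e.g. an open file handle),
    assumed to have a header row with "chain" and "v_gene" columns.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        v_gene = row.get("v_gene") or ""  # missing or empty values become ""
        if DUAL_USAGE.match(v_gene):
            counts[row.get("chain") or ""] += 1
    return counts
```

For example, `with open("all_contig_annotations.csv", newline="") as fh: print(count_trav_dv_by_chain(fh))` shows how many TRAV/DV contigs Cell Ranger assigned to TRA versus TRD, which can then be compared with the re-annotated output.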

Thanks again for your help,

Guillem

Hmm, I'm not sure exactly. I checked the mouse IMGT database that was used, and it includes the TRAV/DV genes you mentioned. Since it worked for human, I'd assume it should have worked for mouse as well. I'm guessing that igblastn probably has a harder time annotating them for mouse. Perhaps you could check whether TRUST4 helps in annotating them?

Thanks, I will give it a try and let you know! In the meantime, I came across this related issue: https://bitbucket.org/kleinstein/changeo/issues/185/makedb-all-tra-goes-to-fail-file
Apparently, issues have been reported with the mouse TRAV germline sequences in the IMGT database. Is it possible to run the Singularity preprocessing with the "--partial" flag for MakeDb.py igblast?

So, the Singularity image already creates a db-all.tsv file, which should mimic what the "--partial" flag does. We used this file in our recent manuscript.

Hello again, Zewen,
You are completely right. When I checked this TSV, all the sequences were there! I guess this is something to keep in mind for other users. It appears to be very specific to the mouse case, since in human these TRAV-DV sequences are present in the final dandelion TSV output.
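For other users hitting this, a quick way to see which TRAV/DV contigs are kept in db-all.tsv but missing from the final table is to diff the two files by sequence ID. This is a hypothetical sketch, not a dandelion function: it assumes both files are AIRR-style tab-separated tables with `sequence_id` and `v_call` columns, and it flags a row as TRAV/DV when its `v_call` starts with `TRAV` and also contains `DV` (so a comma-separated list of ambiguous calls is flagged too).

```python
import csv
import re
from typing import Dict, Iterable

# Assumption: a TRAV/DV call starts with "TRAV" and contains "DV" later on,
# e.g. "TRAV4-4/DV10*01"; \S* may span commas in multi-call v_call values.
DUAL_USAGE = re.compile(r"^TRAV\S*DV")


def trav_dv_calls(lines: Iterable[str]) -> Dict[str, str]:
    """Map sequence_id -> v_call for rows whose v_call looks like TRAV/DV."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {
        row["sequence_id"]: row["v_call"]
        for row in reader
        if DUAL_USAGE.match(row.get("v_call") or "")
    }


def dropped_trav_dv(all_lines: Iterable[str], final_lines: Iterable[str]) -> Dict[str, str]:
    """TRAV/DV contigs present in the unfiltered table but absent from the final one."""
    kept = {row["sequence_id"] for row in csv.DictReader(final_lines, delimiter="\t")}
    return {sid: v for sid, v in trav_dv_calls(all_lines).items() if sid not in kept}
```

Pass open file handles for db-all.tsv and the final output (the exact file names depend on the preprocessing run); an empty result means no TRAV/DV contigs were lost between the two tables.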

Great! I suspect it's an internal step within Change-O's MakeDb.py that fails these mouse contigs, rather than a problem with igblastn, then.