This software, created in the Blanck Lab at the USF Morsani College of Medicine, recovers TCR V(D)J recombination reads from sequencing data and identifies CDR3 regions within those reads. We also include code for physico-chemical analysis of the CDR3s. The latest version of the software also includes a threading method in t4_vdjrecord.py to stitch V- and J-segments to the CDR3s, creating full-length V-CDR3-J sequences that represent the entire TCR variable region.
- Run sbatch gdcslice.sh
- This will download bam slices from your manifest.
- Get a manifest (for the WXS or RNA-seq files you're interested in) and token file beforehand from GDC
- Some lines in gdcslice.sh must be edited before you run it; check the script.
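
For orientation, here is a minimal Python sketch of how a manifest maps to slice requests. It assumes the standard tab-separated GDC manifest with an id column and the public GDC BAM slicing endpoint. It is not the contents of gdcslice.sh: the function names are hypothetical and the region string is a placeholder, since the real regions are defined in the script.

```python
import csv
import io
from urllib.parse import urlencode

# Public GDC BAM slicing endpoint (downloads also need a token, sent as an X-Auth-Token header).
GDC_SLICING_URL = "https://api.gdc.cancer.gov/slicing/view"


def read_manifest_ids(manifest_text):
    """Return the file UUIDs from the text of a GDC manifest (TSV with an 'id' column)."""
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    return [row["id"] for row in reader]


def slice_url(file_id, regions):
    """Build a slicing URL; each region becomes its own 'region=' query parameter."""
    query = urlencode([("region", r) for r in regions], safe=":")
    return f"{GDC_SLICING_URL}/{file_id}?{query}"


# Made-up manifest row and placeholder region, for illustration only.
example = "id\tfilename\tmd5\tsize\tstate\nabc-123\tsample.bam\t0f\t10\treleased\n"
for file_id in read_manifest_ids(example):
    print(slice_url(file_id, ["chr1:100-200"]))
```

The sketch only builds URLs and does not download anything.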
- Run sbatch t2_Module_Search_IgTcR_header.sh
- First step in processing the bams
- Edit the file paths in this script at the top and bottom
- May need to edit the file paths in t2_Module_Search_IgTcRFix.sh as well
- Make sure GNU parallel is installed
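
A quick way to confirm that parallel is reachable before submitting the job is sketched below. The helper name is hypothetical, and the check only confirms that an executable called parallel is on the PATH, not that it is the GNU implementation.

```python
import shutil


def missing_tools(names):
    """Return the tool names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_tools(["parallel"])
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("parallel found on PATH")
```

Run it in the same environment (for example, after loading the same modules) that the Slurm job will use.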
- Run python t3_set_task_items.py
- Edit the path at the top of the file to match the results folder generated from the previous step
- Note the array setting printed out. You will need to put this setting into the next slurm config file, t3_Run_VDJ.sh (the instruction says "need to change array=0-numjobs depending on console output from t3_set_task_items.py").
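
To illustrate how an array setting of this form relates to the amount of work, here is a small sketch. It is not the logic of t3_set_task_items.py: the function name and the items_per_task parameter are hypothetical, and it assumes the range is zero-based and inclusive, as Slurm array ranges are.

```python
import math


def array_setting(n_items, items_per_task=1):
    """Return a Slurm-style array range string covering all items, starting at index 0."""
    if n_items <= 0:
        raise ValueError("no items to process")
    n_tasks = math.ceil(n_items / items_per_task)
    # Slurm array ranges are inclusive, so the last index is n_tasks - 1.
    return f"0-{n_tasks - 1}"


# 25 items at 10 per task need 3 tasks, i.e. indices 0, 1 and 2.
print("array=" + array_setting(25, items_per_task=10))
```

Always use the value that the script actually prints rather than one computed by hand.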
- Run sbatch t3_RunVDJ.sh
- The t3 module has multiple tasks, but most importantly, it finds the matching V/D/J/CDR3 sequence for each read
- Edit the file paths in this script
- Remember to place the vdjdb folder into the directory that you indicate
- Run sbatch t4_pre.sh
- Edit filepaths in t4_pre.sh and t4_pre.py
- Run sbatch t4_start_VDJRecord.sh
- Edit cancer in t4_run_VDJrecord.py, edit filepaths in t4_start_VDJrecord.sh
- Before running t4, download sample.tsv from GDC, and put it in the final_csv directory created by t4_pre
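
A small pre-flight check for the sample.tsv placement could look like the sketch below. The function name is hypothetical, and it assumes that final_csv sits directly under the results directory you pass in; adjust the path to match what t4_pre actually creates.

```python
import tempfile
from pathlib import Path


def sample_sheet_ready(results_dir):
    """Return True if sample.tsv is present inside <results_dir>/final_csv."""
    return (Path(results_dir) / "final_csv" / "sample.tsv").is_file()


# Demonstration in a throwaway directory, so no real results are touched.
with tempfile.TemporaryDirectory() as tmp:
    print(sample_sheet_ready(tmp))  # nothing there yet
    final_csv = Path(tmp) / "final_csv"
    final_csv.mkdir()
    (final_csv / "sample.tsv").write_text("placeholder\n")
    print(sample_sheet_ready(tmp))  # file is now in place
```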