PEM-Q is a brand new pipeline built for analysis of PEM-seq (https://www.nature.com/articles/s41421-019-0088-8/). Comparing with superQ (https://github.com/liumz93/superQ), PEM-seq is a more powerful tool for analyzing repair outcomes of genome editing, including indels, microhomologies, large deletions, vector integrations, and translocations.
Mengzhu Liu, Weiwei Zhang, Changchang Xin, Jianhang Yin, Yafang Shang, Chen Ai, Jiaxin Li, Fei-Long Meng, Jiazhi Hu, Global detection of DNA repair outcomes induced by CRISPR–Cas9, Nucleic Acids Research, Volume 49, Issue 15, 7 September 2021, Pages 8732–8742, https://doi.org/10.1093/nar/gkab686
Yin, J., Liu, M., Liu, Y. et al. Optimizing genome editing strategy by primer-extension-mediated sequencing. Cell Discov 5, 18 (2019). https://doi.org/10.1038/s41421-019-0088-8
1.Alignment
2.Random molecular barcodes extraction
3.Classify translocations
4.Classify indels
5.Events dedup
6.Statistics
Please be sure to install:
- os
- sys
- numpy
- threading
- time
- docopt
- pysam
- re
- pandas
- Bio
- FLASH v1.2.11
- bwa-0.7.12-r1034
- samtools 1.3.1
- seqtk (https://github.com/lh3/seqtk)
We have applied the TranslocPlot.R and TranslocHTMLReads.pl from transloc_pipeline (https://github.com/robinmeyers/transloc_pipeline) for fast visualization, so we suggest you download the relevant scripts from their R and lib folders. Please note that no download does not affect the running of PEM-Q and generation of the final files although errors will report.
For more flexible visualization, please convert result file into bdg format and view them in IGV (https://software.broadinstitute.org/software/igv/download).
Please add your bwa index into the environment configuration file (~/.bashrc).
export BWA_INDEX=YOUR/BWA_INDEX/PATH
For example:
export BWA_INDEX=/home/mengzhu/database/bwa_indexes
And then in $BWA_INDEX, generate a genome folder and build your bwa index. For example:
YOUR@SERVER:YOUR/BWA_INDEX/PATH$ mkdir mm10
YOUR@SERVER:YOUR/BWA_INDEX/PATH$ cd mm10
YOUR@SERVER:YOUR/BWA_INDEX/PATH$ bwa index -a bwtsw -p mm10 mm10.fa
YOUR@SERVER:YOUR/BWA_INDEX/PATH$ ls /home/mengzhu/database/bwa_indexes/mm10
mm10.amb mm10.ann mm10.bwt mm10.pac mm10.sa
git clone https://github.com/liumz93/PEM-Q
Add below variables into the environment configuration file (~/.bashrc) for PEM-Q according to your installation path.
export PATH="/your/path/PEM-Q:$PATH"
export PATH="/your/path/PEM-Q/main:$PATH"
export PATH="/your/path/PEM-Q/tools:$PATH"
PEM-Q.py genome test_sample cut-site target_chromosome primer_start primer_end primer_strand primer_sequence
For test:
PEM-Q.py mm10 CC055c 61986726 chr15 61986633 61986652 + GGAAACCAGAGGGAATCCTC
vector_analyze.py test_sample vector.fa genmome target_chromosome primer_start sgRNA_start sgRNA_end
For test:
vector_analyze.py CC055c Spcas9_pX330.fa mm10 chr15 + 7937 7956
Final results and statistic files were put into the results folder.
a statistic file include numbers of all editing events and provide editing efficiency. For example:
NoJunction 638686
Deletion 289463
Insertion 117445
Intra_Translocation.del 268
Close_inversion.del 8680
Intra_Translocation.inver 636
Inter_Translocation 11117
Translocation 12021
Editing Events 418929
Total Events 1066295
Editing Efficiency 0.39288283261198825
There are separate files for deletions, insertions, inversions and intra- or inter chromosomal translocations. Each row represent a edit event. For example:
Qname Bait_rname Bait_strand Bait_start Bait_end Prey_rname Prey_strand Prey_start Prey_end Rname Strand Junction Sequence B_Qstart B_Qend Qstart Qend Qlen Insertion Microhomolog Prey_MQ Barcode
ST-E00578:442:HF32JCCX2:7:2205:3254:67111 chr15 + 61986633 61986727 chr1 - 3432216 3432362 chr1 -3432362 AGGAGGAAACCAGAGGGAATCCTCACATTCCTACTTGGGATCCGCGGGTATCCCTCGCGCCCCTGAATTGCTAGGAAGACTGCGGTGAGTCGTGATCTGCCACTCCACTTACATAGTTGCTAAGTTGTTTGTTATACTGTACATATGTATGTGCCCATGAGTGCATGTGTATACATTTAAATTTCATATTGAAGCTTTAAATTTTGATTATTCATTCAAGATTTAGACTTAGTAGACATAAAGGAGCCACGCGTGCTCTACACGTTTATCAACGTCGT 5 99 100 246 278 60 CGTTTATCAACGTCGT
ST-E00578:442:HF32JCCX2:7:1102:24982:42200 chr15 + 61986633 61986726 chr1 - 3432216 3432364 chr1 -3432364 AGGAGGAAACCAGAGGGAATCCTCACATTCCTACTTGGGATCCGCGGGTATCCCTCGCGCCCCTGAATTGCTAGGAAGACTGCGGTGAGTCGTGATCTTCCACTCCACTTACATAGTTGCTAAGTTGTTTGTTATACTGTACATATGTATGTGCCCATGAGTGCATGTGTATACATTTAAATTTCATATTGAAGCTTTAAATTTTGATTATTCATTCAAGATTTAGACTTAGTAGACATAAAGGAGCCACGCGTGCTCTACACGTTTATCAACGTCGTG 5 98 98 246 279 T 60 CGTTTATCAACGTCGTG
ST-E00578:442:HF32JCCX2:7:2110:2727:13668 chr15 + 61986633 61986726 chr1 - 3432312 3432364 chr1 -3432364 AGGAGGAAACCAGAGGGAATCCTCACATTCCTACTTGGGATCCGCGGGTATCCCTCGCGCCCCTGAATTGCTAGGAAGACTGCGGTGAGTCGTGATCTTCCACTCCACTTACATAGTTGCTAAGTTGTTTGTTATACTGTACATATGTAT 5 98 98 150 150 T 60 CGTTTATCAACGTTGTG
ST-E00578:442:HF32JCCX2:7:2218:1976:4807 chr15 + 61986633 61986726 chr1 + 4434289 4434342 chr1 4434289 AGGAGGAAACCAGAGGGAATCCTCACATTCCTACTTGGGATCCGCGGGTATCCCTCGCGCCCCTGAATTGCTAGGAAGACTGCGGTGAGTCGTGATCTACATATGCATGGTATATATATATATGTACATCCAGGCAAACATTCATACACA 5 98 97 150 150 CT 60 CGTATAACAGCATCGAA
ST-E00578:442:HF32JCCX2:7:2201:26017:68148 chr15 + 61986633 61986717 chr1 + 6168405 6168467 chr1 6168405 AGGAGGAAACCAGAGGGAATCCTCACATTCCTACTTGGGATCCGCGGGTATCCCTCGCGCCCCTGAATTGCTAGGAAGACTGCGGTGAGAAGATTCTGGTCTGTGGTGTTCTTACTGGCCGGTCGTGAGAACGCGGCTAATAACAATTGG 5 89 88 150 150 AG 0 TTAATCTCACGATACGA
ST-E00578:442:HF32JCCX2:7:1106:3346:40688 chr15 + 61986633 61986726 chr1 + 6168640 6168706 chr1 6168640 AGGAGGAAACCAGAGGGAATCCTCACATTCCTACTTGGGATCCGCGGGTATCCCTCGCGCCCCTGAATTGCTAGGAAGACTGCGGTGAGTCGTGATCTATAGCTTTACAAGGTACGCCTGGCCTTGAACTTTCTAACGAAATTCAGGACAGTCTATCAGAAGTACCACGCGTGCTCTACACACTTTCTAGGTTCGAA 5 98 98 164 197 T 0 CACTTTCTAGGTTCGAA
ST-E00578:442:HF32JCCX2:7:1210:11221:59112 chr15 + 61986633 61986726 chr1 + 6168640 6168706 chr1 6168640 AGGAGGAAACCAGAGGGAATCCTCACATTCCTACTTGGGATCCGCGGGTATCCCTCGCGCCCCTGAATTGCTAGGAAGACTGCGGTGAGTCGTGATCTATAGCTTTACAAGGTACGCCTGGCCTTGAACTTTCTAACGAAATTCAGGACAGTCTATCAGAAGTACCACGCGTGCTCTACACCCTTTCTAGGTTCGAA 5 98 98 164 197 T 0 CCCTTTCTAGGTTCGAA
ST-E00578:442:HF32JCCX2:7:1104:29883:50551 chr15 + 61986633 61986726 chr1 + 6168640 6168751 chr1 6168640 AGGAGGAAACCAGAGGGAATCCTCACATTCCTACTTGGGATCCGCGGGTATCCCTCGCGCCCCTGAATTGCTAGGAAGACTGCGGTGAGTCGTGATCTATAGCTTTACAAGGTACGCCTGGCCTTGAACTTTCTAACGAAATTCAGGACAGTCTATCAGAAGTAAAGTGGAAAATGGCTTTACGAGGTATGCTTGGCCTTAAACTTTCTACCACGCGTGCTCTACACTCGTGTAAGATTCCCT 5 98 98 209 243 T 0 CTCGTGTAAGATTCCCT
ST-E00578:442:HF32JCCX2:7:2102:25327:37383 chr15 + 61986633 61986726 chr1 + 6168640 6168706 chr1 6168640 AGGAGGAAACCAGAGGGAATCCTCACATTCCTACTTGGGATCCGCGGGTATCCCTCGCGCCCCTGAATTGCTAGGAAGACTGCGGTGAGTCGTGATCTATAGCTTTACAAGGTACGCCTGGCCTTGAACTTTCTAACGAAATTCAGGACAGTCTATCAGAAGTACCACGCGTGCTCTACACTGTTTGTACACTAAGA 5 98 98 164 197 T 0 CTGTTTGTACACTAAGA
###Explanation of column titles
Qname: sequence name
Bait_rname: chromosome of bait
Bait_strand: strand of bait
Bait_start: chromosomal start position of bait
Bait_end: chromosomal end position of bait
Prey_rname: chromosome of prey
Prey_strand: strand of prey
Prey_start: chromosomal start position of prey
Prey_end: chromosomal end position of prey
Rname: chromosome of junction
Strand: strand of junction
Junction: chromosomal position of junction
Sequence: raw read sequence
Insertion: insertion sequence
Microhomolog: microhomology sequence used by deletions or transloctions
Barcode: random molecular barcode
For note, bdg format can be directly viewed in igv(https://igv.org/app/).
convert PEM-Q output into bdg by tab2bdg_PEMQ.py
tab2bdg_PEMQ.py CC055c_Translocation.tab mm10
convert vector output into bdg by vectorTab2bdg.py
vectorTab2bdg.py CC055c_all_vector.tab data/pX330_SpCas9.fa