TF_Network: A Python repository from ekotysh

Process explanation:

Part 1 - Get enhancer sequences and target gene names:

Read the elite GeneHancer BED file line by line
For each line, extract the coordinates of the enhancer and the name of the target gene (as an Ensembl ID)
Look up the Hugo name of the target gene and its biotype/description in genes.ENSG.tbl
Extract the enhancer sequence at those coordinates from the reference genome
Write the results to an intermediate file
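
The Part 1 steps can be sketched as follows. This is a minimal illustration, not the repository's actual code. It assumes that the BED file carries the target gene's Ensembl ID in its fourth column, that genes.ENSG.tbl is tab-separated with the columns Ensembl ID, Hugo name and biotype, and that the reference genome is already loaded into a dictionary keyed by chromosome name. The names EnhancerRecord, load_gene_table and enhancer_records are hypothetical.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple


class EnhancerRecord(NamedTuple):
    chrom: str
    start: int        # 0-based, inclusive (BED convention)
    end: int          # 0-based, exclusive (BED convention)
    ensembl_id: str
    hugo_name: str
    biotype: str
    sequence: str


def load_gene_table(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map Ensembl ID -> (Hugo name, biotype).

    ASSUMPTION: tab-separated columns in the order
    Ensembl ID, Hugo name, biotype. Adjust to the real file layout.
    """
    table: Dict[str, Tuple[str, str]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0].startswith("#"):
            continue  # skip comments, blank and malformed lines
        table[fields[0]] = (fields[1], fields[2])
    return table


def enhancer_records(
    bed_lines: Iterable[str],
    genes: Dict[str, Tuple[str, str]],
    genome: Dict[str, str],
) -> Iterator[EnhancerRecord]:
    """Yield one record per BED line with gene names and sequence attached.

    ASSUMPTION: column 4 of the BED file holds the target gene's Ensembl ID.
    """
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        chrom, ensembl_id = fields[0], fields[3]
        start, end = int(fields[1]), int(fields[2])
        # Genes missing from the table are kept, with placeholder names.
        hugo_name, biotype = genes.get(ensembl_id, ("NA", "NA"))
        sequence = genome[chrom][start:end]  # BED intervals are half-open
        yield EnhancerRecord(chrom, start, end, ensembl_id,
                             hugo_name, biotype, sequence)


# Toy data, invented for illustration only.
toy_genome = {"chr1": "ACGTACGTAC"}
toy_genes = load_gene_table(["ENSG00000000001\tGENE1\tprotein_coding\n"])
toy_records = list(
    enhancer_records(["chr1\t2\t6\tENSG00000000001\n"], toy_genes, toy_genome)
)
```

For a genome of real size, holding every chromosome in a dictionary is expensive; an indexed FASTA reader or the bedtools command listed under the expected files is a more practical way to extract the sequences.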

Part 2 - Get TF binding site motifs and apply them to enhancers:

Read in the HOCOMOCO PFM matrices one at a time
For each matrix, generate a Biopython Motif
For each enhancer region from Part 1, find which binding site motifs match there (on the + and - strands)
Calculate the average number of binding sites that match within a single enhancer region - to be informed on the supernode
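
The scan described in Part 2 can be sketched without dependencies as follows. The repository itself builds Biopython Motif objects; this stand-in only illustrates the underlying idea of converting a position frequency matrix (PFM) into log-odds scores and sliding it over both strands. The pseudocount, the uniform background of 0.25 and the score threshold are assumptions chosen for the example, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

Matrix = List[Dict[str, float]]  # one dict of per-base values per motif column


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def pfm_to_pwm(pfm: Matrix, pseudocount: float = 1.0,
               background: float = 0.25) -> Matrix:
    """Convert raw counts into log2 odds against a uniform background."""
    pwm: Matrix = []
    for column in pfm:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan_strand(seq: str, pwm: Matrix, threshold: float) -> List[int]:
    """Return start offsets of windows scoring at or above the threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(ch not in BASES for ch in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[ch] for col, ch in zip(pwm, window))
        if score >= threshold:
            hits.append(i)
    return hits


def scan_both_strands(seq: str, pwm: Matrix,
                      threshold: float) -> List[Tuple[int, str]]:
    """Return (forward-strand start, strand) for every match."""
    seq = seq.upper()
    width = len(pwm)
    hits = [(i, "+") for i in scan_strand(seq, pwm, threshold)]
    for j in scan_strand(reverse_complement(seq), pwm, threshold):
        # Map the offset on the reverse complement back to forward coordinates.
        hits.append((len(seq) - width - j, "-"))
    return hits


def mean_hits_per_enhancer(enhancers: Dict[str, str],
                           pwms: Dict[str, Matrix],
                           threshold: float) -> float:
    """Average number of motif matches (all motifs pooled) per enhancer."""
    if not enhancers:
        return 0.0
    total = sum(
        len(scan_both_strands(seq, pwm, threshold))
        for seq in enhancers.values()
        for pwm in pwms.values()
    )
    return total / len(enhancers)


# Toy motif with consensus ACG, invented for illustration only.
toy_pfm = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
]
toy_pwm = pfm_to_pwm(toy_pfm)
toy_hits = scan_both_strands("TTACGTT", toy_pwm, threshold=4.0)
toy_mean = mean_hits_per_enhancer(
    {"e1": "TTACGTT", "e2": "TTTTTTT"}, {"toy": toy_pwm}, threshold=4.0
)
```

In the toy sequence the motif matches once on the + strand and once on the - strand, because CGT on the forward strand is the reverse complement of ACG. A fixed threshold of 4.0 only works for this tiny motif; real matrices would need a per-motif threshold.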

Part 3 - Use co-expression data to find TFBS clusters:

Use GTEx co-expression data to determine how the TFs that match within the same enhancer region regulate transcription of the target gene - to be informed on the supernode
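
One possible way to approach Part 3 is sketched below. It is an assumption, not the repository's method: for each TF with a motif match in an enhancer, it computes the Pearson correlation between the TF's expression and the target gene's expression across samples, so that the sign hints at whether the TF tends to move with or against the target. The expression dictionary stands in for a GTEx-derived table, and the function names are hypothetical.

```python
import math
from typing import Dict, Iterable, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)


def tf_target_correlations(
    tfs: Iterable[str],
    target: str,
    expression: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Correlate each TF found in an enhancer with that enhancer's target gene.

    ASSUMPTION: `expression` maps a gene name to its expression values,
    with samples in the same order for every gene.
    TFs without expression data are skipped.
    """
    target_profile = expression[target]
    return {
        tf: pearson(expression[tf], target_profile)
        for tf in tfs
        if tf in expression
    }


# Toy expression values, invented for illustration only.
toy_expression = {
    "TF_A": [1.0, 2.0, 3.0, 4.0],
    "TF_B": [4.0, 3.0, 2.0, 1.0],
    "TARGET": [2.0, 4.0, 6.0, 8.0],
}
toy_corr = tf_target_correlations(
    ["TF_A", "TF_B", "TF_MISSING"], "TARGET", toy_expression
)
```

Correlation alone does not establish regulation; it only ranks which co-bound TFs deserve a closer look.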

Files expected to be present:

GRCh38.primary_assembly.genome.fa - GRCh38 reference genome (primary assembly)
genes.ENSG.tbl - gene names in Ensembl and Hugo forms, along with biotype description
elite_ensg_enhan_fused_ensembl_prom_500b.hg38.bed - elite enhancer coordinates from GeneHancer
elite_enhancer_sequences.fa - elite enhancer sequences (generated). You can generate it by running the following command:
bedtools getfasta -fi GRCh38.primary_assembly.genome.fa -bed elite_ensg_enhan_fused_ensembl_prom_500b.hg38.bed -fo elite_enhancer_sequences.fa
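
The generated elite_enhancer_sequences.fa can then be read back with a few lines of Python. The sketch below assumes the default bedtools getfasta header format, chrom:start-end, which is what the command produces when no naming or strand options are given. The function name parse_getfasta is hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end)


def parse_getfasta(lines: Iterable[str]) -> Dict[Region, str]:
    """Parse FASTA text whose headers look like '>chr1:100-200'.

    ASSUMPTION: headers follow the default bedtools getfasta layout;
    headers written with naming or strand options would need extra handling.
    """
    records: Dict[Region, str] = {}
    key: Optional[Region] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if key is not None:
                records[key] = "".join(chunks)
            # rpartition tolerates chromosome names that contain ':'.
            chrom, _, span = line[1:].rpartition(":")
            start, _, end = span.partition("-")
            key = (chrom, int(start), int(end))
            chunks = []
        elif key is not None:
            chunks.append(line.upper())
    if key is not None:
        records[key] = "".join(chunks)
    return records


# Toy FASTA content, invented for illustration only.
toy_fasta = [">chr1:10-18\n", "acgt\n", "ACGT\n", ">chr2:5-9\n", "TTTT\n"]
toy_sequences = parse_getfasta(toy_fasta)
```

With the real file, the same function can be called on an open file handle, for example parse_getfasta(open("elite_enhancer_sequences.fa")).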
