Output all network nodes and edges.
The output should be in some sort of format that makes sense. For now, I'm going to copy the CRC program's output formats so that downstream reporting/viz can be used with both. File names from the example data/run that I used are in parentheses.
- List of all node genes (genes marked by a (super)enhancer).
- List of all TF node genes (TF genes marked by a superenhancer).
- Full edge table (A549_CRCs_EDGE_TABLE.txt):
SOURCE TARGET CHROM START STOP REGION_ID TF_INTERACTION
ARNTL ABHD2 chr15 89119809 89119818 Peak_382 0
ARNTL AC002066.1 chr7 116354919 116354928 10_Peak_18966_lociStitched 0
ARNTL AC004585.1 chr17 40518775 40518784 Peak_411 0
ARNTL AC004585.1 chr17 40543266 40543275 Peak_411 0
ARNTL AC004585.1 chr17 40551758 40551767 Peak_411 0
ARNTL AC004585.1 chr17 40552457 40552466 Peak_411 0
These rows represent ARNTL motifs found in enhancers (and/or promoters?) of the genes in the "TARGET" column. The table can be used to dynamically create network plots from one or more TFs selected by the user, optionally limiting to TF-TF interactions as designated in the last column (1 indicates a TF-TF regulatory interaction).
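A rough sketch of the filtering step behind such a plot, assuming the edge table is tab-delimited with the header shown above. The `select_edges` function and its `tf_only` flag are hypothetical names for illustration, not part of any existing code:

```python
import csv
import io

# Rows copied from the example edge table; the tab delimiter is an assumption.
EDGE_TABLE = (
    "SOURCE\tTARGET\tCHROM\tSTART\tSTOP\tREGION_ID\tTF_INTERACTION\n"
    "ARNTL\tABHD2\tchr15\t89119809\t89119818\tPeak_382\t0\n"
    "ARNTL\tAC002066.1\tchr7\t116354919\t116354928\t10_Peak_18966_lociStitched\t0\n"
    "ARNTL\tAC004585.1\tchr17\t40518775\t40518784\tPeak_411\t0\n"
    "ARNTL\tAC004585.1\tchr17\t40543266\t40543275\tPeak_411\t0\n"
)


def select_edges(handle, tfs, tf_only=False):
    """Return sorted, unique (source, target) pairs for the chosen TFs.

    Multiple motif hits for the same TF/target pair collapse into one edge.
    With tf_only=True, only rows with TF_INTERACTION == 1 are kept.
    """
    wanted = set(tfs)
    edges = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["SOURCE"] not in wanted:
            continue
        if tf_only and row["TF_INTERACTION"] != "1":
            continue
        edges.add((row["SOURCE"], row["TARGET"]))
    return sorted(edges)


edges = select_edges(io.StringIO(EDGE_TABLE), ["ARNTL"])
for source, target in edges:
    print(source, "->", target)
```

The resulting pairs could be handed to whatever plotting library ends up being used. Collapsing repeated motif hits into a single edge is a design choice here; keeping a hit count per edge would be an easy alternative.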
- (Super) enhancer BED files (A549_CRCs_ENHANCER_TABLE.txt). Currently, this format looks like:
ENHANCER_ID CHROM START STOP GENE_LIST
68_Peak_17995_lociStitched chr5 58960811 59309427 PDE4D
31_Peak_11846_lociStitched chr5 59459122 59621436 PDE4D
25_Peak_83063_lociStitched chr5 172827953 172935578 ERGIC1,RPL26L1,ATP6V0E1,RF00019,AC008429.1
4_Peak_37549_lociStitched chr15 98842601 98908710 IGF1R
Peak_411 chr17 40504195 40561897 AC004585.1,AC018629.1,TNS4
The last column is the gene assignments. I kind of hate this format; switching to a BED-like format would be easier to work with. Maybe add another column for whether the gene is a TF or not, which is currently subset into the (A549_CRCs_ENHANCER_TF_TABLE.txt) file.
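One possible BED-like layout, sketched under a few assumptions: the current table is tab-delimited, each enhancer/gene pair gets its own row, and a set of TF gene names is available from elsewhere. The `to_bed_like` function and the output column order (CHROM, START, STOP, ENHANCER_ID, GENE, TF) are hypothetical. Coordinates are passed through unchanged, with no conversion to 0-based BED coordinates:

```python
import csv
import io

# Rows copied from the example enhancer table; the tab delimiter is an assumption.
ENHANCER_TABLE = (
    "ENHANCER_ID\tCHROM\tSTART\tSTOP\tGENE_LIST\n"
    "68_Peak_17995_lociStitched\tchr5\t58960811\t59309427\tPDE4D\n"
    "Peak_411\tchr17\t40504195\t40561897\tAC004585.1,AC018629.1,TNS4\n"
)


def to_bed_like(handle, tf_genes):
    """Expand GENE_LIST so each enhancer/gene pair becomes one BED-like row.

    The final field is 1 if the gene is in tf_genes, otherwise 0.
    """
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        for gene in rec["GENE_LIST"].split(","):
            rows.append((
                rec["CHROM"],
                int(rec["START"]),
                int(rec["STOP"]),
                rec["ENHANCER_ID"],
                gene,
                int(gene in tf_genes),
            ))
    return rows


# TF names taken from the self-loop example; a real run would supply the full TF list.
bed_rows = to_bed_like(io.StringIO(ENHANCER_TABLE), tf_genes={"SOX2", "JUN", "EGR1"})
for row in bed_rows:
    print("\t".join(str(field) for field in row))
```

With a TF column in place, the separate TF-only table could be produced by filtering on the last field rather than written as its own file.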
- List of self loops (A549_CRCs_SELF_LOOPS.txt). These are just TFs that have a motif in one of their own enhancers:
SOX2
JUN
EGR1
- List of genes and their associated enhancers, and their TF designation (A549_CRCs_GENE_SUMMARY.txt). In the example data, it's only the SE-associated genes as the SEs were the only enhancers I provided:
GENE TF ENHANCER_LIST
AAK1 0 18_Peak_55661_lociStitched
ABCC1 0 16_Peak_3421_lociStitched
ABCC2 0 1_Peak_127_lociStitched
ABCC3 0 Peak_101,Peak_450,Peak_84
I feel it may be worth having two enhancer columns - one for super enhancers and one for "typical" enhancers.
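A minimal sketch of what the two-column version might look like, assuming the summary is tab-delimited and that a set of super enhancer IDs is known from the enhancer input. The `split_enhancers` function and the "." placeholder for an empty column are hypothetical choices:

```python
import csv
import io

# Rows copied from the example gene summary; the tab delimiter is an assumption.
GENE_SUMMARY = (
    "GENE\tTF\tENHANCER_LIST\n"
    "AAK1\t0\t18_Peak_55661_lociStitched\n"
    "ABCC3\t0\tPeak_101,Peak_450,Peak_84\n"
)


def split_enhancers(handle, se_ids):
    """Split ENHANCER_LIST into super and typical enhancer columns.

    Returns (gene, tf, super_list, typical_list) tuples, using "." when a
    column would otherwise be empty.
    """
    out = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        ids = rec["ENHANCER_LIST"].split(",")
        supers = [i for i in ids if i in se_ids]
        typical = [i for i in ids if i not in se_ids]
        out.append((
            rec["GENE"],
            rec["TF"],
            ",".join(supers) or ".",
            ",".join(typical) or ".",
        ))
    return out


# In the example run every enhancer provided was an SE, so the typical column is empty.
se_ids = {"18_Peak_55661_lociStitched", "Peak_101", "Peak_450", "Peak_84"}
summary = split_enhancers(io.StringIO(GENE_SUMMARY), se_ids)
for row in summary:
    print("\t".join(row))
```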
This is still a WIP, will add more in a bit.