stjude-biohackathon/CRCminer

Output all network nodes and edges.


Output all network nodes and edges in some sort of format that makes sense. For now, I'm going to copy the CRC program's output formats so that downstream reporting/viz can be used with both. File names from the example data/run that I used are in parentheses.

  • List of all node genes (genes marked by a (super)enhancer).
  • List of all TF node genes (TF genes marked by a superenhancer).
  • Full edge table (A549_CRCs_EDGE_TABLE.txt):
SOURCE	TARGET	CHROM	START	STOP	REGION_ID	TF_INTERACTION
ARNTL	ABHD2	chr15	89119809	89119818	Peak_382	0
ARNTL	AC002066.1	chr7	116354919	116354928	10_Peak_18966_lociStitched	0
ARNTL	AC004585.1	chr17	40518775	40518784	Peak_411	0
ARNTL	AC004585.1	chr17	40543266	40543275	Peak_411	0
ARNTL	AC004585.1	chr17	40551758	40551767	Peak_411	0
ARNTL	AC004585.1	chr17	40552457	40552466	Peak_411	0

Each row here represents an ARNTL motif found in an enhancer (and/or promoter?) of the gene in the "TARGET" column. This can be used to dynamically create network plots for one or more TFs selected by the user, optionally limited to TF-TF interactions as designated in the last column (1 indicates that it's a TF-TF regulatory interaction).
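As a rough sketch of that selection step, assuming the edge table is tab-separated with the header shown above, something like the following would pull out the edges for user-chosen TFs. The function names (`load_edges`, `select_edges`) and the toy rows with placeholder gene names are made up for illustration and are not part of CRCminer or the CRC program.

```python
import csv
import io


def load_edges(handle):
    """Read a tab-separated edge table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def select_edges(edges, tfs, tf_only=False):
    """Keep edges whose SOURCE is one of `tfs`.

    With tf_only=True, keep only rows flagged as TF-TF interactions
    (TF_INTERACTION == "1").
    """
    wanted = set(tfs)
    return [
        edge
        for edge in edges
        if edge["SOURCE"] in wanted
        and (not tf_only or edge["TF_INTERACTION"] == "1")
    ]


# Toy rows with placeholder names (hypothetical, not real program output).
EXAMPLE = (
    "SOURCE\tTARGET\tCHROM\tSTART\tSTOP\tREGION_ID\tTF_INTERACTION\n"
    "TF_A\tGENE_X\tchr1\t100\t109\tPeak_1\t0\n"
    "TF_A\tTF_B\tchr1\t500\t509\tPeak_2\t1\n"
    "TF_B\tGENE_Y\tchr2\t300\t309\tPeak_3\t0\n"
)

edges = load_edges(io.StringIO(EXAMPLE))
all_tf_a = select_edges(edges, ["TF_A"])
tf_tf_only = select_edges(edges, ["TF_A"], tf_only=True)
print([(e["SOURCE"], e["TARGET"]) for e in tf_tf_only])
```

The resulting list of SOURCE/TARGET pairs could then be handed to whatever plotting layer ends up being used.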

  • (Super) enhancer BED files (A549_CRCs_ENHANCER_TABLE.txt). Currently, this format looks like:
ENHANCER_ID	CHROM	START	STOP	GENE_LIST
68_Peak_17995_lociStitched	chr5	58960811	59309427	PDE4D
31_Peak_11846_lociStitched	chr5	59459122	59621436	PDE4D
25_Peak_83063_lociStitched	chr5	172827953	172935578	ERGIC1,RPL26L1,ATP6V0E1,RF00019,AC008429.1
4_Peak_37549_lociStitched	chr15	98842601	98908710	IGF1R
Peak_411	chr17	40504195	40561897	AC004585.1,AC018629.1,TNS4

Last column is gene assignments. I kind of hate this format; switching to a BED-like format would be easier to work with. Maybe add another column for whether it's a TF or not, which is currently subsetted into the (A549_CRCs_ENHANCER_TF_TABLE.txt) file.
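One possible shape for that BED-like layout, sketched under a few assumptions: one row per enhancer-gene pair, coordinates first, and a 0/1 TF flag in the last column. The column order, the function name `enhancers_to_bed_like`, and the caller-supplied `tf_genes` set are all hypothetical, and the toy rows use placeholder names. Coordinates are passed through unchanged, so whether they match BED's 0-based half-open convention depends on the upstream table.

```python
import csv
import io


def enhancers_to_bed_like(handle, tf_genes):
    """Yield (chrom, start, stop, enhancer_id, gene, is_tf) per enhancer-gene pair.

    `handle` is a tab-separated enhancer table with the header
    ENHANCER_ID, CHROM, START, STOP, GENE_LIST; GENE_LIST is comma-separated.
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        for gene in row["GENE_LIST"].split(","):
            gene = gene.strip()
            if not gene:
                continue  # skip empty entries such as a trailing comma
            yield (
                row["CHROM"],
                int(row["START"]),
                int(row["STOP"]),
                row["ENHANCER_ID"],
                gene,
                int(gene in tf_genes),
            )


# Toy rows with placeholder names (hypothetical, not real program output).
EXAMPLE = (
    "ENHANCER_ID\tCHROM\tSTART\tSTOP\tGENE_LIST\n"
    "ENH_1\tchr1\t1000\t5000\tGENE_X,TF_A\n"
    "ENH_2\tchr2\t200\t900\tGENE_Y\n"
)

bed_rows = list(enhancers_to_bed_like(io.StringIO(EXAMPLE), tf_genes={"TF_A"}))
for bed_row in bed_rows:
    print("\t".join(str(field) for field in bed_row))
```

With the TF flag carried on every row, the separate TF-only enhancer table could be recovered by filtering on the last column.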

  • List of self loops (A549_CRCs_SELF_LOOPS.txt). These are just TFs that have a motif in one of their own enhancers.
SOX2
JUN
EGR1
  • List of genes and their associated enhancers, and their TF designation (A549_CRCs_GENE_SUMMARY.txt). In the example data, it's only the SE-associated genes as the SEs were the only enhancers I provided:
GENE	TF	ENHANCER_LIST
AAK1	0	18_Peak_55661_lociStitched
ABCC1	0	16_Peak_3421_lociStitched
ABCC2	0	1_Peak_127_lociStitched
ABCC3	0	Peak_101,Peak_450,Peak_84

I feel it may be worth having two enhancer columns - one for super enhancers and one for "typical" enhancers.
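A minimal sketch of what a two-column gene summary might look like, assuming each enhancer already carries a flag saying whether it is a super enhancer. The function name `summarize_genes`, the input tuple layout, and the output column order (gene, TF flag, super enhancers, typical enhancers) are hypothetical, and the toy records use placeholder names.

```python
from collections import defaultdict


def summarize_genes(enhancers, tf_genes):
    """Build one summary row per gene.

    `enhancers` is an iterable of (enhancer_id, gene_names, is_super) tuples.
    Returns a list of (gene, is_tf, super_enhancer_list, typical_enhancer_list)
    tuples sorted by gene name; the enhancer lists are comma-joined strings.
    """
    by_gene = defaultdict(lambda: {"super": [], "typical": []})
    for enhancer_id, gene_names, is_super in enhancers:
        kind = "super" if is_super else "typical"
        for gene in gene_names:
            by_gene[gene][kind].append(enhancer_id)

    rows = []
    for gene in sorted(by_gene):
        rows.append(
            (
                gene,
                int(gene in tf_genes),
                ",".join(sorted(by_gene[gene]["super"])),
                ",".join(sorted(by_gene[gene]["typical"])),
            )
        )
    return rows


# Toy records with placeholder names (hypothetical, not real program output).
enhancers = [
    ("SE_1", ["GENE_X", "TF_A"], True),
    ("TE_1", ["GENE_X"], False),
    ("SE_2", ["GENE_X"], True),
]

summary_rows = summarize_genes(enhancers, tf_genes={"TF_A"})
for summary_row in summary_rows:
    print("\t".join(str(field) for field in summary_row))
```

A gene with no typical enhancers simply gets an empty last column, which keeps the table rectangular when only SEs are provided, as in the example data.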

This is still a WIP, will add more in a bit.