totajuliusd/topr

Manhattan subplots with known and novel color coding for both plots

hlnicholls opened this issue · 8 comments

Hi, thank you for developing this package, it's by far the best I've found.

I am trying to make two Manhattan plots in one, similar to the final plot in your Manhattan vignette (https://totajuliusd.github.io/topr/articles/manhattan.html):
image

However, I would like both plots to be color-coded by known and novel loci. That is, I would color-code known and novel loci for each of the 2 GWAS datasets, then add a title to each plot to identify which trait it shows.

I know how to do this for one Manhattan plot. Is there a way to do the same for 2 plots in one go, with the second Manhattan plot upside down relative to the top one? No worries if this is out of scope for the package - it is already very comprehensive!

Here is exactly what I am trying:

     # Load and prepare GWAS data for phenotype1
     file_path1 <- paste0('/Input/', phenotype1, '_assoc_regenie_allchr.txt')
     gwas1 <- fread(file_path1, select = c('CHROM', 'GENPOS', 'ALLELE0', 'ALLELE1', 'p', 'A1FREQ', 'BETA', 'SE'))
     colnames(gwas1)[c(2, 3, 4, 5, 6)] <- c('POS', 'REF', 'ALT', 'P', 'AF')
     gwas_filtered1 <- gwas1[gwas1$P < 5e-8, ]
     gwas_annotated1 <- annotate_with_nearest_gene(gwas_filtered1)
     gwas1$Gene_Symbol[gwas1$P < 5e-8] <- gwas_annotated1$Gene_Symbol
     known1 <- gwas1 %>% filter(Gene_Symbol %in% known_loci)
     novel1 <- gwas1 %>% filter(!Gene_Symbol %in% known_loci)
     
     # Load and prepare GWAS data for phenotype2
     file_path2 <- paste0('/Input/', phenotype2, '_assoc_regenie_allchr.txt')
     gwas2 <- fread(file_path2, select = c('CHROM', 'GENPOS', 'ALLELE0', 'ALLELE1', 'p', 'A1FREQ', 'BETA', 'SE'))
     colnames(gwas2)[c(2, 3, 4, 5, 6)] <- c('POS', 'REF', 'ALT', 'P', 'AF')
     gwas_filtered2 <- gwas2[gwas2$P < 5e-8, ]
     gwas_annotated2 <- annotate_with_nearest_gene(gwas_filtered2)
     gwas2$Gene_Symbol[gwas2$P < 5e-8] <- gwas_annotated2$Gene_Symbol
     known2 <- gwas2 %>% filter(Gene_Symbol %in% known_loci)
     novel2 <- gwas2 %>% filter(!Gene_Symbol %in% known_loci)
     
     # Combine data for plotting
     dat <- list(c(gwas1, known1, novel1), c(gwas2, known2, novel2))
     print('plotting...')
     png_filename <- paste0("/Plots/Manhattan/known_novel_shared_manhattan_", phenotype1,"_",phenotype2, ".png")
     png(png_filename, width = 3500, height = 2500, res = 300)
     # Plotting the Manhattan plot for the current pair
     plot(manhattan(list(c(gwas1, known1, novel1), c(gwas2, known2, novel2)), color=c("darkgrey","blue","red"), annotate = c(5e-08), region_size=100000000, ntop=1, 
                    highlight_genes = genes, highlight_genes_ypos = -0.5, angle=90, ymax=40, ymin=-30, nudge_y = 2))
     dev.off()


# or something like:

  dat <- list(gwas1, known1, novel1, gwas2, known2, novel2)
  print('plotting...')
  png_filename <- paste0("/Plots/Manhattan/mtag_single_manhattan_", phenotype, ".png")
  png(png_filename, width = 3500, height = 2500, res = 300)
  # Plotting the Manhattan plot for the current pair
  
  plot(manhattan(dat, color=c("darkgrey","blue","red", "darkgrey","blue","red"), annotate = c(1e-100, 5e-08, 5e-08, 1e-100, 5e-08, 5e-08), 
            even_no_chr_lightness = c(0.8,0.5,0.5), 
            legend_labels = c(paste0('Single-trait_', phenotype),  'Known CMR Loci', 'Novel', 
                              paste0('Multi-trait_', phenotype),  'Known CMR Loci', 'Novel'),
            label_color = "black", region_size=100000000, ntop=3,
            highlight_genes = genes, highlight_genes_ypos = -0.5, angle=90, ymax=40, ymin=-30, nudge_y = 2))

Hi, is this what you had in mind:

plot3

If so, you can achieve this with topr's inbuilt datasets (CD_UKBB and CD_FINNGEN) by doing:

library(topr)
library(dplyr)

CD_UKBB_annotated <- CD_UKBB %>% filter(P<5e-08) %>% annotate_with_nearest_gene()
known_UKB <- CD_UKBB_annotated %>% filter(Gene_Symbol %in% c("C1orf141","IL23R","NOD2","NKD1","CYLD","IKZF1"))
novel_UKB <- CD_UKBB_annotated %>% filter(Gene_Symbol %in% c("JAK2","TTC33","ATG16L1"))

CD_FINNGEN_annotated <- CD_FINNGEN %>% filter(P<5e-08) %>% annotate_with_nearest_gene()
known_FG <- CD_FINNGEN_annotated %>% filter(Gene_Symbol %in% c("TNRC18","FBXL18","RNF216","WIPI2", "RNU6-215P", "IL23R","NOD2"))
novel_FG <- CD_FINNGEN_annotated %>% filter(Gene_Symbol %in% c("ADO","TTC33","NKX2-3"))

png_filename <- "plot_theme_grey.png"
png(png_filename, width = 3500, height = 2500, res = 300)

manhattan(list(CD_UKBB, known_UKB, novel_UKB, CD_FINNGEN, known_FG,novel_FG), ntop=3,
          color=c("#A0A0A0","blue","red","#A5A5A5","blue","red"), 
          legend_labels=c("Phenotype 1","known","novel","Phenotype 2", "known","novel"), 
          highlight_genes = c("IL23R","TTC33","NOD2"), 
          highlight_genes_ypos = -0.5,
          highlight_genes_color = "green", 
          annotate = c(1e-50,5e-08,5e-08,1e-50,5e-08,5e-08), 
          angle=90,nudge_y=7, ymax=33, ymin=-43,
          theme_grey = TRUE, title="Phenotype 1 and Phenotype 2")
dev.off()

It is very close to what you already had.

Or without the theme_grey:

manhattan(list(CD_UKBB, known_UKB, novel_UKB, CD_FINNGEN %>% filter(P>1e-35), known_FG,novel_FG), ntop=3,
          color=c("#A0A0A0","blue","red","#A5A5A5","blue","red"), 
          legend_labels=c("Phenotype 1","known","novel","Phenotype 2", "known","novel"), 
          highlight_genes = c("IL23R","TTC33","NOD2"), highlight_genes_ypos = -0.5,
          highlight_genes_color = "green", 
          annotate = c(1e-50,5e-08,5e-08,1e-50,5e-08,5e-08), 
          angle=90,nudge_y=7, ymax=33, ymin=-43,
          even_no_chr_lightness = c(0.8,0.5,0.5,0.8,0.5,0.5),
          title="Phenotype 1 and Phenotype 2") 

plot4

Note that you have to use two slightly different shades of grey for Phenotype 1 and Phenotype 2 so that they get labelled separately below the plot. If you use the same color, they will share a label, as the known and novel loci do.
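The label-sharing behaviour described above can be pictured with a few lines of base R. This is only an illustration, not topr's internal code: it assumes the legend is keyed by colour and that the first label seen for a colour is the one kept.

```r
# Illustration only: NOT topr's implementation. Assumes one legend entry per
# distinct colour, with the first label for each colour being kept.
colors <- c("#A0A0A0", "blue", "red", "#A5A5A5", "blue", "red")
labels <- c("Phenotype 1", "known", "novel", "Phenotype 2", "known", "novel")

# Two slightly different greys: both phenotypes keep their own entry.
legend_entries <- labels[!duplicated(colors)]
print(legend_entries)

# Same grey for both phenotypes: "Phenotype 2" no longer gets its own entry.
same_grey <- c("#A0A0A0", "blue", "red", "#A0A0A0", "blue", "red")
shared_entries <- labels[!duplicated(same_grey)]
print(shared_entries)
```

Under that assumption, the two-grey vector yields four entries (both phenotypes plus known and novel), while the single-grey vector yields only three.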

Thank you for this! It's exactly what I want.

The only part I'm still having trouble with is that my legend only shows the "Novel" label.

  manhattan(list(gwas1, known, novel, gwas2, known2, novel2), ntop=3,
            color=c("#A0A0A0","blue","red","#A5A5A5","blue","red"), 
            legend_labels=c(paste0('Single-trait_', phenotype),"Known Loci","Novel",
                            paste0('Multi-trait_', phenotype), "Known Loci","Novel"), 
            highlight_genes = genes, highlight_genes_ypos = -0.5,
            highlight_genes_color = "green", 
            annotate = c(1e-50,5e-08,5e-08,1e-50,5e-08,5e-08), 
            angle=90,nudge_y=7, ymax=33, ymin=-43,
            even_no_chr_lightness = c(0.8,0.5,0.5,0.8,0.5,0.5),
            title=paste0('Single-trait and Multi-trait ', phenotype))

Is there any reason why the other labels might not get included in the legend?

This is what I get (example just filtered to chr2):
example_chr2

Hmmm... that is odd. When I copy your code and replace your data with test data, I get all the labels:

manhattan(list(CD_UKBB, known_UKB, novel_UKB, CD_FINNGEN, known_FG,novel_FG), ntop=3,
          color=c("#A0A0A0","blue","red","#A5A5A5","blue","red"), 
          legend_labels=c(paste0('Single-trait_'),"Known Loci","Novel",
                          paste0('Multi-trait_'), "Known Loci","Novel"), 
          highlight_genes = c("FTO","THADA"), highlight_genes_ypos = -0.5,
          highlight_genes_color = "green", 
          annotate = c(1e-50,5e-08,5e-08,1e-50,5e-08,5e-08), 
          angle=90,nudge_y=7, ymax=33, ymin=-43,
          even_no_chr_lightness = c(0.8,0.5,0.5,0.8,0.5,0.5),
          title=paste0('Single-trait and Multi-trait '))

plot4

Does this also happen to you when you use the inbuilt test data (CD_UKBB and CD_FINNGEN)?

No, I get your output with everything correct when I run your code with your data. I'll investigate my data further to see if I can solve it (in terms of format, it has the same columns and column names as your data, just a different number of rows, and I'm able to plot it as I want with the manhattan function in other styles/formats). At any rate, you have solved my main question. Hopefully I can sort out the legend separately. Thank you for your help!

No problem, I'm glad I could help with the main question. To test your data, I would start with just 2 datasets, basic labels and distinct colors, and then add the other datasets one by one, e.g.:

manhattan(list(gwas1, known), color=c("green","blue"), legend_labels=c("L1","L2"))
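The one-by-one approach can also be scripted. The sketch below is a rough outline rather than a tested recipe: `dat` is a hypothetical name for your own list of six data frames, the colours and labels are placeholders, and the plotting loop only runs if topr is installed and `dat` exists.

```r
# Distinct colours and basic labels, one per dataset (placeholders).
colors <- c("green", "blue", "red", "orange", "purple", "black")
labels <- paste0("L", seq_along(colors))

# Index sets for each step: datasets 1-2, then 1-3, ... up to all six.
steps <- lapply(2:length(colors), seq_len)

# Assumes `dat` is your own list, e.g.
# dat <- list(gwas1, known, novel, gwas2, known2, novel2)
if (requireNamespace("topr", quietly = TRUE) && exists("dat")) {
  for (idx in steps) {
    # print() so each plot is rendered from inside the loop
    print(topr::manhattan(dat[idx], color = colors[idx],
                          legend_labels = labels[idx]))
  }
}
```

The first step at which a label goes missing points to the dataset worth inspecting.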

Please let me know if the problem persists and you can't find anything wrong in your data.

I've found that the source of my problem was filtering to chromosome 2 only (it takes me 30-45 minutes to generate one full plot, so I was using chr2, which should have given me everything since it has both known and novel loci).

I ran it on my full GWAS data and the output has all the labels. My filtering must have been off or mismatched in some way that affected the labelling.
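For anyone else testing on a single chromosome, one quick sanity check is to apply the same filter to every dataset and look at the row counts before plotting. The sketch below uses toy data frames with a numeric CHROM column (an assumption; adjust the comparison if your chromosomes are stored as strings such as "chr2"), and it does not claim to reproduce the exact cause here.

```r
# Toy stand-ins for the datasets; only CHROM and P are needed for the check.
gwas1  <- data.frame(CHROM = c(1, 2, 2, 3), P = c(1e-3, 1e-9, 0.2, 1e-10))
known1 <- data.frame(CHROM = c(2, 3), P = c(1e-9, 1e-10))
novel1 <- data.frame(CHROM = 3, P = 1e-10)
dat <- list(gwas1 = gwas1, known1 = known1, novel1 = novel1)

# Apply the same chromosome filter to every dataset.
chr <- 2
dat_chr <- lapply(dat, function(d) d[d$CHROM == chr, , drop = FALSE])

# Row counts after filtering; a zero flags a dataset with no variants left
# on the chosen chromosome.
n_rows <- vapply(dat_chr, nrow, integer(1))
print(n_rows)

empty <- names(n_rows)[n_rows == 0]
if (length(empty) > 0) {
  message("Empty after filtering: ", paste(empty, collapse = ", "))
}
```

In this toy example the novel set has no variants on chromosome 2, so it is reported as empty.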

Thank you again for your help!