omarwagih/ggseqlogo

Different color schemes for different sites?

jbloom opened this issue · 2 comments

How hard would it be to enable different letter color schemes at different sites? I know this might seem odd given some of the traditional uses of logo plots, but we are using them to illustrate things such as the frequencies of mutations in an alignment where we'd like to color amino acids by some site-specific functional information.

Apologies for taking a while to respond to this. I'm not sure if this is still something you'd like to do, but it is possible. The easiest way is to plot the sequence logo as usual, extract the data ggseqlogo used to create the plot, modify it, and replot. Here's a working example:

```r
library(ggseqlogo)
library(ggplot2)  # provides ggplot(), layer() and scale_fill_manual()
data(ggseqlogo_sample)

# Sample plot
x = ggseqlogo(seqs_dna$MA0001.1)

# Extract the polygon data from the ggplot object
tmp = x$layers[[1]]$data

# Set the variable you want to color by. Here I'm coloring by position, but
# you could color by position/letter if needed.
tmp$col_by = factor(tmp$position)

# One rainbow color per level, named so scale_fill_manual() can match them
rnd_color = rainbow(nlevels(tmp$col_by))
names(rnd_color) = levels(tmp$col_by)

# Plot with the new data (columns are looked up in tmp first)
p = ggplot() + layer(
  stat = 'identity', data = tmp,
  mapping = aes(x = x, y = y, fill = col_by, group = group_by),
  geom = 'polygon',
  position = 'identity', show.legend = NA, inherit.aes = FALSE,
  params = list(na.rm = TRUE)
) + scale_fill_manual(values = rnd_color)

print(p)
```

[Image: by_position_ggplot]
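The comment in the code above mentions coloring by position/letter. A minimal sketch of building such a key, assuming the extracted table has `position` and `letter` columns; the small `tmp` below is a hypothetical stand-in so the snippet runs without ggseqlogo, and `pal` is an illustrative name:

```r
# Hypothetical stand-in for x$layers[[1]]$data; the real table has many
# vertices per letter polygon.
tmp <- data.frame(
  letter   = c("A", "A", "A", "C", "C", "C", "A", "A", "A"),
  position = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  x        = c(0.6, 1.0, 1.4, 0.6, 1.0, 1.4, 1.6, 2.0, 2.4),
  y        = c(0, 1, 0, 1, 2, 1, 0, 1, 0),
  group_by = c(1, 1, 1, 2, 2, 2, 3, 3, 3)
)

# One fill key per position/letter pair, e.g. "1_A", "1_C", "2_A"
tmp$col_by <- factor(paste(tmp$position, tmp$letter, sep = "_"))

# Named palette: one color per key, so scale_fill_manual() can match by name
pal <- setNames(rainbow(nlevels(tmp$col_by)), levels(tmp$col_by))
```

Passing `pal` to `scale_fill_manual(values = pal)` in the plot call above would give each position/letter pair its own color. Replacing `rainbow()` with a hand-written named vector lets you assign colors from site-specific information instead.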

Hope that helps!

Hello,
I am interested in coloring by position and letter, and I am struggling with it. After extracting the data from my gg object:

```r
# Extract data from gg object
tmp = x$layers[[1]]$data
```

I tried to merge it with my color codebook, which looks like this:

```
  posindex Region  BNAB Susceptibility AminoAcid
1        1  CD4bs VRC01     Resistance         M
2        1  CD4bs VRC01    Sensitivity         I
3        2   <NA>  <NA>        Neutral         Z
4        3  CD4bs VRC01    Sensitivity         A
5        4  CD4bs VRC01    Sensitivity         E
6        5  CD4bs VRC01     Resistance         D
```

So I merged by position and amino acid:

```r
mergedtable <- merge(x = tmp, y = colorcode,
                     by.x = c('letter', 'position'), by.y = c('AminoAcid', 'posindex'),
                     all = TRUE)
```

But I think it messed up the data frame for some reason. If I plot only tmp, it looks fine (first plot), but if I plot mergedtable, it looks strange even though I am not trying to do any coloring yet:

```r
testplot = ggplot() + layer(
  stat = 'identity', data = tmp,
  mapping = aes_string(x = 'x', y = 'y', group = 'group_by'),
  geom = 'polygon',
  position = 'identity', show.legend = NA, inherit.aes = F,
  params = list(na.rm = T))
```

or

```r
testplot = ggplot() + layer(
  stat = 'identity', data = mergedtable,
  mapping = aes_string(x = 'x', y = 'y', group = 'group_by'),
  geom = 'polygon',
  position = 'identity', show.legend = NA, inherit.aes = F,
  params = list(na.rm = T))
```

What am I missing?
Thank you!
[Image: Rplot tmp]
[Image: Rplot merged]
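A likely cause, offered as a diagnosis rather than a confirmed answer: the polygon layer draws each letter by connecting its vertices in row order, but `merge()` does not preserve row order. Its documentation states that rows are sorted on the key columns by default and are in an unspecified order with `sort = FALSE`, so the vertices of a letter are not guaranteed to stay in drawing order. `all = TRUE` also appends codebook rows with no matching letter (such as `Z` at position 2) with `NA` coordinates. The sketch below records a row index before merging, uses `all.x = TRUE`, and sorts back afterwards. The toy `tmp` and `colorcode` tables and the `row_id` column are hypothetical stand-ins.

```r
# Hypothetical stand-in for x$layers[[1]]$data: each letter is one polygon
# (group_by) whose vertices are drawn in row order.
tmp <- data.frame(
  letter   = c("M", "M", "M", "M", "I", "I", "I"),
  position = c(1, 1, 1, 1, 1, 1, 1),
  x        = c(0.6, 0.6, 1.4, 1.4, 0.6, 1.0, 1.4),
  y        = c(0, 1, 1, 0, 1, 2, 1),
  group_by = c(1, 1, 1, 1, 2, 2, 2)
)

# Hypothetical codebook in the same shape as the one shown above
colorcode <- data.frame(
  posindex       = c(1, 1, 2),
  AminoAcid      = c("M", "I", "Z"),
  Susceptibility = c("Resistance", "Sensitivity", "Neutral")
)

# Remember the original vertex order before merging
tmp$row_id <- seq_len(nrow(tmp))

# all.x = TRUE keeps every vertex of tmp and drops codebook rows that match
# no letter/position (e.g. Z at position 2), which would only add NA x/y rows
merged <- merge(tmp, colorcode,
                by.x = c("letter", "position"),
                by.y = c("AminoAcid", "posindex"),
                all.x = TRUE)

# merge() reorders rows, so restore the original vertex order
merged <- merged[order(merged$row_id), ]

# Fill variable; letters without a codebook entry get NA
merged$col_by <- factor(merged$Susceptibility)
```

The plotting call from the question could then take `data = merged` and map `fill` to `col_by`, with `scale_fill_manual()` supplying a named color for each `Susceptibility` value. Looking up keys with `match()` instead of `merge()` is another way to keep the original row order, since `match()` returns one result per element of its first argument, in that order.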