omarwagih/ggseqlogo

Using different TSD lengths as faceting variable


Hi,

I have a dataset with TSDs of different lengths and want to create a sequence logo for each length bin. I think it would be very handy to use ggplot's facet functionality for this. However, with the current implementation this raises an error when the sequences differ in length:
Error in letterMatrix(seqs) : Sequences in alignment must have identical lengths

I am not sure how difficult it is to implement this behaviour, but it would help tremendously when exploring heterogeneous datasets, such as those from TE calling.

Best

Fritjof

This is my data:

head(df)
  TSD               TSD_length
  <chr>                  <int>
1 TAAAAATAAAGTCCT           15
2 AAAAGATTTGTGCAG           15
3 TGGGGGGACATTTTT           15
4 CCATTCTGATTTTTTT          16
5 ACAGGGAAAGGTTTTT          16
6 AAAAAGTGTGCTGGAGG         17

And my ggplot call:

p <- ggplot(df.pass.tsdlength.test)
p + geom_logo(data = df.pass.tsdlength.test$TSD, seq_type = "dna" ) +
  theme_logo() + 
  facet_wrap( ~ TSD_length)

I think the underlying issue is that ggseqlogo doesn't work the way you might expect: geom_logo takes the sequences through its own data argument, so it does not even require passing data to ggplot().

Try this as a workaround:

ggseqlogo(with(df, split(TSD, sprintf('TSD_length=%s', TSD_length))))
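
To make it clearer what the workaround does, here is a minimal sketch using only the six rows from the head(df) output above (the full dataset is not shown, so this is illustrative only). The intermediate name seqs_by_length is my own and not part of ggseqlogo. split() groups the sequences into a named list with one element per length, and my understanding is that ggseqlogo draws one panel per list element, using the list names as panel titles.

```r
# Toy data copied from the head(df) output in the question
df <- data.frame(
  TSD = c("TAAAAATAAAGTCCT", "AAAAGATTTGTGCAG", "TGGGGGGACATTTTT",
          "CCATTCTGATTTTTTT", "ACAGGGAAAGGTTTTT", "AAAAAGTGTGCTGGAGG"),
  TSD_length = c(15L, 15L, 15L, 16L, 16L, 17L),
  stringsAsFactors = FALSE
)

# Group the sequences by length; the list names serve as panel labels
# (seqs_by_length is a hypothetical name chosen for this example)
seqs_by_length <- with(df, split(TSD, sprintf("TSD_length=%s", TSD_length)))

# Within each group all sequences share one length, which is the
# condition that failed when all sequences were passed in together
stopifnot(all(vapply(seqs_by_length,
                     function(s) length(unique(nchar(s))) == 1L,
                     logical(1))))

# Draw one logo per group, only if ggseqlogo is installed
if (requireNamespace("ggseqlogo", quietly = TRUE)) {
  library(ggseqlogo)
  p <- ggseqlogo(seqs_by_length, seq_type = "dna")
  print(p)
}
```

Since each list element is processed as its own alignment, the identical-length requirement only has to hold within a group, not across the whole dataset.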