kundajelab/chrombpnet

Provenance of Tn5 motif MEME data?

mkarikom opened this issue · 7 comments

Hi, can you provide a reference for the PWMs in chrombpnet/data/motifs.meme.txt?
I could not find this information in Nair 2021, but attribution-based diagnosis of the background model, including the use of deeplift/modisco/tomtom to positively identify Tn5, seems like a critical step.
Thanks!

The ChromBPNet preprint is not out yet, so you can cite this repo. The PWMs for the Tn5 motifs are a combination of all the Tn5 variations in our background model.

May I ask what this is in reference to?

Hi, thanks for the quick reply!

I'm using the deeplift/tfmodisco/tomtom pipeline to check attributions on my own background model (as suggested in your very nice FAQ).
In particular, I wanted to make sure that tomtom could detect [positive] Tn5 homology among the various modisco clusters from the background model.

But I haven't been able to find a primary reference for transposase binding motifs in any online MEME database.
Initially, I searched all [~200k] keys in the MEME Suite for something like Tn5, but came up dry (genomics novice).
Only after that did I notice that you had already uploaded Tn5 motifs to the repo...

Since I'm scripting this for reproducibility, I need to know precisely how to get these.
If you did not retrieve them from a primary source, I would want to re-generate them myself...
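
For the scripting side, a minimal sketch of reading motif names and matrices out of a MEME-format text file could look like the following. It is not part of ChromBPNet; it assumes the file uses the minimal MEME motif format (`MOTIF` lines followed by a `letter-probability matrix` block), and the motif names and values in `EXAMPLE` are made up for illustration.

```python
def parse_meme(text):
    """Return {motif_name: [[pA, pC, pG, pT], ...]} from minimal MEME text."""
    motifs = {}
    current = None
    in_matrix = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "MOTIF":
            current = fields[1]  # first token after MOTIF is the identifier
            motifs[current] = []
            in_matrix = False
        elif fields[0] == "letter-probability":
            in_matrix = current is not None
        elif in_matrix:
            try:
                motifs[current].append([float(x) for x in fields])
            except ValueError:
                in_matrix = False  # a non-numeric line ends the matrix
    return motifs


# Hypothetical content, NOT the real chrombpnet/data/motifs.meme.txt.
EXAMPLE = """MEME version 4

ALPHABET= ACGT

MOTIF example_tn5_1
letter-probability matrix: alength= 4 w= 2 nsites= 20 E= 0
0.10 0.40 0.40 0.10
0.25 0.25 0.25 0.25

MOTIF example_tn5_2
letter-probability matrix: alength= 4 w= 1 nsites= 20 E= 0
0.70 0.10 0.10 0.10
"""

motifs = parse_meme(EXAMPLE)
for name, rows in motifs.items():
    print(name, "width", len(rows))
```

To use it on a real file, pass the file contents, for example `parse_meme(open(path).read())`, where `path` points at a local copy of the motif file.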

Yeah, online databases don't have a motif representation for Tn5.

We look at background models in different cell types and pick representative motifs of the Tn5 variants, making sure none of them are TF motifs. You can tell them apart by eye (look for the palindromic nature of Tn5); we included them in our report to assist the user in annotation.
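
As a rough illustration of what "palindromic" means for a PWM, the sketch below compares a matrix with its own reverse complement. This is an assumed heuristic for illustration only, not how the motifs in this repo were selected (that was done by eye), and the function names are hypothetical. It assumes columns are ordered A, C, G, T and each row sums to 1.

```python
def reverse_complement(pwm):
    """Reverse the positions and swap A<->T, C<->G (columns ordered A, C, G, T)."""
    return [row[::-1] for row in reversed(pwm)]


def palindrome_score(pwm):
    """1.0 means the PWM equals its reverse complement; 0.0 means no overlap.

    Per position, uses the total variation distance between the two
    probability rows, then averages over positions.
    """
    rc = reverse_complement(pwm)
    total = 0.0
    for row, rc_row in zip(pwm, rc):
        total += sum(abs(a - b) for a, b in zip(row, rc_row)) / 2.0
    return 1.0 - total / len(pwm)


# Toy matrices for illustration, not real Tn5 motifs.
palindromic = [  # consensus ACGT, which is its own reverse complement
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
poly_a = [[1.0, 0.0, 0.0, 0.0]] * 4  # consensus AAAA, reverse complement TTTT

print(palindrome_score(palindromic))
print(palindrome_score(poly_a))
```

Note that a flat, low-information matrix is trivially symmetric and also scores high, so a score like this would only complement looking at the logos, not replace it.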

Ahh, so in other words, diagnosis of any new background model is based on empirical summarization of many previously trained background models?

Yeah, it is empirical.

These are the conventional Tn5 logos (https://static-content.springer.com/esm/art%3A10.1186%2Fs13059-019-1642-2/MediaObjects/13059_2019_1642_MOESM1_ESM.pdf, page 5). You will notice variants of these in your background model.