loosolab/TF-COMB

motif position line not displayed correctly in TFBSPairList.pairMap()


The motif position line is sometimes drawn slightly off. Minimal working example:

import tfcomb

objpath = "/mnt/agnerds/PaperInPrep/TF-COMB/analysis/08_grammar/distance_vanessa/a549_motif_pval_e_04/"
pickle_name = "a549_motif_pval_e_04.pkl"  # avoid shadowing the stdlib "pickle" module

cobj = tfcomb.CombObj().from_pickle(objpath + "cobj_" + pickle_name)
pairs = cobj.get_pair_locations("BHLHE40", "USF2")
pairs.bigwig_path = "/mnt/agnerds/PaperInPrep/TF-COMB/analysis/07_footprinting/tobias_runs/correction/A549_corrected.bw"
pairs.pairMap(logNorm_cbar=None,     # One of [None, "centerLogNorm", "SymLogNorm"]. Type of colorbar normalization.
              show_binding=True,     # Show the TF binding positions.
              flank_plot="strand",   # One of ["strand", "orientation"]. What is shown in the flanking plots.
              figsize=(7, 14),       # Figure size.
              output=None,           # Path to output file.
              flank=None)            # Number of bases extended from center. Default: 100 or last used size.

produces: [image: pairMap output]

Hi, thanks for the nice example.

I investigated the data, and I don't think pairMap is at fault.
If you look at the motif positions returned by as_table(), you can see that the same motif occurs with different lengths:

table = pairs.as_table()

# Site lengths (end - start) of USF2 when it is the first site of a pair
t = table[table["site1_name"] == "USF2"]
print(set(t["site1_end"] - t["site1_start"]))

# ... and when USF2 is the second site
t = table[table["site2_name"] == "USF2"]
print(set(t["site2_end"] - t["site2_start"]))

# Same two checks for BHLHE40
t = table[table["site1_name"] == "BHLHE40"]
print(set(t["site1_end"] - t["site1_start"]))

t = table[table["site2_name"] == "BHLHE40"]
print(set(t["site2_end"] - t["site2_start"]))
# USF2 motif length
{33, 36, 37, 39, 40, 74, 50, 19, 24, 59}
{33, 37, 39, 41, 42, 43, 19, 24}
# BHLHE40 motif length
{17, 10}
{10}
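As a minimal sketch (not part of TF-COMB), the same check can be written once for both TFs and both site columns with pandas. The toy table below is made up and only mimics the site1_*/site2_* columns of as_table(); the helper name site_lengths is hypothetical. A TF with more than one observed length is a hint that its sites were altered (for example merged) upstream of pairMap.

```python
import pandas as pd

# Hypothetical toy data mimicking the site columns of pairs.as_table();
# coordinates are made up for illustration only.
table = pd.DataFrame({
    "site1_name":  ["USF2", "BHLHE40", "USF2"],
    "site1_start": [100, 500, 900],
    "site1_end":   [119, 510, 974],
    "site2_name":  ["BHLHE40", "USF2", "BHLHE40"],
    "site2_start": [130, 520, 990],
    "site2_end":   [140, 539, 1000],
})


def site_lengths(table: pd.DataFrame) -> dict:
    """Return {TF name: set of observed site lengths}, pooling site1 and site2 columns."""
    frames = []
    for site in ("site1", "site2"):
        frames.append(pd.DataFrame({
            "name": table[f"{site}_name"],
            "length": table[f"{site}_end"] - table[f"{site}_start"],
        }))
    long_format = pd.concat(frames, ignore_index=True)
    # Convert to plain Python ints so the sets print cleanly
    return {name: set(group["length"].astype(int).tolist())
            for name, group in long_format.groupby("name")}


lengths = site_lengths(table)
# Flag TFs whose sites do not all have the same length
inconsistent = {name: ls for name, ls in lengths.items() if len(ls) > 1}
print(lengths)       # {'BHLHE40': {10}, 'USF2': {19, 74}}
print(inconsistent)  # {'USF2': {19, 74}}
```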

The problem could originate in pairs = cobj.get_pair_locations("BHLHE40", "USF2") or earlier. Maybe a bug is causing the motif locations to be off?

I think we had this issue before. Be aware that "TFBS_from_motifs" takes a parameter, "resolve_overlapping", which controls what happens to overlapping motif sites from the same factor:

From TF-COMB/tfcomb/objects.py, lines 311 to 313 (commit dfda78a):

resolve_overlapping : str, optional
Control how to treat overlapping occurrences of the same TF. Must be one of "merge", "highest_score" or "off". If "highest_score", the highest scoring overlapping site is kept.
If "merge", the sites are merged, keeping the information of the first site. If "off", overlapping TFBS are kept. Default: "merge".

To preserve the lengths of individual TFBS, set resolve_overlapping="highest_score" (the default is "merge").
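To see why "merge" can yield sites much longer than the motif itself (such as the 74 bp USF2 site above), here is a minimal pure-Python sketch of the three documented modes. It is a hypothetical reimplementation of the docstring's semantics, not TF-COMB's actual code; the Site type, the resolve function, the use of half-open intervals and the toy coordinates and scores are all assumptions.

```python
from typing import List, NamedTuple


class Site(NamedTuple):
    """Hypothetical TFBS record: TF name, half-open [start, end) interval, motif score."""
    name: str
    start: int
    end: int
    score: float


def _overlap_groups(sites: List[Site]) -> List[List[Site]]:
    """Cluster sites of the same TF whose intervals (transitively) overlap."""
    groups: List[List[Site]] = []
    for site in sorted(sites, key=lambda s: (s.name, s.start)):
        last = groups[-1] if groups else None
        if last and last[-1].name == site.name and site.start < max(s.end for s in last):
            last.append(site)
        else:
            groups.append([site])
    return groups


def resolve(sites: List[Site], mode: str) -> List[Site]:
    """Sketch of the documented "off", "merge" and "highest_score" behaviours."""
    if mode == "off":
        return sorted(sites, key=lambda s: (s.name, s.start))  # keep every site
    resolved = []
    for group in _overlap_groups(sites):
        if mode == "merge":
            # One site spanning the whole cluster, keeping the first site's information
            first = group[0]
            resolved.append(first._replace(end=max(s.end for s in group)))
        elif mode == "highest_score":
            # Keep only the best-scoring site, so its motif length is unchanged
            resolved.append(max(group, key=lambda s: s.score))
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return resolved


# Made-up example: three overlapping 10 bp USF2 sites and one BHLHE40 site
sites = [
    Site("USF2", 100, 110, 5.0),
    Site("USF2", 105, 115, 7.5),
    Site("USF2", 112, 122, 6.0),
    Site("BHLHE40", 200, 210, 4.0),
]

for mode in ("off", "merge", "highest_score"):
    print(mode, [s.end - s.start for s in resolve(sites, mode)])
# off [10, 10, 10, 10]
# merge [10, 22]
# highest_score [10, 10]
```

With "merge", the three overlapping USF2 sites collapse into one 22 bp site carrying the first site's score, while "highest_score" keeps a single 10 bp site, so every site retains the motif length.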

Now that you mention it, we had this before! Here
No wonder it seemed familiar to me 😄

That solves the problem.