motif position line not displayed correctly in TFBSPairList.pairMap()
Closed this issue · 4 comments
The motif position line seems to be a bit off sometimes. Minimal working example:
objpath="/mnt/agnerds/PaperInPrep/TF-COMB/analysis/08_grammar/distance_vanessa/a549_motif_pval_e_04/"
pickle = "a549_motif_pval_e_04.pkl"
import tfcomb
cobj = tfcomb.CombObj().from_pickle(objpath +"cobj_"+pickle)
pairs = cobj.get_pair_locations("BHLHE40","USF2")
pairs.bigwig_path = "/mnt/agnerds/PaperInPrep/TF-COMB/analysis/07_footprinting/tobias_runs/correction/A549_corrected.bw"
pairs.pairMap(logNorm_cbar=None,   # One of [None, "centerLogNorm", "SymLogNorm"]. Select type of colorbar normalization.
              show_binding=True,   # Show the TF binding positions.
              flank_plot="strand", # One of ["strand", "orientation"]. Select what is shown in the flanking plots.
              figsize=(7, 14),     # Figure size
              output=None,         # Path to output file.
              flank=None)          # Number of bases extended from center. Default = 100 or last used size
Hi, thanks for the nice example.
I investigated the data and I don't think the pairMap is at fault.
If you look at the motif positions returned by as_table(),
you can see that sites of the same motif have different lengths:
table = pairs.as_table()
t = table[table["site1_name"] == "USF2"]
print(set(t["site1_end"] - t["site1_start"]))
t = table[table["site2_name"] == "USF2"]
print(set(t["site2_end"] - t["site2_start"]))
t = table[table["site1_name"] == "BHLHE40"]
print(set(t["site1_end"] - t["site1_start"]))
t = table[table["site2_name"] == "BHLHE40"]
print(set(t["site2_end"] - t["site2_start"]))
# USF2 motif length
{33, 36, 37, 39, 40, 74, 50, 19, 24, 59}
{33, 37, 39, 41, 42, 43, 19, 24}
# BHLHE40 motif length
{17, 10}
{10}
The problem could occur in pairs = cobj.get_pair_locations("BHLHE40","USF2")
or earlier. Maybe there is some bug causing the motif locations to be off?
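The four length checks above can be folded into one small helper. This is only a sketch in plain Python: the function name site_lengths_by_tf and the toy rows are made up for illustration, and only the column names (site1_name, site1_start, site1_end and the site2 equivalents) come from the table used above.

```python
from collections import defaultdict

def site_lengths_by_tf(rows):
    """Collect the set of observed site lengths per TF name, across both pair slots."""
    lengths = defaultdict(set)
    for row in rows:
        for slot in ("site1", "site2"):
            name = row[f"{slot}_name"]
            lengths[name].add(row[f"{slot}_end"] - row[f"{slot}_start"])
    return dict(lengths)

# Toy rows invented for illustration (not taken from the A549 data).
toy_rows = [
    {"site1_name": "BHLHE40", "site1_start": 100, "site1_end": 110,
     "site2_name": "USF2", "site2_start": 120, "site2_end": 139},
    {"site1_name": "USF2", "site1_start": 200, "site1_end": 233,
     "site2_name": "BHLHE40", "site2_start": 240, "site2_end": 250},
]

# More than one length for a TF hints that its sites were altered after scanning.
print(site_lengths_by_tf(toy_rows))
```

On the real data, the rows could presumably be taken from the pandas table via table.to_dict("records").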
I think we had this issue before - you have to be aware that "TFBS_from_motifs" takes a parameter "resolve_overlapping", which controls what happens to overlapping motifs from the same factor:
Lines 311 to 313 in dfda78a
To maintain the length of individual TFBS, please set resolve_overlapping="highest_score" (the default is "merge").
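To illustrate why merging changes the site lengths, here is a minimal sketch of the two strategies on toy intervals. It is not TF-COMB's actual implementation: the function resolve_overlaps and the example sites are hypothetical, and only the mode names "merge" and "highest_score" are taken from the parameter described above.

```python
def resolve_overlaps(sites, mode="merge"):
    """Resolve overlapping sites of one TF.

    sites: list of (start, end, score) tuples with half-open coordinates.
    mode:  "merge" fuses overlapping sites into one longer site;
           "highest_score" keeps only the best-scoring site of each overlap.
    """
    if mode not in ("merge", "highest_score"):
        raise ValueError(f"unknown mode: {mode}")

    resolved = []
    for start, end, score in sorted(sites):
        if resolved and start < resolved[-1][1]:  # overlaps the last kept site
            p_start, p_end, p_score = resolved[-1]
            if mode == "merge":
                # The fused site spans both sites, so its length grows.
                resolved[-1] = (p_start, max(p_end, end), max(p_score, score))
            elif score > p_score:
                # Keep the better site unchanged, so the length is preserved.
                resolved[-1] = (start, end, score)
        else:
            resolved.append((start, end, score))
    return resolved

# Toy sites of length 19, invented for illustration.
toy_sites = [(100, 119, 8.0), (110, 129, 9.5), (125, 144, 7.0), (300, 319, 6.0)]

for mode in ("merge", "highest_score"):
    kept = resolve_overlaps(toy_sites, mode=mode)
    print(mode, sorted({end - start for start, end, _ in kept}))
```

With "merge", the three overlapping sites collapse into one site of length 44, whereas "highest_score" leaves every kept site at length 19. That matches the mixed USF2 lengths seen in the table above.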
Now that you mention it, we had this before! Here
No wonder it seemed familiar to me 😄
Solves the problem