fiberseq/fibertools-rs

ft center m6A methylation on non A sites

Closed this issue · 2 comments

Dear Mitchell,

After running the fiberseq-smk pipeline, I use ft center on my BAM to generate a metaprofile of m6A methylation rates around CTCF sites. However, I have noticed that there are many m6A positions that do not match with any 'A' in the query sequence.

  • the command line to obtain my results ft center test_ccs740.bam CTCF_test.bed -w > test_ccs740_center.txt
  • Here a minimal example with one read : ccs740_test.zip
  • Version of fibertools-rs 0.2.5
  • OS: Linux

Thanks for your opinion,
Rachel

Hi @rlegendre,

m6A calls happen on both A and T bp since CCS sequencing happens on both strands of DNA for even a single molecule, and I was able to confirm that every bp with an m6A call is an A or T. Perhaps you were thinking that T shouldn't be possible?

Here is my test code:

import pandas as pd

df = pd.read_csv("~/Downloads/ccs740_test/test_ccs740_center.txt", sep="\t")

positions = [int(x) for x in df["centered_m6a_positions"][0].split(",") if x != ""]
seq = df["query_sequence"][0]
centered_query_start = df["centered_query_start"][0]

for pos in positions:
    print_pos = pos - centered_query_start
    bp = seq[print_pos : print_pos + 1]
    if bp == "A":
        # this is fine
        continue
    elif bp == "T":
        # CSS reads both strands of DNA so this is fine.
        continue
    else:
        raise ValueError("This is not an A or a T at an m6A position!")

Indeed, I had never considered that T sites could also be methylated. With this new information, my metaprofile should be fine! Thank you for your assistance, and I apologize for any inconvenience.