FoxoTech/methylprep

Accessing probe intensities for QC metrics

Closed this issue · 0 comments

Greetings,

I am playing around with this repo in an attempt to implement GCT score calculation (Explained here: https://academic.oup.com/nar/article/45/4/e22/2290930).

I've added some code to manifest.py that allows me to add a column to the manifest dataframe that contains the extension base for a probe.

Next is to get the mean green intensity of probes expected to extend into a C and divide that by the mean green intensity of probes expected to extend into a T. I've included my function for doing so below but I am not getting close to producing a ratio near 1 (the expected result).

I was hoping to get some advice on where I might be going wrong here. I've had some trouble tracking probe intensity information throughout the analysis so it may be possible that I am using the wrong dataframe for this analysis. I am afraid my current approach uses unnormalized / dye bias corrected intensities which would prevent me from calculating a meaningful ratio.

I appreciate any advice you could share with me to help me implement GCT score calculation. From my testing, it's a useful way to quantify levels of bisulfite conversion and I'd be happy to add the feature and open a pull request.

Thanks,
Mauro Chavez


def _calculate_gct_score(data_container):
    """
    SeSSme implementation in R:
        # Get the type I probes expected to extend into C and T
        extC <- sesameDataGet(paste0(sdfPlatform(sdf), '.probeInfo'))$typeI.extC
        extT <- sesameDataGet(paste0(sdfPlatform(sdf), '.probeInfo'))$typeI.extT
        ## prbs <- rownames(oobG(sset))
        df <- InfIR(sdf)
        extC <- intersect(df$Probe_ID, extC)
        extT <- intersect(df$Probe_ID, extT)
        dC <- df[match(extC, df$Probe_ID),]
        dT <- df[match(extT, df$Probe_ID),]
        mean(c(dC$MG, dC$UG), na.rm=TRUE) / mean(c(dT$MG, dT$UG), na.rm=TRUE)
    """
    if "Next_Base" not in data_container.man.columns:
        LOGGER.warning(f"No Next_Base col found in manifest: NO GCT Score calculated")
        return None
    print(data_container.man.columns)
    # Successful conversion means that all these probes should light up RED
    ext_c_probes = data_container.man[(data_container.man["Next_Base"] == "C") & (data_container.man["Infinium_Design_Type"] == "I")]
    ext_c_probes = ext_c_probes.merge(
        data_container.green_idat.probe_means,
        left_on="AddressA_ID", right_index=True,
    ).merge(
        data_container.green_idat.probe_means,
        left_on="AddressB_ID", right_index=True,
        suffixes=('_AddressA_grn', '_AddressB_grn')
    ).merge(
        data_container.red_idat.probe_means,
        left_on="AddressA_ID", right_index=True,
    ).merge(
        data_container.red_idat.probe_means,
        left_on="AddressB_ID", right_index=True,
        suffixes=('_AddressA_red', '_AddressB_red')
    )
    print("Exp C probes")
    print(ext_c_probes[["mean_value_AddressA_grn", "mean_value_AddressB_grn", "mean_value_AddressA_red", "mean_value_AddressB_red"]].mean())

    # Use these probes to normalize against how often a green signal comes from an extension into a T
    ext_t_probes = data_container.man[(data_container.man["Next_Base"] == "T") & (data_container.man["Infinium_Design_Type"] == "I")]
    ext_t_probes = ext_t_probes.merge(
        data_container.green_idat.probe_means,
        left_on="AddressA_ID", right_index=True,
    ).merge(
        data_container.green_idat.probe_means,
        left_on="AddressB_ID", right_index=True,
        suffixes=('_AddressA_grn', '_AddressB_grn')
    ).merge(
        data_container.red_idat.probe_means,
        left_on="AddressA_ID", right_index=True,
    ).merge(
        data_container.red_idat.probe_means,
        left_on="AddressB_ID", right_index=True,
        suffixes=('_AddressA_red', '_AddressB_red')
    )
    print("Exp T probes")
    print(ext_t_probes[["mean_value_AddressA_grn", "mean_value_AddressB_grn", "mean_value_AddressA_red", "mean_value_AddressB_red"]].mean())

    return (ext_c_probes["mean_value_AddressA_grn"].mean() + ext_c_probes["mean_value_AddressB_grn"].mean()) / (ext_t_probes["mean_value_AddressA_grn"].mean() + ext_t_probes["mean_value_AddressB_grn"].mean())