Interpretation of Δexon–Δintron vs Δintron scatterplots before and after removing the bias term

Question


Opened this issue 8 months ago · 2 comments

Dear REMBRANDTS team,

I have applied your pipeline to a dataset of different cell lines within the same cell type. Differential expression analysis revealed a large number of downregulated/upregulated genes between them, and I would also expect some deviations from stability.

When running the pipeline, scatter plots of Δexon–Δintron vs Δintron are produced (see one example attached). I uncommented the plotting of the loess regression line in your code, and it seems to fit a constant line at Δexon–Δintron = 0 (red), both before and after correction. I do not see any trends like those in the paper (Fig 1.c and 1.d).

Could you please share your interpretation of these plots with me?

The data was generated using a total RNAseq protocol, has good coverage and the Δexon vs Δexon displays good correlation.

Below the relevant text printed by the pipeline:
[1] "Optimizing read count cutoff at stringency 0.99 ..."
[1] "Total correlation is 1"
[1] "Total number of genes is 15181"
[1] "Maximum correlation is 1"
[1] "Selected threshold is 5.87159523748979"
[1] "Number of remaining genes is 12773"

Many thanks,
Ivan

Answer 1 · 2024-01-24T01:21:53.000Z

Hi Ivan, It seems to me that the correlation between intronic and exonic reads is 1, which is very unusual (intronic and exonic read counts seem to be identical). Can you verify that the read counts are obtained correctly? You can also try the approach described here for obtaining exonic/intronic read counts: https://github.com/csglab/CRIES. Best, Hamed
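The check suggested above can be sketched in a few lines. This is a minimal illustration, not part of REMBRANDTS or CRIES: the function names and the toy counts are hypothetical, and it assumes you have already loaded the exonic and intronic counts for one sample as two equal-length lists in the same gene order. A correlation of exactly 1 together with a 100% identical fraction suggests the same counts were supplied twice.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def compare_counts(exonic, intronic, pseudocount=1.0):
    """Return (Pearson r of log2 counts, fraction of genes with identical counts)."""
    if len(exonic) != len(intronic):
        raise ValueError("exonic and intronic vectors must have the same length")
    # The pseudocount (an assumption of this sketch) avoids log2(0).
    log_ex = [math.log2(c + pseudocount) for c in exonic]
    log_in = [math.log2(c + pseudocount) for c in intronic]
    identical = sum(e == i for e, i in zip(exonic, intronic)) / len(exonic)
    return pearson(log_ex, log_in), identical

# Toy counts, made up for illustration only.
exonic = [120, 45, 300, 8, 77]
same_counts_twice = list(exonic)       # mimics passing the same file as both inputs
distinct_introns = [30, 50, 20, 15, 5]

r_same, frac_same = compare_counts(exonic, same_counts_twice)
r_diff, frac_diff = compare_counts(exonic, distinct_introns)
print(f"same input twice: r={r_same:.3f}, identical fraction={frac_same:.2f}")
print(f"distinct inputs:  r={r_diff:.3f}, identical fraction={frac_diff:.2f}")
```

Looking at the identical fraction alongside the correlation helps separate a true duplicate input from two count sets that merely correlate strongly.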


Answer 2 · 2024-01-25T19:05:32.000Z

Hi Hamed,

Thanks for the quick reply. I checked the files as suggested. Indeed they were basically the same, so I applied CRIES and ran REMBRANDTS again. This time the results are of course more informative. Please find the same plots/info below:

[1] "Optimizing read count cutoff at stringency 0.99 ..."
[1] "Total correlation is 0.592024058847667"
[1] "Total number of genes is 13915"
[1] "Maximum correlation is 0.704987667859836"
[1] "Selected threshold is 8.2731303169406"
[1] "Number of remaining genes is 4779"

Just one quick question: could you please give me a brief interpretation of the corrected plot? Why does the slope become positive?

I appreciate your help and am looking forward to the downstream analysis.

Best,
Ivan