Which file should I use with DESeq to analyse tRF differential expression?
Hello, sorry for another question, but I was wondering which file I should use to analyze tRF expression.
I have the tRF.Counts file, but it looks like this:
Entry name N1
Asn_Comb_9 1
Glu_Comb_27 1
Glu_Comb_35 1
The rest are all 0. The tRF.RP100K file looks the same, except that each 1 is replaced by 66.36; the rest are all 0 as well.
Also, in the tRF.Counts.csv file for my other sample, the tRF counts are all 0. So I was wondering which file to use to analyse differential expression.
There is also another folder called tRFs.samples.tmp, and in that folder the file N1_aligned_tRFs.summary shows:
amino acid Counts RP100K Unique reads
pre:Tyr tRF-1 1175 77969.476 10
pre:Met tRF-1 53 3516.921 7
pre:Ser tRF-1 10 663.570 6
pre:Cys tRF-1 23 1526.211 7
Glu 5' 0 0.000 0
Glu 3' 1 66.357 1
Glu other 3 199.071 3
Asn 5' 1 66.357 1
Asn 3' 0 0.000 0
Asn other 4 265.428 4
pre:Lys tRF-1 3 199.071 3
pre:His tRF-1 0 0.000 0
Ala 5' 0 0.000 0
Ala 3' 0 0.000 0
Ala other 3 199.071 2
pre:Val tRF-1 5 331.785 4
pre:Thr tRF-1 2 132.714 2
pre:Pro tRF-1 5 331.785 4
pre:Phe tRF-1 1 66.357 1
pre:Leu tRF-1 3 199.071 3
pre:Ile tRF-1 5 331.785 5
pre:Gly tRF-1 5 331.785 5
pre:Arg tRF-1 3 199.071 3
pre:Glu tRF-1 4 265.428 2
pre:iMet tRF-1 4 265.428 3
pre:Gln tRF-1 3 199.071 3
pre:Asp tRF-1 1 66.357 1
pre:Asn tRF-1 1 66.357 1
pre:Ala tRF-1 4 265.428 4
Tyr 5' 1 66.357 1
Tyr 3' 0 0.000 0
Tyr other 0 0.000 0
pre:SeC tRF-1 0 0.000 0
Thank you again for the help
Appreciate it
If you are going to use it for DESeq analysis, use the raw count file (tRF.Counts), not the RP100K file, which is normalized to total tRFs. That said, you seem to have very few tRF reads (1,507 by my count, from what you report), so I would be very cautious interpreting this data.
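To make the difference between the two files concrete, here is a minimal sketch of the normalization. It assumes RP100K means "reads per 100,000 tRF reads", which is consistent with the numbers in the summary above; the function names `rp100k` and `implied_total` are hypothetical and not part of miRge3.0.

```python
def rp100k(count: int, total_trf_reads: int) -> float:
    """Normalize a raw count to reads per 100,000 tRF reads.

    Assumed definition of RP100K, inferred from the numbers in this thread.
    """
    return count / total_trf_reads * 100_000


def implied_total(count: int, rp100k_value: float) -> int:
    """Back out the total number of tRF reads from one (count, RP100K) pair."""
    return round(count / rp100k_value * 100_000)


# Values taken from the N1_aligned_tRFs.summary rows quoted above:
# a single read corresponds to an RP100K of 66.357.
total = implied_total(1, 66.357)
print("implied total tRF reads:", total)

# Recompute the RP100K of the 'pre:Tyr tRF-1' row (1175 raw counts).
print("pre:Tyr tRF-1 RP100K:", round(rp100k(1175, total), 3))
```

Because the normalized values are just the raw counts rescaled by a per-sample total, they are not integers, and the differential expression step should start from the raw counts instead.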
Hello, thank you so much for the reply, really appreciate it. Just a bit confused regarding the files.
The tRF counts file shows only 3 tRF counts, but when I looked at tRF.aligned.report.tsv there were 1507 counts, so I am confused about why the two differ.
Also, in Glu_Comb, does "Comb" mean combined?
Thank you so much for the help
It looks like there is probably an issue with the counts here. Can you share the files so I can have a quick look? A sample of the FASTQ file, if possible, and the adapter sequence used would help. (If you would like to send them to my email address, that would be fine as well: arun26feb at gmail dot com)
Thank you,
Arun
Hi, I checked the files you shared; here are my observations:
- They are paired-end reads, which miRge3.0 is not designed to handle for small RNA sequencing analysis.
- In rare cases I have seen paired-end data used for small RNA analysis. However, this data contains a lot of poly-A, and the adapter sequence you shared looks like a 3' adapter. I also counted the reads containing the adapter sequence and the total reads in each file:
$ zgrep -c "CTGTCTCTTATACACATCT" N1_R1.fastq.gz
72981
$ zgrep -c "CTGTCTCTTATACACATCT" N1_R2.fastq.gz
61661
$ zgrep -c "@NS500799" N1_R1.fastq.gz
71280346
$ zgrep -c "@NS500799" N1_R2.fastq.gz
71280346
From the above we can see that only about 0.1% of the 71,280,346 reads in each file contain the adapter sequence, so the vast majority of reads do not have it.
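The same check can be sketched in Python. This is only an illustration, assuming standard 4-line FASTQ records (no wrapped sequence lines); the helper `adapter_fraction` is a hypothetical name, and the adapter string is the one quoted above.

```python
import gzip

ADAPTER = "CTGTCTCTTATACACATCT"  # adapter sequence quoted in this thread


def adapter_fraction(fastq_gz_path: str, adapter: str = ADAPTER) -> tuple:
    """Return (total_reads, reads_with_adapter, fraction) for a gzipped FASTQ.

    Assumes 4 lines per record, so the sequence is the 2nd line of each record.
    Only sequence lines are searched, so header and quality lines cannot match.
    """
    total = 0
    with_adapter = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # sequence line
                total += 1
                if adapter in line:
                    with_adapter += 1
    fraction = with_adapter / total if total else 0.0
    return total, with_adapter, fraction


# Fractions implied by the zgrep counts reported above.
print(f"R1: {72981 / 71280346:.4%} of reads contain the adapter")
print(f"R2: {61661 / 71280346:.4%} of reads contain the adapter")
```

Calling `adapter_fraction("N1_R1.fastq.gz")` on the real file should give numbers close to the `zgrep` counts, with small differences possible because `zgrep -c` counts every matching line rather than only sequence lines.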
I will get back if I find more details on this data.
Thank you,
Arun.