mnsmar/clipseqtools

CLIPseqtools-compare issues and analyzing replicates

bsaleme opened this issue · 4 comments

Hello, I was wondering if there is a way in ClipseqTools to analyze biological/technical replicates simultaneously? Or do we have to merge all fastq files prior to analysis?

More importantly, I have been having issues with clipseqtools-compare. It crashes minutes after I start the run and I couldn't get it to work at all. You will see below the errors that I get:

Starting analysis: libraries_overlap_stats
Validating arguments
Reading sizes for reference alignment sequences
Creating reads collection
Creating reference reads collection
Measuring the overlap of the primary library with the reference
Annotating 1 with reference records
slice: slice starts out of bounds in pos 0 (start is 15014; source dim 0 runs 0 to -1) at /usr/local/lib64/perl5/PDL/Core.pm line 814.
PDL::ANON(PDL=SCALAR(0x55873fccc378), 1, undef) called at /usr/local/share/perl5/CLIPSeqTools/CompareApp/libraries_overlap_stats.pm line 151
CLIPSeqTools::CompareApp::libraries_overlap_stats::ANON(GenOO::Data::DB::DBIC::Species::Schema::Result::sample=HASH(0x55873fed76b0)) called at /usr/local/share/perl5/GenOO/RegionCollection/Type/DBIC.pm line 182
GenOO::RegionCollection::Type::DBIC::foreach_record_on_rname_do(GenOO::RegionCollection::Type::DBIC=HASH(0x55873f856cc8), 1, CODE(0x55873f607130)) called at /usr/local/share/perl5/CLIPSeqTools/CompareApp/libraries_overlap_stats.pm line 155
CLIPSeqTools::CompareApp::libraries_overlap_stats::run(CLIPSeqTools::CompareApp::libraries_overlap_stats=HASH(0x55873dd19d00)) called at /usr/local/share/perl5/CLIPSeqTools/CompareApp/all.pm line 164
CLIPSeqTools::CompareApp::all::run(CLIPSeqTools::CompareApp::all=HASH(0x55873de22e58)) called at /usr/local/bin/clipseqtools-compare line 19

Any advice? Thanks!

Hi, CLIPSeqTools does not treat biological replicates in a special way. I usually prefer to run all replicates independently and then compare the final results for reproducibility. Merging the fastq is an option, but then you lose information about reproducibility. For comparing gene, transcript, intron, exon counts across libraries (differential expression) you can use clipseqtools-compare join_tables on the output of clipseqtools count_reads_on_genic_elements to prepare the files for DESeq2 which takes into account replicates.

Regarding the error: I'm not sure what is going on. Do you use custom annotation files? I see that the script complains for the size of chromosome "1" which might indicate that maybe this is not contained in the rname_sizes files that you give?

Thank you very much for your response!

As for the error, I was using the annotation files linked on the clipseqtools website (http://mourelatos.med.upenn.edu/clipseqtools/data/), the community downloads hg38. Do you think I should try the hg19 instead?

The chromosome name seems to be "1" without the prefix "chr" while the rname_sizes file has "chr1". I do not what is the reason for this discrepancy. Did you re-create the STAR index using a different genome file (e.g. from ENSEMBL). The downloaded hg38 files use the "chr" prefix in all files.

Closed due to inactivity.