mnsmar/clipseqtools

Custom STAR parameters with --config?


Hi,

Thank you for the nice tool. I am using it on our data, but it failed at the alignment step (pre-process all). In the STAR log, I saw the following: "EXITING because of fatal error: buffer size for SJ output is too small
Solution: increase input parameter --limitOutSJcollapsed
Jul 01 09:22:45 ...... FATAL ERROR, exiting"

Is there a way to pass different parameters to STAR? The help manual lists "--config Path to command config file". Is that the place to provide them? Do you have an example config file (to modify STAR or other parameters)?

Thanks!

Hi, currently it is not possible to provide custom options for STAR. I suggest running the alignment outside CLIPSeqTools and then running the preprocessing modules individually instead of through the bundled all command.

The STAR options that CLIPSeqTools uses are:

STAR \
	--genomeDir [genome] \
	--readFilesIn [fastq] \
	--runThreadN [threads] \
	--outSAMattributes All \
	--outFilterMultimapScoreRange 0 \
	--alignIntronMax 50000 \
	--outFilterMatchNmin 15 \
	--outFilterMatchNminOverLread 0.9 \
	--readFilesCommand zcat \
	--outFileNamePrefix [o_prefix].star_
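
For the error reported above, a standalone run could add the flag that STAR's own message suggests, --limitOutSJcollapsed, to the options listed here. The following is a minimal sketch, not part of CLIPSeqTools: run_star and DRY_RUN are hypothetical helpers, the paths are placeholders, and 5000000 is an arbitrary illustrative value that should be chosen above the default of your STAR version.

```shell
#!/bin/sh
# Sketch only: the STAR options listed above plus --limitOutSJcollapsed.
# run_star and DRY_RUN are hypothetical helpers, not part of CLIPSeqTools or STAR.
run_star() {
  # $1 genome index dir, $2 gzipped FASTQ, $3 output prefix,
  # $4 thread count, $5 value for --limitOutSJcollapsed
  set -- STAR \
    --genomeDir "$1" \
    --readFilesIn "$2" \
    --runThreadN "$4" \
    --outSAMattributes All \
    --outFilterMultimapScoreRange 0 \
    --alignIntronMax 50000 \
    --outFilterMatchNmin 15 \
    --outFilterMatchNminOverLread 0.9 \
    --readFilesCommand zcat \
    --outFileNamePrefix "$3.star_" \
    --limitOutSJcollapsed "$5"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"   # default: only print the command for review
  else
    "$@"        # DRY_RUN=0: actually run STAR
  fi
}

# Placeholder paths and values (assumptions); adjust to your data.
run_star /path/to/star_index reads.adtrim.fastq.gz reads.adtrim 8 5000000
```

By default the sketch only prints the command; setting DRY_RUN=0 in the environment runs it.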

After you do the alignment with STAR, you can run:

clipseqtools-preprocess cleanup_alignment --sam [SAM_FILE_FROM_STAR] --o_prefix [PATH] -v

clipseqtools-preprocess sam_to_sqlite --sam_file [CLEAN_SAM] --database [NEW_DB_FILE] --drop -v

clipseqtools-preprocess annotate_with_genic_elements --database [DB_FILE] --gtf [GTF_FILE] --drop -v

clipseqtools-preprocess annotate_with_file --database [DB_FILE] --a_file [RMSK_FILE] --column rmsk --both_strands -v

clipseqtools-preprocess annotate_with_deletions --database [DB_FILE] --drop -v

clipseqtools-preprocess annotate_with_conservation --database [DB_FILE] --cons_dir [PATH_TO_CONSERVATION_FILES] --rname_sizes [FILE_WITH_CHROMOSOME_SIZES] --drop -v

Thank you for the quick response! The alignment is done. However, the following step failed:
clipseqtools-preprocess sam_to_sqlite --sam_file [CLEAN_SAM] --database [NEW_DB_FILE] --drop -v
I assume [CLEAN_SAM] is the final output of "cleanup_alignment" (which is ..sorted.collapsed.sam) and [NEW_DB_FILE] is a file name of my choice with a ".db" extension. The command also requires the "--table" option. I tried different table names, but none of them worked. Thank you!

Was there an error message?

--table is optional as the default is 'sample'. However, you can choose any name you like.

Apparently, it is a required parameter in the version I have (1.0.0). If I don't provide "--table", I get this:
clipseqtools-preprocess sam_to_sqlite --sam_file reads.adtrim.star_Aligned.out.single.sorted.collapsed.sam --database reads.adtrim.star_Aligned.out.single.sorted.collapsed.db --drop -v
Required option 'table' missing
If I provide a name that does not contain a ".", it works. I had thought the table name should match the db name, which contains a ".".
Thanks!

I see. Apparently, clipseqtools-preprocess sam_to_sqlite is the only tool that does not fall back to the default value "sample". I'll need to fix that. Please remember to pass the --table option, with whatever name you chose at this step, to the remaining tools.

It is better to use "--table sample"; otherwise the next step will fail, because it looks for the default "sample" table.
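
Putting the points above together, the post-alignment steps could be scripted so that every command receives the same table name. This is a rough sketch under a few assumptions: run, preprocess_steps and DRY_RUN are hypothetical helpers, all file names are placeholders, and passing --table to every step follows the maintainer's comment above rather than a check against each tool's help output. The name check reflects the observation that a table name containing a "." did not work.

```shell
#!/bin/sh
# Sketch only: run the individual preprocessing steps with one shared table name.
# run, preprocess_steps and DRY_RUN are hypothetical helpers, not CLIPSeqTools features.
TABLE=sample   # the default table name the other tools look for

run() {
  echo "+ $*"                          # always show the command
  [ "${DRY_RUN:-1}" = 1 ] || "$@"      # execute it only when DRY_RUN=0
}

preprocess_steps() {
  # $1 clean SAM, $2 database file, $3 GTF, $4 repeat masker file,
  # $5 conservation directory, $6 chromosome sizes file
  case "$TABLE" in
    ''|*[!A-Za-z0-9_]*)
      echo "table name must contain only letters, digits and _" >&2
      return 1 ;;
  esac
  run clipseqtools-preprocess sam_to_sqlite --sam_file "$1" --database "$2" --table "$TABLE" --drop -v
  run clipseqtools-preprocess annotate_with_genic_elements --database "$2" --table "$TABLE" --gtf "$3" --drop -v
  run clipseqtools-preprocess annotate_with_file --database "$2" --table "$TABLE" --a_file "$4" --column rmsk --both_strands -v
  run clipseqtools-preprocess annotate_with_deletions --database "$2" --table "$TABLE" --drop -v
  run clipseqtools-preprocess annotate_with_conservation --database "$2" --table "$TABLE" --cons_dir "$5" --rname_sizes "$6" --drop -v
}

# Placeholder file names (assumptions); adjust to your data.
preprocess_steps clean.sam reads.db genes.gtf rmsk.bed /path/to/conservation chrom.sizes
```

As written, the script only prints the five commands; run it with DRY_RUN=0 to execute them.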

Among the analysis modules, I found that "nmer_enrichment_over_shuffled" is very slow: for typical samples it takes 5-7 days. Is this normal? Is there a way to speed it up, or can this step be skipped without affecting the comparative analysis, i.e., clipseqtools-compare?

Yes, unfortunately it is slow. It can be skipped without affecting the comparative analysis.

Thank you for the confirmation.