ngs2dia

Daily NGS tools for bioinformaticians

uploadGEO.sh

User-friendly GEO upload. See the help output below:

> ./uploadGEO.sh -h

This script uploads your folder to GEO FTP server. If the connection times out, it tries to reconnect.
Be careful with your input parameters (especially with -r) to avoid overwhelming the GEO FTP server.

Usage: uploadGEO.sh [FLAGS]
	 [-l <local_directory>]                  Local directory to be uploaded [default=.]
	 [-r <remote_directory>]                 Remote directory on GEO FTP server
	 [-u <username>]                         GEO username
	 [-p <password>]                         GEO password
	 [-h <help>]                             Print help and exit

For GEO username and password, please refer to https://www.ncbi.nlm.nih.gov/geo/info/submissionftp.html

Example usage:
./uploadGEO.sh -l /my/folder/to/upload/geo -r ./my/geo/directory -u geoftp -p PaSsWoRd
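
The reconnect-on-timeout behaviour described in the help text can be pictured as a retry loop. The following is only a minimal sketch of that idea, not the code used by uploadGEO.sh: the function name `retry`, its arguments (maximum attempts, delay in seconds) and the wrapped command are all hypothetical, and the FTP client the script actually uses is not stated here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of uploadGEO.sh): run a command until it
# succeeds or the maximum number of attempts is reached.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "Giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        echo "Attempt ${attempt} failed; retrying in ${delay}s..." >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

A bounded number of attempts and a pause between them is one way to follow the advice above about not overwhelming the GEO FTP server.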

md5parallel.sh

Runs md5sum in parallel. -n can be set to up to 4 times the number of available cores (e.g. -n 32 if your machine has 8 cores).

./md5parallel.sh -h

Usage: md5parallel.sh [FLAGS]
	 [-w <working_directory>]                Working directory where the MD5SUMS will be checked [default=.]
	 [-n <n_jobs>]                           Number of parallel jobs [default=1]
	 [-o <output_file>]                      Output filename [default=./md5.txt]
	 [-g <gzip_file>]                        Flag to gzip the output file
	 [-r <remove_working_directory_prefix>]  Flag for removing the working directory prefix in the MD5SUM output file
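
For readers curious how parallel checksumming can work, here is a minimal sketch built on `find` and `xargs -P`. It is an illustration under assumptions, not the implementation of md5parallel.sh: the function name `md5_parallel_sketch` is hypothetical, and it assumes `md5sum` and an `xargs` that supports `-0` and `-P` (GNU or BSD).

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not md5parallel.sh itself): checksum every file under
# a directory using several md5sum processes at once.
# Usage: md5_parallel_sketch <working_directory> <n_jobs> <output_file>
md5_parallel_sketch() {
    local workdir=$1 n_jobs=$2 output=$3 tmp
    # Write to a temporary file first so that an output file located inside
    # the working directory is not checksummed while it is being written.
    tmp=$(mktemp)
    # -print0 / -0 keep file names containing spaces intact.
    # -P runs up to n_jobs processes in parallel; -n 1 gives each one file.
    find "$workdir" -type f -print0 \
        | xargs -0 -n 1 -P "$n_jobs" md5sum \
        | sort -k 2 > "$tmp"
    mv "$tmp" "$output"
}
```

Because parallel jobs finish in arbitrary order, the sketch sorts the result by file name to keep the output reproducible. The resulting file can be verified with `md5sum -c`.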

fastq_merge.sh

Merges lane-split FASTQ files. See below:

Usage: fastq_merge.sh [FLAGS]

[-i <input_dir>]
	 Input directory where FASTQ files to be merged are located
	   default: .
[-a <sequencing_type>]
	 Sequencing type: Paired-end (PE) or Single-end (SE)
	   default: PE
	   options: PE,SE
[-b <file_basename_pattern>]
	 Grep pattern for FASTQ files
	   default: fastq.gz
[-f <file_R1_pattern>]
	 Grep pattern for FASTQ files with R1_reads
	   default: R1.fastq.gz
[-r <file_R2_pattern>]
	 Grep pattern for FASTQ files with R2_reads. Does not influence anything if sequencing type (-a) is SE
	   default: R2.fastq.gz
[-s <file_basename_seperator>]
	 Delimiter used to split the FASTQ file basename
	   default: _ (underscore)
[-c <file_cut_uniq>]
	 Cut basename of FASTQ file using -s flag. Will be passed to 'cut -f'
	   default: 1-3
[-t <threads>]
	 Number of threads to be used for parallel processing
	   default: 1
[-o <output_dir>]
	 Output directory for the merged FASTQ files
	   default: .
[-h help]
	 Display help menu


Example:
- Files to merge:
 001_100_S1_L001_R1.fq.gz
 001_100_S1_L002_R1.fq.gz
 001_100_S1_L003_R1.fq.gz
 001_100_S1_L004_R1.fq.gz
- Output file:
 001_100_S1.fq.gz

- Code to run:
fastq_merge.sh -b fq.gz -s _ -c 1-3 -f R1.fq.gz
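
The example above relies on the fact that gzip files can be concatenated directly: the result is still a valid gzip file. Below is a minimal sketch of the grouping-and-concatenating idea for the single-read case only. It is not the code of fastq_merge.sh; the function name `merge_lanes_sketch` and its positional arguments are hypothetical, and it assumes the output directory differs from the input directory.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not fastq_merge.sh itself): merge lane-split gzipped
# FASTQ files that share the same sample prefix.
# Usage: merge_lanes_sketch <input_dir> <output_dir> [suffix] [separator] [cut_fields]
merge_lanes_sketch() {
    local in_dir=$1 out_dir=$2
    local suffix=${3:-fq.gz} sep=${4:-_} fields=${5:-1-3}
    local prefix
    mkdir -p "$out_dir"
    # List basenames, keep the sample part (e.g. 001_100_S1), de-duplicate.
    find "$in_dir" -maxdepth 1 -type f -name "*${suffix}" -exec basename {} \; \
        | cut -d "$sep" -f "$fields" | sort -u \
        | while IFS= read -r prefix; do
              # The glob expands in sorted order, so lanes are appended
              # as L001, L002, ... into one output file per sample.
              cat "$in_dir/${prefix}${sep}"*"${suffix}" > "$out_dir/${prefix}.${suffix}"
          done
}
```

With the four example files in an input directory, `merge_lanes_sketch input_dir merged_dir` would write `merged_dir/001_100_S1.fq.gz`. Paired-end data would need the same step run separately for the R1 and R2 patterns.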

fastq_readlength.sh

Extracts the read length of a gzipped FASTQ file.

Usage: 
./fastq_readlength.sh myfile.fastq.gz
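
As an illustration of how read lengths can be obtained from a FASTQ file, the sketch below counts how many reads have each length. It is an assumption-laden example, not the code of fastq_readlength.sh: the function name `read_lengths_sketch` and the tab-separated "length, count" output are hypothetical, and it assumes standard 4-line FASTQ records.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not fastq_readlength.sh itself): tabulate read lengths
# in a gzipped FASTQ file as "<length><TAB><count>", sorted by length.
# Usage: read_lengths_sketch <file.fastq.gz>
read_lengths_sketch() {
    # In a 4-line FASTQ record the sequence is line 2, i.e. NR % 4 == 2.
    gzip -dc "$1" \
        | awk 'NR % 4 == 2 { n[length($0)]++ }
               END { for (len in n) printf "%d\t%d\n", len, n[len] }' \
        | sort -n
}
```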

cov2bedgraph.sh

Converts DNA methylation coverage files (.cov) generated by Bismark to BEDGRAPH files.

Usage: cov2bedgraph.sh [FLAGS]

Examples:
cov2bedgraph.sh -i myfile.cov -o myfile.bedgraph
cov2bedgraph.sh -i myfile.cov -g -o myfile.bedgraph.gz

Flags:
	 [-i <input_file>]                       Input cov file. Can be gzipped (.cov.gz). [1-based coord]
	 [-o <output_file>]                      Output bedGraph file
	 [-z <zero_based>]                       Flag if the input cov file uses 0-based genomic coordinates. Cov files are normally 1-based, but can be 0-based in special cases.
	 [-g <gzip_file>]                        Flag to gzip the output file
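
The coordinate handling behind the -z flag can be sketched as follows. A Bismark coverage line holds chromosome, start, end, methylation percentage, methylated count and unmethylated count, with 1-based positions by default, whereas bedGraph uses a 0-based start and an exclusive end. This is a minimal sketch under those assumptions, not the code of cov2bedgraph.sh; the function name `cov_to_bedgraph_sketch` and its second argument are hypothetical, and a 0-based input is assumed to already have a 0-based start with an exclusive end.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not cov2bedgraph.sh itself): convert a Bismark .cov
# file (plain or gzipped) to bedGraph lines on standard output.
# Usage: cov_to_bedgraph_sketch <input.cov[.gz]> [zero_based: 0|1]
cov_to_bedgraph_sketch() {
    local input=$1 zero_based=${2:-0}
    # gzip -dcf decompresses gzipped input and passes plain text through.
    gzip -dcf "$input" \
        | awk -v zb="$zero_based" 'BEGIN { FS = OFS = "\t" }
              {
                  # Columns: chrom, start, end, methylation %, meth, unmeth.
                  # bedGraph starts are 0-based, so a 1-based start is
                  # shifted down by one; a 0-based start is kept as is.
                  start = (zb == 1) ? $2 : $2 - 1
                  print $1, start, $3, $4
              }'
}
```

For example, the 1-based line `chr1 100 100 75 3 1` (tab-separated) becomes `chr1 99 100 75`.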

illumina_indexExtractor.sh

Extracts the index sequence counts from a gzipped FASTQ file

Usage: ./illumina_indexExtractor.sh myfile.fastq.gz
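
To show what counting index sequences can look like, here is a minimal sketch. It assumes Illumina-style read headers in which the index is the last colon-separated field (for example `... 1:N:0:ACGT`); the function name `index_counts_sketch` is hypothetical and this is not necessarily how illumina_indexExtractor.sh does it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not illumina_indexExtractor.sh itself): count how often
# each index sequence occurs in the headers of a gzipped FASTQ file.
# Usage: index_counts_sketch <file.fastq.gz>
index_counts_sketch() {
    # Header lines are line 1 of every 4-line record (NR % 4 == 1);
    # the index is assumed to be the last colon-separated field.
    gzip -dc "$1" \
        | awk -F ':' 'NR % 4 == 1 { print $NF }' \
        | sort | uniq -c | sort -k1,1nr
}
```

The output lists one index per line, most frequent first, each preceded by its count.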

find_basespace_temp.sh

Finds 'fastq.gz.temp' files.

Usage: ./find_basespace_temp.sh

removeFilenameSpace.py

Removes spaces from file names.

Usage:

./removeFilenameSpace.py 'my file.txt'
# Output: myfile.txt
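
The same renaming can be sketched in bash with parameter expansion. This is an illustrative equivalent, not the Python script itself; the function name `remove_filename_spaces_sketch` is hypothetical, and only the file name (not its parent directories) is changed.

```shell
#!/usr/bin/env bash
# Hypothetical bash sketch (not removeFilenameSpace.py itself): rename each
# given file so that its name contains no spaces.
# Usage: remove_filename_spaces_sketch <file> [<file> ...]
remove_filename_spaces_sketch() {
    local path dir base new
    for path in "$@"; do
        dir=$(dirname -- "$path")
        base=$(basename -- "$path")
        new=${base// /}    # delete every space in the file name
        # Skip names without spaces; -n refuses to overwrite an existing file.
        [ "$base" = "$new" ] || mv -n -- "$path" "$dir/$new"
    done
}
```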

gtf2ensembl.sh

Extracts ENSEMBL IDs from a GTF file. Output columns: gene, transcript, exon, gene name

Usage:
	 gtf2ensembl.sh [FLAGS]

Description:
	 Create ENSEMBL IDs from GTF file. Output columns: gene, transcript, exon, gene name

Examples:
	 gtf2ensembl.sh -i myfile.gtf -o myfile.txt

Flags:
	 [-i <input_file>]                       Input gtf file. Can be gzipped (.gtf.gz)
	 [-o <output_file>]                      Output file, tab-delimited
	 [-g <gzip_file>]                        Flag to gzip the output file
	 [-h <help>]                             Print help

Dependencies:
	 bedops --> convert2bed
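
To make the output columns concrete, the sketch below pulls the four IDs from the attribute column of a GTF file using awk only. It is a swapped-in technique for illustration: gtf2ensembl.sh depends on bedops (convert2bed), which this sketch does not use. The function name `gtf_ids_sketch` is hypothetical, and the choices to read only `exon` features and to print `NA` for a missing attribute are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical awk-only sketch (not gtf2ensembl.sh itself, which uses bedops):
# print gene ID, transcript ID, exon ID and gene name for every exon line.
# Usage: gtf_ids_sketch <input.gtf[.gz]>
gtf_ids_sketch() {
    # gzip -dcf decompresses gzipped input and passes plain text through.
    gzip -dcf "$1" \
        | awk 'BEGIN { FS = OFS = "\t" }
            function attr(key,    m) {
                # Find key "value" in column 9 and return the bare value.
                if (match($9, "(^|[; ])" key " \"[^\"]*\"")) {
                    m = substr($9, RSTART, RLENGTH)
                    gsub(/^[^"]*"|"$/, "", m)
                    return m
                }
                return "NA"
            }
            $3 == "exon" {
                print attr("gene_id"), attr("transcript_id"), attr("exon_id"), attr("gene_name")
            }'
}
```

The output is tab-delimited, one line per exon record, in the column order given above: gene, transcript, exon, gene name.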
