Fasten

Perform random operations on fastq files, using unix streaming. Secure your analysis with Fasten!

Synopsis

read metrics

$ cat testdata/R1.fastq testdata/R2.fastq | \
    fasten_shuffle | fasten_metrics | column -t
totalLength  numReads  avgReadLength  avgQual
800          8         100            19.53875
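
As a rough illustration of how the first three columns relate, the sketch below recomputes them with plain awk on two made-up reads. It is not how fasten_metrics is implemented; it assumes strict four-line fastq records, and it omits avgQual. The read names and sequences are hypothetical.

```shell
# Two hypothetical reads of length 4 and 6, in four-line fastq format.
metrics=$(printf '@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n' |
  awk 'NR % 4 == 2 { total += length($0); n++ }   # line 2 of each record is the sequence
       END {
         avg = (n > 0) ? total / n : 0            # avgReadLength = totalLength / numReads
         printf "totalLength\tnumReads\tavgReadLength\n%d\t%d\t%g\n", total, n, avg
       }')
printf '%s\n' "$metrics"
```

The same relationship holds in the output above: 800 total bases over 8 reads gives an average read length of 100.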

read cleaning

$ cat testdata/R1.fastq testdata/R2.fastq | \
    fasten_shuffle | \
    fasten_clean --paired-end --min-length 2 | \
    gzip -c > cleaned.shuffled.fastq.gz

$ zcat cleaned.shuffled.fastq.gz | fasten_metrics | column -t
totalLength  numReads  avgReadLength  avgQual
800          8         100            19.53875
# No reads were filtered out by cleaning, because --min-length was only 2

etc

Installation

Fasten is written in Rust. More information about Rust, including how to install it and its build tool cargo, can be found at rust-lang.org.

After downloading, use the Rust executable cargo like so:

cd fasten
cargo build --release
export PATH=$PATH:$(pwd)/target/release

All executables will be in the directory fasten/target/release.

General usage

All scripts accept the parameters listed below, read uncompressed fastq from stdin, and print uncompressed fastq to stdout. All paired-end fastq input must be in interleaved format, and paired-end output is also interleaved, except when deshuffling with fasten_shuffle.

--help
--numcpus Not all scripts will take advantage of numcpus. (not currently implemented)
--paired-end Input reads are interleaved paired end
--verbose Print more status messages
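
To make the interleaved layout concrete, here is a minimal sketch that builds it with standard Unix tools only. It is an illustration of the format, not a description of how fasten_shuffle works internally. It assumes strict four-line records in both mate files, with the mates in the same order. The file names and reads are hypothetical.

```shell
# Create two tiny mate files (made-up reads, for illustration only).
tmpdir=$(mktemp -d)
printf '@read1/1\nACGT\n+\nIIII\n@read2/1\nGGCC\n+\nIIII\n' > "$tmpdir/R1.fastq"
printf '@read1/2\nTTAA\n+\nIIII\n@read2/2\nCCGG\n+\nIIII\n' > "$tmpdir/R2.fastq"

# Collapse each four-line record onto one tab-separated line.
paste - - - - < "$tmpdir/R1.fastq" > "$tmpdir/R1.tsv"
paste - - - - < "$tmpdir/R2.fastq" > "$tmpdir/R2.tsv"

# Put each forward read next to its mate, then expand tabs back into lines.
paste "$tmpdir/R1.tsv" "$tmpdir/R2.tsv" | tr '\t' '\n' > "$tmpdir/interleaved.fastq"

# Show the header line of every record: forward and reverse reads alternate.
awk 'NR % 4 == 1' "$tmpdir/interleaved.fastq"
```

In the result, each forward read is immediately followed by its reverse mate, which is the record order that --paired-end expects on stdin.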

Documentation

Please see the inline documentation at https://lskatz.github.io/fasten/fasten

This documentation was built with cargo doc --no-deps

Contributing

Instructions for how to contribute can be found in CONTRIBUTING.md.

Fasten script descriptions

script	Description
`fasten_clean`	Trims and cleans a fastq file.
`fasten_convert`	Converts between different sequence formats like fastq, sam, fasta.
`fasten_straighten`	Convert any fastq file to a standard four-line-per-entry format.
`fasten_metrics`	Prints basic read metrics.
`fasten_pe`	Determines paired-endedness based on read IDs.
`fasten_randomize`	Randomizes reads from input.
`fasten_combine`	Combines identical reads and updates quality scores.
`fasten_kmer`	Kmer counting.
`fasten_normalize`	Normalize read depth by using kmer counting.
`fasten_sample`	Downsamples reads.
`fasten_shuffle`	Shuffles or deshuffles paired end reads.
`fasten_validate`	Validates your reads (deprecated in favor of `fasten_inspect` and `fasten_repair`)
`fasten_inspect`	Adds information to read IDs such as seqlength
`fasten_repair`	Repairs corrupted reads
`fasten_quality_filter`	Transforms nucleotides to "N" if the quality is low
`fasten_trim`	Blunt-end trims reads
`fasten_replace`	Find and replace using regex
`fasten_mutate`	Introduces random mutations
`fasten_regex`	Filter for reads using regex
`fasten_progress`	Add progress to any place in the pipeline
`fasten_sort`	Sort fastq entries

Etymology

Many of these scripts were inspired by the fastx toolkit. I wanted to call the project fasty, but that was already the name of a bioinformatics program, so I cycled through other letters of the alphabet and came across "N." It is therefore possible to pronounce this project like "Fast-N" or in a way that indicates that you are securing your analysis by "fasten"ing it (with a silent T).

Acknowledgements

Thank you Henk Den Bakker for many helpful discussions around Rust, helping me name this software, and many other things.

bovee/fasten