This repo contains code and data for bioinformatic removal of host sequence from DNA or RNA sequence data derived from infected cells.
Host subtraction is a first step in characterizing unrecognized elements in DNA or RNA sequence data. It is useful for discovery of symbionts, pathogens, and host mutations. Host Subtraction DB is a system for processing DNA or RNA sequence reads to remove host sequence. It includes tools for constructing and using a reference database specific to a host genome.
- fastq-filter-by-name.pl : Given a list of read names and a FASTQ file of reads, write another FASTQ that either includes or excludes the named reads. Run the command with no options to receive command-line usage instructions.
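
The include/exclude behavior described for fastq-filter-by-name.pl can be illustrated with a minimal Python sketch. This is not the Perl script's implementation or its command-line interface. The function names (`read_fastq`, `read_name`, `filter_fastq`) are hypothetical. The sketch assumes strict 4-line FASTQ records and that a read name is the first whitespace-delimited token after the leading `@`; the actual script may match names differently.

```python
from typing import Iterable, Iterator, List, Set, Tuple

# One FASTQ record: header, sequence, separator ("+") line, quality string.
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from strict 4-line FASTQ (assumes no wrapped sequence lines)."""
    buf: List[str] = []
    for line in lines:
        buf.append(line.rstrip("\r\n"))
        if len(buf) == 4:
            yield (buf[0], buf[1], buf[2], buf[3])
            buf = []
    # Tolerate trailing blank lines, but reject a partial record.
    if any(buf):
        raise ValueError("truncated FASTQ record at end of input")


def read_name(header: str) -> str:
    """Assumed convention: the name is the first token after the leading '@'."""
    fields = header[1:].split()
    return fields[0] if fields else ""


def filter_fastq(
    records: Iterable[Record], names: Set[str], exclude: bool = False
) -> Iterator[Record]:
    """Keep records whose name is in `names`, or the complement if `exclude` is set."""
    for rec in records:
        if (read_name(rec[0]) in names) != exclude:
            yield rec


if __name__ == "__main__":
    # Tiny in-memory example; real input would be an open FASTQ file handle.
    demo = ["@r1 desc\n", "ACGT\n", "+\n", "IIII\n",
            "@r2\n", "GGCC\n", "+\n", "JJJJ\n"]
    for rec in filter_fastq(read_fastq(demo), {"r1"}, exclude=True):
        print("\n".join(rec))
```

For host subtraction, the name list would typically hold the reads identified as host, with `exclude=True` used to write out the remaining non-host reads.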
Jason Miller, J. Craig Venter Institute
GNU General Public License, version 3.