igvteam/extractome

Creates an "extracted" genome assembly from a fasta + list of regions

PythonMIT

extractome

Creates an "extracted" genome assembly from a fasta + list of regions in "bed" format

Usage

extractome regions.bed <options>

or

python extractome/extract.py regions.bed <options>

Required inputs

Region file in "bed" format, described here. Only the first 3 columns are required. Note that the bed format uses the “0-start, half-open” coordinate convention, so for example the first base in a sequence is represented by start=0, end=1.
Either a fasta file or IGV genome identifer (see Options below)

Options

--fasta reference fasta file, required if --genome is not specified
--genome igv.js genome id (e.g. hg38), required if --fasta is not specified
--name base name for output files, default=Xome
--output output directory name, default=output

Output

The script creates 3 output files

base_name.fa
base_name.regions.bed - the input regions file lifted over to extracted fasta
base_name.chain - a UCSC "chain" file. Can be used to liftover files to the extracted fasta with tools such as CrossMap

Example

extractome  test/data/cpgIsland_mm10.bed --genome mm10 --name CPG_mm10 --output output