A set of tools to analyze genomic data with a focus on Next Generation Sequencing. This readme document is mostly for developers/contributors and those attempting to build the project from source. Detailed user documentation is available on the project website including tool usage and documentation of metrics produced. Detailed developer documentation can be found here.
- Goals
- Overview
- List of tools
- Building
- Command line
- Include fgbio in your project
- Contributing
- Authors
- License
There are many toolkits available for analyzing genomic data; fgbio does not aim to be all things to all people but is specifically focused on providing:
- Robust, well-tested tools.
- An easy-to-use command line.
- Clear and thorough documentation for each tool.
- Open source development for the benefit of the community and our clients.
fgbio is a set of command-line tools for bioinformatic and genomic data analysis.
The tools are used by our customers and others, both for ad-hoc data analysis and within production pipelines.
They typically operate on read-level data (e.g. FASTQ, SAM, or BAM) or variant-level data (e.g. VCF or BCF).
They range from simple tools that filter reads in a BAM file to tools that compute consensus reads from reads with the same molecular index/tag.
See the list of tools for more detail.
For a full list of available tools please see the tools section of the project website.
Below we highlight a few tools that you may find useful.
- Tools for working with Unique Molecular Indexes (UMIs, aka Molecular IDs or MIDs):
  - Annotating/extracting UMIs from read-level data: AnnotateBamWithUmis and ExtractUmisFromBam.
  - Tools to manipulate read-level data containing UMIs: CorrectUmis, GroupReadsByUmi, CallMolecularConsensusReads and CallDuplexConsensusReads.
- Tools to manipulate read-level data:
  - FASTQ manipulation: DemuxFastqs and FastqToBam.
  - Filter read-level data: FilterBam.
  - Clipping of reads: ClipBam.
  - Randomize the order of read-level data: RandomizeBam.
  - Update read-level metadata: SetMateInformation and UpdateReadGroups.
- Quality assessment tools:
  - Detailed substitution error rate evaluation: ErrorRateByReadPosition.
  - Sample pooling QC: EstimatePoolingFractions.
  - Splice-aware insert size QC for RNA-seq libraries: EstimateRnaSeqInsertSize.
  - Assessment of duplex sequencing experiments: CollectDuplexSeqMetrics.
- Miscellaneous tools:
  - Pick molecular indices (e.g. sample barcodes or molecular indexes): PickIlluminaIndices and PickLongIndices.
  - Convert the output of HAPCUT (a tool for phasing variants): HapCutToVcf.
  - Find technical or synthetic sequences in read-level data: FindTechnicalReads.
  - Assess phased variant calls: AssessPhasing.
Git LFS is used to store large files used in testing fgbio. In order to compile and run tests it is necessary to install git lfs. To retrieve the large files either:
- Clone the repository after installing git lfs, or
- In a previously cloned repository, run git lfs pull once.
After initial setup, regular git commands (e.g. pull, fetch, push) will also operate on large files and no special handling is needed.
To clone the repository: git clone https://github.com/fulcrumgenomics/fgbio.git
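The setup steps above can be sketched as a pair of shell helpers. This is a minimal sketch, assuming git and the git-lfs extension are already installed; the function names clone_fgbio and pull_large_files are hypothetical and not part of fgbio.

```shell
# Repository URL taken from the clone instructions above.
FGBIO_REPO="https://github.com/fulcrumgenomics/fgbio.git"

# Fresh checkout: once git lfs is set up, cloning also downloads the large test files.
clone_fgbio() {
  # One-time setup of the git lfs hooks for the current user.
  git lfs install &&
  git clone "$FGBIO_REPO"
}

# Existing checkout made before git lfs was installed: fetch the large files once.
# $1 is the path to the checkout (hypothetical default: ./fgbio).
pull_large_files() {
  git lfs install &&
  (cd "${1:-fgbio}" && git lfs pull)
}
```

Call clone_fgbio for a new checkout, or pull_large_files with the path to an existing one.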
fgbio is built using sbt.
Use sbt assembly to build an executable jar in target/scala-2.12/ (sbt names this directory after the Scala version used by the build).
Tests may be run with sbt test. R and ggplot2 are test dependencies.
Java SE 8 is required.
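A build can be scripted along the following lines. This is a sketch that assumes it is run from the repository root; require_tool and build_fgbio are hypothetical helper names, and checking for Rscript is an assumption about how the R test dependency is invoked.

```shell
# Hypothetical helper: succeed only if the named executable is on the PATH.
require_tool() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "missing required tool: $1" >&2
    return 1
  }
}

# Check the prerequisites, build the executable jar, then run the test suite.
build_fgbio() {
  require_tool java &&
  require_tool sbt &&
  # Assumption: R is reachable through the Rscript executable.
  require_tool Rscript &&
  sbt assembly &&
  sbt test
}
```

Checking for the tools first means a missing prerequisite is reported by name before sbt starts.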
Run java -jar target/scala-2.12/fgbio-<version>.jar to see the commands supported.
Use java -jar target/scala-2.12/fgbio-<version>.jar <command> to see the help message for a particular command.
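To avoid retyping the jar path, the two invocations above can be wrapped in small shell functions. This is a sketch only: fgbio_jar, fgbio, and the FGBIO_VERSION variable are hypothetical names, and the default Scala version of 2.12 simply mirrors the path shown above.

```shell
# Print the path of the assembled jar.
# $1 = fgbio version, $2 = Scala binary version (defaults to 2.12, as in the path above).
fgbio_jar() {
  printf 'target/scala-%s/fgbio-%s.jar\n' "${2:-2.12}" "$1"
}

# Run fgbio from the repository root, passing all arguments through.
# FGBIO_VERSION (hypothetical variable) must hold the version that was built.
fgbio() {
  java -jar "$(fgbio_jar "${FGBIO_VERSION:?set FGBIO_VERSION first}")" "$@"
}
```

With FGBIO_VERSION set, running fgbio with no arguments lists the supported commands, and fgbio GroupReadsByUmi shows the help message for that command.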
You can include fgbio in your project using:
"com.fulcrumgenomics" %% "fgbio" % "0.8.0"
for the latest released version or (buyer beware):
"com.fulcrumgenomics" %% "fgbio" % "0.9.0-<commit-hash>-SNAPSHOT"
for the latest development snapshot.
Contributions are welcome and encouraged. We will do our best to provide an initial response to any pull request or issue within one week. For urgent matters, please contact us directly.
- Tim Fennell (maintainer)
- Nils Homer (maintainer)
fgbio is open source software released under the MIT License.