
Kraken extension to identify viral contigs

Primary language: Python · License: MIT




VirKraken

An extension to identify viral elements in Kraken 2 outputs

Table of Contents
  1. About VirKraken
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgements

About VirKraken

This project was created to identify viral contigs with Kraken 2 for the publication Cite. VirKraken extracts the headers of sequences that Kraken 2 classified as viral and, if a sequencing file is provided, can either retain only those sequences or remove them.
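As a rough illustration of the header-extraction step, the sketch below reads Kraken 2's standard tab-separated output (classification flag `C`/`U`, sequence ID, TaxID, length, k-mer mapping) and keeps classified sequences whose TaxID is viral. How VirKraken itself decides that a TaxID is viral is not described here, so `viral_taxids` is a hypothetical input (for example, every TaxID under the NCBI Viruses node, 10239), and the function names are illustrative only.

```python
import re
from typing import Dict, Iterable, Set

# Matches the "name (taxid 12345)" form Kraken 2 writes when run with --use-names.
_TAXID_WITH_NAME = re.compile(r"\(taxid (\d+)\)\s*$")


def parse_taxid(field: str) -> int:
    """Return the NCBI TaxID from Kraken 2's third output column."""
    match = _TAXID_WITH_NAME.search(field)
    return int(match.group(1)) if match else int(field)


def viral_headers(lines: Iterable[str], viral_taxids: Set[int]) -> Dict[str, int]:
    """Map sequence ID -> TaxID for classified lines whose TaxID is viral.

    `viral_taxids` is a hypothetical, caller-supplied set; it is not
    VirKraken's own taxonomy data.
    """
    hits: Dict[str, int] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != "C":
            continue  # skip blank, malformed, or unclassified ("U") lines
        taxid = parse_taxid(fields[2])
        if taxid in viral_taxids:
            hits[fields[1]] = taxid
    return hits


example = [
    "C\tk141_1\t10239\t300\t10239:266\n",
    "U\tk141_2\t0\t250\t0:216\n",
    "C\tk141_3\tEscherichia coli (taxid 562)\t400\t562:366\n",
]
print(viral_headers(example, viral_taxids={10239}))  # {'k141_1': 10239}
```

In real use, `lines` would be an open Kraken 2 output file, which iterates line by line in the same way.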

Getting Started

To get a local copy up and running, follow these steps.

Prerequisites

VirKraken requires Python 3 and the following libraries (if you install through pip, these libraries are installed automatically):

  • pandas
  • scikit-learn
  • biopython
  • importlib-resources

Installation

VirKraken is available on PyPI and can also be forked from this repository. The easiest way to install VirKraken is with pip:

pip install virkraken

Usage

VirKraken works as a command-line script. Once it is installed via pip, the virkraken command is available. To display the help screen, type:

virkraken -h

The parameters of VirKraken are:

  • -f: Kraken output file [required]
  • -c: Sequencing file to parse [optional]
  • -r: Remove viral elements from the sequencing file instead of retaining them [optional flag]
  • -o: Base name for the output files [optional]
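For readers wrapping similar logic in their own script, the documented flags could be declared with `argparse` roughly as follows. This is a hypothetical sketch, not VirKraken's source: the help strings paraphrase the list above, and the check that `-r` is combined with `-c` is an assumption based on removal needing a sequencing file.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented flags; not VirKraken's code.
    parser = argparse.ArgumentParser(
        prog="virkraken",
        description="Identify viral elements in Kraken 2 output.",
    )
    parser.add_argument("-f", required=True, help="Kraken output file")
    parser.add_argument("-c", help="Sequencing file to parse (optional)")
    parser.add_argument("-r", action="store_true",
                        help="Remove viral elements instead of retaining them")
    parser.add_argument("-o", help="Base name for output files (optional)")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: -r only makes sense when a sequencing file is supplied.
    if args.r and args.c is None:
        parser.error("-r requires -c")  # prints usage and exits with status 2
    return args
```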

Running VirKraken

Using VirKraken without a sequencing file, with a custom output name

virkraken -f Kraken_Output.txt -o Viral_Sequences

The command above returns Viral_Sequences.csv, which contains columns of sequence headers and their NCBI TaxIDs. All returned sequence headers are viral.
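A minimal sketch of writing such a CSV with Python's `csv` module, assuming a `{header: taxid}` mapping as input. The function name and the `header`/`taxid` column names are illustrative and may not match VirKraken's actual output.

```python
import csv
from pathlib import Path
from typing import Dict


def write_viral_csv(hits: Dict[str, int], prefix: str) -> Path:
    """Write one row per viral sequence (header, NCBI TaxID) to <prefix>.csv.

    Column names are hypothetical; VirKraken's real CSV header may differ.
    """
    out_path = Path(f"{prefix}.csv")
    with out_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["header", "taxid"])
        for header, taxid in hits.items():
            writer.writerow([header, taxid])
    return out_path
```

Here `prefix` plays the role of the `-o` value, so `write_viral_csv(hits, "Viral_Sequences")` would produce `Viral_Sequences.csv`.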

Using VirKraken to filter a sequence file

virkraken -f Kraken_Output.txt -c final.contigs.fa -o Viral_Sequences

The command above returns Viral_Sequences.csv and Viral_Sequences.fasta. All returned sequences are viral: VirKraken filters the input FASTA file, keeping only the sequences whose headers match the predicted viral headers.
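The filtering step can be sketched with the standard library alone (VirKraken lists Biopython as a dependency, whose `Bio.SeqIO` module covers the same job). The sketch assumes matching on the sequence ID, i.e. the header up to the first whitespace, since that is the ID Kraken 2 reports. `read_fasta` and `keep_viral` are hypothetical helpers, and sequences are written unwrapped.

```python
from typing import Iterator, List, Optional, Set, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header without '>', sequence) pairs from a FASTA stream."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())  # join wrapped sequence lines
    if header is not None:
        yield header, "".join(chunks)


def keep_viral(in_handle: TextIO, out_handle: TextIO, viral_ids: Set[str]) -> int:
    """Copy records whose ID (first word of the header) is viral; return the count."""
    kept = 0
    for header, seq in read_fasta(in_handle):
        parts = header.split()
        if parts and parts[0] in viral_ids:
            out_handle.write(f">{header}\n{seq}\n")  # unwrapped output
            kept += 1
    return kept
```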

Using VirKraken to remove viral sequences

virkraken -r -f Kraken_Output.txt -c final.contigs.fa -o Filtered_Sequences

Sequences whose headers are assigned a viral designation are removed, and the remaining sequences are written to the output FASTA file.
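Keeping and removing are mirror images, so a single predicate can serve both modes. A hypothetical sketch (not VirKraken's code) in which a `remove` argument, standing in for `-r`, inverts the selection over `(header, sequence)` records; IDs are again assumed to be the first word of each header.

```python
from typing import Iterable, Iterator, Set, Tuple


def select_records(
    records: Iterable[Tuple[str, str]], viral_ids: Set[str], remove: bool
) -> Iterator[Tuple[str, str]]:
    """Yield viral records, or, when remove=True, only the non-viral ones."""
    for header, seq in records:
        parts = header.split()
        is_viral = bool(parts) and parts[0] in viral_ids
        # Keep when is_viral XOR remove: viral in keep mode, non-viral in remove mode.
        if is_viral != remove:
            yield header, seq


records = [("k141_1 len=4", "ACGT"), ("k141_2 len=2", "TT")]
print(list(select_records(records, {"k141_1"}, remove=True)))  # [('k141_2 len=2', 'TT')]
```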

Roadmap

Current Version: 0.0.5

Improvements to be made:

  • Fix the FASTA output file for gzipped (.gz) contig files
  • Allow paired .fastq input
  • Integrate into Kraken 2 codebase
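The .gz item is not described in detail. One common way to handle gzipped FASTA files in Python is to choose the opener from the file suffix; a minimal sketch, assuming the issue concerns reading or writing `.gz` files (`open_text` is a hypothetical helper):

```python
import gzip
from pathlib import Path
from typing import IO


def open_text(path: str, mode: str = "r") -> IO[str]:
    """Open a file in text mode, transparently using gzip for a .gz suffix."""
    if Path(path).suffix == ".gz":
        return gzip.open(path, mode + "t")  # "rt" or "wt": text mode over gzip
    return open(path, mode)
```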

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Cody Glickman - @glickman_Cody - glickman.cody@gmail.com

Project Link: https://github.com/Strong-Lab/VirKraken

Acknowledgements

  • Jo Hendrix