MinimalSetofViralPeptidome-UNIQmin

An alignment-independent tool for the study of pathogen sequence diversity

Primary language: Python. License: MIT.

UNIQmin: An alignment-independent tool for the study of viral sequence diversity at any given rank of taxonomy lineage

DOI: 10.3390/biology10090853

Brief Description

Sequence variation among viruses, even of a single amino acid, can expand their host repertoire or enhance their ability to infect. Alignment-independent (alignment-free) approaches offer an alternative for studying viral diversity because they do not require sequence conservation to perform comparative analyses. Here we present UNIQmin, a tool that uses an alignment-independent method to generate the minimal set of viral sequences, as a way to study their diversity at any rank of the taxonomic lineage. The minimal set is the smallest possible number of sequences required to capture the entire repertoire of viral peptidome diversity present in a given sequence dataset.
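To make the idea of a minimal set concrete, the sketch below selects sequences until every distinct k-mer in a toy dataset is covered. It is only an illustration under stated assumptions: it uses a generic greedy set-cover heuristic, which is not necessarily the procedure UNIQmin follows and does not guarantee the smallest possible set. The function names `kmers` and `greedy_minimal_set` and the toy sequences are hypothetical.

```python
def kmers(seq, k):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_minimal_set(sequences, k=9):
    """Pick sequence IDs until every distinct k-mer in the dataset is covered.

    `sequences` maps a sequence ID to its amino acid string.
    NOTE: generic greedy set-cover heuristic for illustration only;
    this is an assumption, not UNIQmin's documented algorithm.
    """
    kmer_sets = {sid: kmers(seq, k) for sid, seq in sequences.items()}
    uncovered = set().union(*kmer_sets.values())
    chosen = []
    while uncovered:
        # Take the sequence that contributes the most still-uncovered k-mers.
        best = max(kmer_sets, key=lambda sid: len(kmer_sets[sid] & uncovered))
        chosen.append(best)
        uncovered -= kmer_sets[best]
    return chosen


# Toy dataset with k=3 so the result is easy to follow by hand.
toy = {"s1": "ACDEFG", "s2": "ACDE", "s3": "DEFGHI"}
print(greedy_minimal_set(toy, k=3))  # → ['s1', 's3']
```

Here `s2` is left out because both of its 3-mers already occur in `s1`, so it adds no peptidome diversity to the selected set.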


Step-by-step of UNIQmin

UNIQmin comprises five execution steps, with one Python script per step. These scripts are provided in the PythonScript folder, accompanied by detailed explanations of the algorithm and execution of each step. A sample input file (exampleinput.fas) and an example output file (exampleoutput.fasta) are also provided.

Figure Scheme

[Figure: UNIQmin scheme]

UNIQmin as a Package

Installation

  • via pip

    pip install uniqmin
    
  • via cloning the GitHub repository

    git clone https://github.com/ChongLC/MinimalSetofViralPeptidome-UNIQmin.git
    

    Note for Conda users (e.g. in a Jupyter Notebook):
    Before installing the package with pip, run

    conda config --add channels conda-forge
    conda install pyahocorasick
    

    Then restart the kernel so that the updated package is used, and run

    pip install uniqmin
    

Upgrade installed version

pip install uniqmin --upgrade

Usage

uniqmin [-h] [-i INPUT] [-o OUTPUT] [-k KMERLENGTH] [-t THREADS]

A sample usage:
The following command generates a minimal set for the sample input file "exampleinput.fas", saves it in an output folder named "result", uses a k-mer window size of nine (9; nonamer), and runs on 4 CPU threads (subject to the limits of the available hardware):

uniqmin -i exampleinput.fas -o result -k 9 -t 4
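To show what a k-mer window size of nine means in practice, here is a minimal sketch that slides a window of length k along a peptide sequence one residue at a time. The function name `overlapping_kmers` and the example sequence are hypothetical and are not part of the uniqmin package.

```python
def overlapping_kmers(seq, k=9):
    """List the overlapping k-mers of a sequence, in order of position.

    Hypothetical helper for illustration; a sequence shorter than k
    yields no k-mers.
    """
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# An 11-residue toy peptide yields 11 - 9 + 1 = 3 nonamers.
for kmer in overlapping_kmers("ACDEFGHIKLM", k=9):
    print(kmer)
# prints:
# ACDEFGHIK
# CDEFGHIKL
# DEFGHIKLM
```

A sequence of length n therefore contributes n - k + 1 overlapping k-mers, so a larger k gives fewer, more specific windows per sequence.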

Command-line Arguments

Argument | Parameter            | Type    | Required | Default | Description
---------|----------------------|---------|----------|---------|--------------------------------------------------
-h       | help                 | N/A     | FALSE    | N/A     | Show this help message and exit
-i       | sequence input file  | String  | TRUE     | N/A     | Path of the input file (in FASTA format)
-o       | output directory name| String  | TRUE     | N/A     | Directory to store the output file to be created
-k       | k-mer window size    | Integer | FALSE    | 9       | The length of the constituent k-mer to be used
-t       | number of threads    | Integer | FALSE    | 4       | The number of CPU threads to be used
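The arguments above could be mirrored with Python's standard `argparse` module, as in the sketch below. This is a reconstruction from the table, not the package's actual source: the program name `uniqmin_sketch`, the function `build_parser`, and the attribute names `input`, `output`, `kmerlength`, and `threads` are assumptions inferred from the usage line.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="uniqmin_sketch",  # assumed name, not the real entry point
        description="Illustrative parser mirroring the documented arguments.",
    )
    parser.add_argument("-i", dest="input", metavar="INPUT", required=True,
                        help="Path of the input file (in FASTA format)")
    parser.add_argument("-o", dest="output", metavar="OUTPUT", required=True,
                        help="Directory to store the output file to be created")
    parser.add_argument("-k", dest="kmerlength", metavar="KMERLENGTH",
                        type=int, default=9,
                        help="The length of the constituent k-mer to be used")
    parser.add_argument("-t", dest="threads", metavar="THREADS",
                        type=int, default=4,
                        help="The number of CPU threads to be used")
    return parser


# Omitting -k and -t falls back to the documented defaults.
args = build_parser().parse_args(["-i", "exampleinput.fas", "-o", "result"])
print(args.kmerlength, args.threads)  # → 9 4
```

Only `-i` and `-o` need to be supplied; leaving out `-k` and `-t` is equivalent to passing `-k 9 -t 4`.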

Citing Resources


Found a bug?

Would you like a feature added, or do you have feedback to share? Open a new issue or send us an email (lichuinchong@gmail.com).