MirMachine, a command line tool to detect microRNA homologs in genome sequences.

Primary language: Python. License: MIT.

MirMachine

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

A command line tool to detect miRNA homologs in genome sequences.

Installation

To install this package with conda run:

conda install mirmachine -c bioconda -c conda-forge

The conda-forge channel is required. Installing with mamba is also strongly recommended because it is faster. You can install mamba first and then MirMachine like this:

conda install mamba -c conda-forge
mamba install mirmachine -c bioconda -c conda-forge

Check if the installation works by calling the main script.

MirMachine.py --help

Note: If you install from GitHub or PyPI instead, you have to install the dependencies yourself.

A warning for Apple Silicon users (e.g. M1 or M2): the bedtools dependency is not available for the arm64 architecture, so you have to set your environment to osx-64. The following command creates a new environment and installs MirMachine in it:

CONDA_SUBDIR=osx-64 mamba create -n mirmachine -c conda-forge -c bioconda mirmachine

Quick start example

Create a new directory and run MirMachine there after the installation. MirMachine will create the required directories while running.

MirMachine.py -n Caenorhabditis -s Caenorhabditis_elegans --genome sample/genomes/ce11.fa --cpu 20

See our documentation for detailed explanations: https://mirmachine.readthedocs.io/
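
If you prefer to launch the quick start run from a script, the command above can be wrapped roughly as follows. This is a minimal sketch, not part of MirMachine: the helper names build_command and run_mirmachine are hypothetical, and it only uses flags listed in the usage text of this README. It assumes MirMachine.py is on your PATH.

```python
import subprocess
from pathlib import Path


def build_command(node, species, genome, cpu=2, model=None):
    """Assemble a MirMachine.py argument list from flags documented in the README."""
    cmd = [
        "MirMachine.py",
        "--node", node,
        "--species", species,
        "--genome", str(genome),
        "--cpu", str(cpu),
    ]
    if model is not None:
        # Documented choices: deutero, proto, combined
        cmd += ["--model", model]
    return cmd


def run_mirmachine(workdir, node, species, genome, cpu=2, model=None):
    """Run MirMachine inside workdir (hypothetical helper).

    The genome path is made absolute first, because the process runs with
    workdir as its current directory.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    genome = Path(genome).resolve()
    cmd = build_command(node, species, genome, cpu=cpu, model=model)
    # check=True raises CalledProcessError if MirMachine exits with an error
    return subprocess.run(cmd, cwd=workdir, check=True)
```

For the example above, the call would be run_mirmachine("mirmachine_run", "Caenorhabditis", "Caenorhabditis_elegans", "sample/genomes/ce11.fa", cpu=20), where the directory name mirmachine_run is only a placeholder.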

Options and Arguments

Usage:
    MirMachine.py --node <text> --species <text> --genome <text> [--model <text>] [--cpu <integer>] [--add-all-nodes|--single-node-only] [--unlock|--remove] [--dry]
    MirMachine.py --species <text> --genome <text> --family <text> [--model <text>] [--unlock|--remove] [--dry]
    MirMachine.py --node <text> [--add-all-nodes]
    MirMachine.py --print-all-nodes
    MirMachine.py --print-all-families
    MirMachine.py --print-ascii-tree
    MirMachine.py (-h | --help)
    MirMachine.py --version

Arguments:
    -n <text>, --node <text>              Node name. (e.g. Caenorhabditis)
    -s <text>, --species <text>           Species name. (e.g. Caenorhabditis_elegans)
    -g <text>, --genome <text>            Genome fasta file location (e.g. data/genome/example.fasta)
    -m <text>, --model <text>             Model type: deutero, proto, combined [default: combined]
    -f <text>, --family <text>            Run only a single miRNA family (e.g. Let-7).
    -c <integer>, --cpu <integer>         CPUs. [default: 2]

Options:
    -a, --add-all-nodes                 Move on the tree both ways.
    -o, --single-node-only              Run only on the given node for miRNA families.
    -p, --print-all-nodes               Print all available node options and exit.
    -l, --print-all-families            Print all available families in this version and exit.
    -t, --print-ascii-tree              Print ascii tree of the tree file.
    -u, --unlock                        Rescue stalled jobs (Try this if the previous job ended prematurely).
    -r, --remove                        Clear all output files (this won't remove input files).
    -d, --dry                           Dry run.
    -h, --help                          Show this screen.
    --version                           Show version.
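
The usage patterns above mark some options as alternatives: --add-all-nodes or --single-node-only, and --unlock or --remove. The sketch below illustrates that constraint with Python's argparse. It is not MirMachine's own parser, the function name build_parser is hypothetical, and it covers only the first usage pattern.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the documented flags (not MirMachine's own code)."""
    parser = argparse.ArgumentParser(prog="MirMachine.py")
    parser.add_argument("-n", "--node", help="Node name")
    parser.add_argument("-s", "--species", help="Species name")
    parser.add_argument("-g", "--genome", help="Genome fasta file location")
    parser.add_argument("-m", "--model", default="combined",
                        choices=["deutero", "proto", "combined"])
    parser.add_argument("-c", "--cpu", type=int, default=2)
    parser.add_argument("-d", "--dry", action="store_true")

    # [--add-all-nodes|--single-node-only]: at most one of the two
    nodes = parser.add_mutually_exclusive_group()
    nodes.add_argument("-a", "--add-all-nodes", action="store_true")
    nodes.add_argument("-o", "--single-node-only", action="store_true")

    # [--unlock|--remove]: at most one of the two
    cleanup = parser.add_mutually_exclusive_group()
    cleanup.add_argument("-u", "--unlock", action="store_true")
    cleanup.add_argument("-r", "--remove", action="store_true")
    return parser
```

With this sketch, passing both --unlock and --remove makes argparse exit with an error, while omitting --model and --cpu yields the documented defaults (combined and 2).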

Output

The MirMachine main executable generates GFF annotations (filtered and unfiltered) along with some other files. The results are written to the results/predictions/ directory, which contains:

gff/ All predicted miRNA families.
filtered_gff/ High confidence miRNA family predictions after bitscore filtering. (This is the output you need in most cases.)
fasta/ Both high and low confidence predictions in FASTA format.
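
To inspect the predictions programmatically, a small reader such as the one below may help. It is a sketch that assumes the files follow the standard tab-delimited, nine-column GFF layout; the exact file names and attribute keys MirMachine writes are not described here, so the code does not rely on them. The function names parse_gff_lines and count_per_sequence are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Standard GFF column names, in order
GFF_COLUMNS = ("seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes")


def parse_gff_lines(lines):
    """Turn GFF text lines into dicts, skipping comments and malformed rows."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank line or header/comment
        fields = line.split("\t")
        if len(fields) != len(GFF_COLUMNS):
            continue  # not a nine-column GFF row
        record = dict(zip(GFF_COLUMNS, fields))
        record["start"] = int(record["start"])
        record["end"] = int(record["end"])
        records.append(record)
    return records


def count_per_sequence(directory):
    """Count predictions per sequence ID over all files in a directory."""
    counts = Counter()
    for path in sorted(Path(directory).glob("*")):
        if path.is_file():
            with path.open() as handle:
                counts.update(r["seqid"] for r in parse_gff_lines(handle))
    return counts
```

For example, count_per_sequence("results/predictions/filtered_gff") would return the number of high confidence predictions on each chromosome or scaffold.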

MirMachine's other repos

Web application repo: https://github.com/selfjell/MirMachine
Supplementary files repo: https://github.com/sinanugur/MirMachine-supplementary

Citation

Please cite our Cell Genomics paper if you find our tool useful: https://doi.org/10.1016/j.xgen.2023.100348