Genome2OR

Annotate Olfactory receptor CDS from genome

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0)

Cite

  • Cite us
    If you use Genome2OR, please cite our paper:
    Han W, Wu Y, Zeng L, Zhao S. Building the Chordata Olfactory Receptor Database using more than 400,000 receptors annotated by Genome2OR. Sci China Life Sci. 2022 Dec;65(12):2539-2551. doi: 10.1007/s11427-021-2081-6. Epub 2022 Jun 10. PMID: 35696018.
    Article
    Support information

  • Resource
    The "Chordata Olfactory Receptor Database (CORD)" provides a large amount of data on olfactory receptors in chordates.

Abstract

Genome2OR is a gene annotation tool built on HMMER, MAFFT and CD-HIT.

HMMER searches biological sequence databases for homologous sequences, using either single sequences or multiple sequence alignments as queries. It implements profile hidden Markov models (profile HMMs).

MAFFT is a multiple sequence alignment program for Unix-like operating systems. It offers a range of alignment methods, including L-INS-i (accurate; for alignments of fewer than about 200 sequences) and FFT-NS-2 (fast; for alignments of fewer than about 30,000 sequences).

CD-HIT clusters protein or nucleotide sequences by similarity. It is very fast and can handle extremely large databases. It significantly reduces the computational and manual effort in many sequence analysis tasks, and it helps in understanding the structure of a dataset and correcting its bias.

Installation

Install with conda (Anaconda):

conda install -c whanwei genome2or
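
As a quick check after installation, the following Python sketch reports whether the two command names used in this README, genome2or and Iteration, can be found on PATH. The helper name missing_commands is hypothetical and is not part of Genome2OR; it only wraps the standard library function shutil.which.

```python
import shutil

# Command names taken from the usage examples in this README.
COMMANDS = ("genome2or", "Iteration")


def missing_commands(commands=COMMANDS):
    """Return the commands that cannot be found on PATH (hypothetical helper)."""
    return [name for name in commands if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_commands()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("genome2or and Iteration are both available.")
```

If a command is reported as missing, the conda environment that holds the package is probably not activated.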

After installation, two commands are available for annotating olfactory receptor genes in a genome: genome2or and Iteration.

Basic annotation method

genome2or Actinopteri outputdir genome.fasta -e 1e-10 -c 4 -p prefix

For more information, run:

genome2or --help

    ____ _____ _   _  ___  __  __ _____ ____   ___  ____
   / ___| ____| \ | |/ _ \|  \/  | ____|___ \ / _ \|  _ \
  | |  _|  _| |  \| | | | | |\/| |  _|   __) | | | | |_) |
  | |_| | |___| |\  | |_| | |  | | |___ / __/| |_| |  _ <
   \____|_____|_| \_|\___/|_|  |_|_____|_____|\___/|_| \_|


usage: genome2or.py [-h] [-e] [-l] [-c] [-p] [-k] [-v] [-V] profile outputdir genome

Annotating Olfactory Receptor Genes in Vertebrate Genomes in One Step.

positional arguments:
  profile               Select an HMM profile for the annotated species from the following options: "Actinopteri", "Amphibia", "Aves", "Branchiostoma_floridae",
                        "Chondrichthyes", "Cladistia", "Coelacanthimorpha", "Crocodylia", "Hyperoartia", "Lepidosauria", "Mammalia", "Myxini", "Reptiles", "Testudines".
                        Alternativa, provide the path to a HMM profile file. However, we do not generally recommend doing so unless there is no corresponding option for the
                        species you need to annotate in the list we provide.
  outputdir             String. Directory path where the output files are stored.[default:Current directory]
  genome                String. File path of the genome to be annotated.

optional arguments:
  -h, --help            show this help message and exit
  -e , --EvalueLimit    Float, The e-value threashold used for extract olfactory receptor gene fragment(s) from the genome.[default:1e-20]
  -l , --SeqLengthLimit
                        Integer. Threshold of sequence length, sequences shoter than this value will not be considered as the preferred targets for functional olfactory
                        receptors.[default:868]
  -c , --cpus           Integer. number of parallel, with 0, all CPUs will be used.[default='2/3 of all cores']
  -p , --prefix         String. Output file name prefix.[default:ORannotation]
  -k , --keepfile       Bool. whether to keep detailed intermediate file(True/False).[default:True]
  -v, --verbose         Print detailed running messages of the program.
  -V, --version         Show version message and exit.

https://link.springer.com/article/10.1007/s11427-021-2081-6
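
When genome2or is called from a Python pipeline, it can help to validate the profile name before launching a long run. The sketch below builds the same argument list as the basic example, using only the positional arguments and flags shown in the help text above. The names PROFILES, build_command and annotate are hypothetical and not part of Genome2OR. The help text also allows a path to a custom HMM file, which this sketch deliberately rejects.

```python
import subprocess

# Profile names copied from the `genome2or --help` text above.
PROFILES = {
    "Actinopteri", "Amphibia", "Aves", "Branchiostoma_floridae",
    "Chondrichthyes", "Cladistia", "Coelacanthimorpha", "Crocodylia",
    "Hyperoartia", "Lepidosauria", "Mammalia", "Myxini", "Reptiles",
    "Testudines",
}


def build_command(profile, outputdir, genome,
                  evalue=1e-10, cpus=4, prefix="ORannotation"):
    """Build the argument list for one genome2or run (hypothetical helper)."""
    # Assumption of this sketch: only built-in profile names are accepted,
    # although the CLI itself also takes a path to an HMM profile file.
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    return ["genome2or", profile, str(outputdir), str(genome),
            "-e", str(evalue), "-c", str(cpus), "-p", prefix]


def annotate(profile, outputdir, genome, **options):
    """Run genome2or and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(profile, outputdir, genome, **options),
                   check=True)
```

Calling annotate("Actinopteri", "outputdir", "genome.fasta", prefix="prefix") would launch the same command as the basic example.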

Iterative annotation of the genome

Iteration Actinopteri outputdir genome.fasta -i 3 -e 1e-10 -c 4 -p prefix

For more information, run:

Iteration --help

     ___ _____ _____ ____      _  _____ ___ ___  _   _
    |_ _|_   _| ____|  _ \    / \|_   _|_ _/ _ \| \ | |
     | |  | | |  _| | |_) |  / _ \ | |  | | | | |  \| |
     | |  | | | |___|  _ <  / ___ \| |  | | |_| | |\  |
    |___| |_| |_____|_| \_\/_/   \_\_| |___\___/|_| \_|


usage: Iteration.py [-h] [-i] [-e] [-l] [-c] [-p] [-k] [-v] [-V] profile outputdir genome

Iterative annotation of olfactory receptor genes in the genome.

positional arguments:
  profile               Select an HMM profile for the annotated species from the following options: "Actinopteri", "Amphibia", "Aves", "Branchiostoma_floridae",
                        "Chondrichthyes", "Cladistia", "Coelacanthimorpha", "Crocodylia", "Hyperoartia", "Lepidosauria", "Mammalia", "Myxini", "Reptiles", "Testudines".
                        Alternativa, provide the path to a HMM profile file. However, we do not generally recommend doing so unless there is no corresponding option for the
                        species you need to annotate in the list we provide.
  outputdir             String. Directory path where the output files are stored.[default:Current directory]
  genome                String. File path of the genome to be annotated.

optional arguments:
  -h, --help            show this help message and exit
  -i , --iteration      Int. Number of iterations.[default:2]
  -e , --EvalueLimit    Float, The e-value threashold used for extract olfactory receptor gene fragment(s) from the genome.[default:1e-20]
  -l , --SeqLengthLimit
                        Integer. Threshold of sequence length, sequences shoter than this value will not be considered as the preferred targets for functional olfactory
                        receptors.[default:868]
  -c , --cpus           Integer. number of parallel, with 0, all CPUs will be used.[default='2/3 of all cores']
  -p , --prefix         String. Output file name prefix.[default:ORannotation]
  -k , --keepfile       Bool. whether to keep detailed intermediate file(True/False).[default:True]
  -v, --verbose         Print detailed running messages of the program.
  -V, --version         Show version message and exit.

https://link.springer.com/article/10.1007/s11427-021-2081-6

Batch annotation

For batch annotation, a simple shell loop is enough:

GenomeDir="Path to the directory where you store your genomes"
for genome in "$GenomeDir"/*; do
	# Use the file name without its last extension as the output prefix
	name=$(basename "$genome")
	genome2or Actinopteri outputdir "$genome" -e 1e-10 -c 4 -p "${name%.*}"
done

Here we assume that all genomes stored in the directory belong to "Actinopteri". To annotate species from other taxa, select the corresponding HMM profile with the "profile" argument.

To use iterative annotation in batch mode instead:

GenomeDir="Path to the directory where you store your genomes"
for genome in "$GenomeDir"/*; do
	# Use the file name without its last extension as the output prefix
	name=$(basename "$genome")
	Iteration Actinopteri outputdir "$genome" -i 3 -e 1e-10 -c 4 -p "${name%.*}"
done
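
If a directory mixes genomes from several taxa, a single hard-coded profile no longer fits. The following Python sketch is one possible way to handle that case: it builds one command per genome file and looks up the profile by file name, falling back to a default. The function batch_commands and its parameters are hypothetical and not part of Genome2OR. The flags are the ones shown in the help text, and the file names in the mapping are supplied by the user.

```python
from pathlib import Path


def batch_commands(genome_dir, outputdir, default_profile,
                   profile_by_file=None, iterations=None,
                   evalue=1e-10, cpus=4):
    """Return one argument list per genome file in genome_dir (hypothetical helper)."""
    profile_by_file = profile_by_file or {}
    commands = []
    for genome in sorted(Path(genome_dir).iterdir()):
        if not genome.is_file():
            continue  # skip subdirectories
        profile = profile_by_file.get(genome.name, default_profile)
        if iterations is None:
            cmd = ["genome2or", profile, str(outputdir), str(genome)]
        else:
            cmd = ["Iteration", profile, str(outputdir), str(genome),
                   "-i", str(iterations)]
        # genome.stem drops the last extension, like ${name%.*} in a shell loop.
        cmd += ["-e", str(evalue), "-c", str(cpus), "-p", genome.stem]
        commands.append(cmd)
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True). Building the commands first makes it easy to print and review them before starting a long batch.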

Contact us