annonex2embl

Primary language: Python. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

Converts an annotated DNA multi-sequence alignment (in NEXUS format) to an EMBL flatfile for submission to ENA via the Webin-CLI submission tool.

INSTALLATION

To get the most recent stable version of annonex2embl, run:

pip install annonex2embl

Alternatively, to get the latest development version of annonex2embl, run:

pip install git+https://github.com/michaelgruenstaeudl/annonex2embl.git

INPUT, OUTPUT AND PREREQUISITES

  • Input: an annotated DNA multiple sequence alignment in NEXUS format and a comma-delimited (CSV) metadata table
  • Output: a submission-ready, multi-record EMBL flatfile

Requirements / Input preparation

The annotations of a NEXUS file are specified via a SETS block, which is located beneath the DATA block and defines sets of characters (charsets) in the DNA alignment. In such a SETS block, every gene charset and every exon charset must be accompanied by one CDS charset. Other charsets can be defined on their own, without an accompanying charset.

Example of a complete SETS block

BEGIN SETS;
CHARSET matK_gene_forward = 929-2530;
CHARSET matK_CDS_forward = 929-2530;
CHARSET trnK_intron_forward = 1-928 2531-2813;
END;
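The pairing rule above can be checked before running the conversion. The following is a minimal sketch, not part of annonex2embl: it assumes that charset labels follow the `<name>_<type>_<orientation>` pattern seen in the example and that "accompanied by" means a CDS charset with the same name. The function names `parse_charsets` and `missing_cds` are hypothetical.

```python
import re

# Matches "CHARSET <label> = <ranges>;" lines inside a SETS block.
CHARSET_RE = re.compile(
    r"^\s*CHARSET\s+(\S+)\s*=\s*([^;]+);", re.IGNORECASE | re.MULTILINE
)


def parse_charsets(sets_block):
    """Return a list of (name, feature_type, orientation, ranges) tuples.

    Assumption: labels look like <name>_<type>_<orientation>,
    e.g. matK_gene_forward, as in the example SETS block.
    """
    charsets = []
    for label, ranges in CHARSET_RE.findall(sets_block):
        parts = label.rsplit("_", 2)
        if len(parts) != 3:
            raise ValueError(f"Unexpected charset label: {label}")
        name, feature_type, orientation = parts
        charsets.append((name, feature_type, orientation, ranges.split()))
    return charsets


def missing_cds(charsets):
    """Names of gene/exon charsets that have no CDS charset of the same name."""
    cds_names = {name for name, ftype, _, _ in charsets if ftype.upper() == "CDS"}
    return sorted(
        {
            name
            for name, ftype, _, _ in charsets
            if ftype.lower() in ("gene", "exon") and name not in cds_names
        }
    )


example = """
BEGIN SETS;
CHARSET matK_gene_forward = 929-2530;
CHARSET matK_CDS_forward = 929-2530;
CHARSET trnK_intron_forward = 1-928 2531-2813;
END;
"""

print(missing_cds(parse_charsets(example)))
```

An empty list means that every gene or exon charset has a CDS partner under the stated assumptions; the intron charset is ignored because it may stand alone.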

Example of a corresponding DESCR variable

DESCR="tRNA-Lys (trnK) intron, partial sequence; maturase K (matK) gene, complete sequence"

EXAMPLE USAGE

Change (cd) into the annonex2embl package directory, then run the commands for your operating system.

On Linux / macOS

SCRPT=$PWD/scripts/annonex2embl_launcher_CLI.py
INPUT=$PWD/examples/input/TestData1.nex
METAD=$PWD/examples/input/Metadata.csv
mkdir -p "$PWD/examples/temp/"
OTPUT=$PWD/examples/temp/TestData1.embl
DESCR='description of alignment here'  # Do not use double-quotes
EMAIL=your_email_here@yourmailserver.com
AUTHR='your name here'  # Do not use double-quotes
MNFTS=PRJEB00000
MNFTD=${DESCR//[^[:alnum:]]/_}

python3 "$SCRPT" -n "$INPUT" -c "$METAD" -d "$DESCR" -e "$EMAIL" -a "$AUTHR" -o "$OTPUT" --qualifiername "note" --productlookup --manifeststudy "$MNFTS" --manifestdescr "$MNFTD" --compress
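The MNFTD line uses a Bash substitution that replaces every non-alphanumeric character of DESCR with an underscore. On systems without Bash, the same value can be derived in Python. This is a minimal sketch that assumes ASCII letters and digits only (Bash's `[:alnum:]` class can depend on the locale); the function name `manifest_description` is hypothetical.

```python
import re


def manifest_description(descr):
    """Replace every character that is not an ASCII letter or digit with '_'."""
    return re.sub(r"[^A-Za-z0-9]", "_", descr)


print(manifest_description("description of alignment here"))
# prints: description_of_alignment_here
```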

On Windows

SET SCRPT=%CD%\scripts\annonex2embl_launcher_CLI.py
SET INPUT=%CD%\examples\input\TestData1.nex
SET METAD=%CD%\examples\input\Metadata.csv
mkdir "%CD%\examples\temp"
SET OTPUT=%CD%\examples\temp\TestData1.embl
SET DESCR=description of alignment here
SET EMAIL=your_email_here@yourmailserver.com
SET AUTHR=your name here
SET MNFTS=PRJEB00000
SET MNFTD=a_unique_description_here

python "%SCRPT%" -n "%INPUT%" -c "%METAD%" -d "%DESCR%" -e %EMAIL% -a "%AUTHR%" -o "%OTPUT%" --productlookup --manifeststudy %MNFTS% --manifestdescr %MNFTD% --compress
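The same call can also be assembled in Python, which avoids shell-specific quoting on either platform. This is a minimal sketch, not an official interface of annonex2embl: it uses only the options shown in the commands above, assumes the example file layout of the package directory, and the helper names `build_command` and `run` are hypothetical.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def build_command(package_dir, descr, email, author, study, manifest_descr):
    """Assemble the launcher call as an argument list (no shell quoting needed)."""
    root = Path(package_dir)
    return [
        sys.executable,
        str(root / "scripts" / "annonex2embl_launcher_CLI.py"),
        "-n", str(root / "examples" / "input" / "TestData1.nex"),
        "-c", str(root / "examples" / "input" / "Metadata.csv"),
        "-d", descr,
        "-e", email,
        "-a", author,
        "-o", str(root / "examples" / "temp" / "TestData1.embl"),
        "--productlookup",
        "--manifeststudy", study,
        "--manifestdescr", manifest_descr,
        "--compress",
    ]


def run(cmd):
    """Create the output directory, then execute the assembled command."""
    Path(cmd[cmd.index("-o") + 1]).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)


cmd = build_command(
    ".",
    "description of alignment here",
    "your_email_here@yourmailserver.com",
    "your name here",
    "PRJEB00000",
    "description_of_alignment_here",
)
print(shlex.join(cmd))  # inspect the command; call run(cmd) to execute it
```

Passing the arguments as a list means that values containing spaces, such as the description and the author name, reach the script as single arguments.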

CHANGELOG

See CHANGELOG.md for a list of recent changes to the software.