/duck_comp_gen

Comparative genomics of an obligate brood parasite and closely related species in Anatidae


Comparative genomics of brood parasitism in the black-headed duck

Authors:

Sara Smith (Assistant Professor, Mount Royal University; ssmith6@mtroyal.ca)

LaDeana Hillier (Department of Genome Sciences, University of Washington; lhillier@uw.edu)

Chris Balakrishnan (Associate Professor, East Carolina University; balakrishnanc@ecu.edu)

Wes Warren (Professor, University of Missouri; warrenwc@missouri.edu)

Mike Sorenson (Professor, Boston University; msoren@bu.edu)

Tim Sackton (Director of Bioinformatics, Informatics Group, Harvard University; tsackton@g.harvard.edu)

Genome assembly and comparative genomics project for the freckled duck (Stictonetta naevosa; stiNae), ruddy duck (Oxyura jamaicensis; oxyJam), African pygmy-goose (Nettapus auritus; netAur), and black-headed duck (Heteronetta atricapilla; hetAtr). Genomes were assembled from 10x data using Supernova.

Code is currently being organized and optimized. Code and select data* related to the following analyses can be found in these directories:

01_assemblies: Genome assemblies of 10x data using Supernova and quality checks with BUSCO
02_wga: Generation of a whole genome alignment of Galloanserae genomes using Cactus
03_ComparativeAugustus: Generation of genome annotations (both de novo and hinted) with Comparative Augustus and quality checks with BUSCO
03_cnee_analyses: Compilation of conserved non-coding elements from Aves and vertebrates and multiple PhyloAcc (https://phyloacc.github.io/) analyses
04_OrthoFinder: Generation of orthogroups using OrthoFinder (https://github.com/davidemms/OrthoFinder)
04_polytomy_resolution: Resolution of phylogenetic polytomy between the focal species using coding and non-coding sequences
05_CompPopGen: VCF generation and quality checks using snpArcher (https://github.com/harvardinformatics/snpArcher); McDonald-Kreitman tests and SnIPRE (Eilertson et al. 2012) for selection, following the framework outlined in https://github.com/sjswuitchik/compPopGen_ms/tree/master/MKpipeline; tests for selection using HyPhy (https://github.com/veg/hyphy); demographic inference using Stairway (Liu & Fu 2020); and identification of selective sweeps using SweepFinder2 (DeGiorgio et al. 2016)

Where relevant, each directory in this repository contains:

  • the main scripts numbered in increasing order of operations (e.g., 01_run_setup.sh, 02_run_analyses.sh, 03_parse_output.sh)
  • a subscripts directory that contains all the scripts required by the main scripts (e.g., analysis1.py and analysis2.sh used in 02_run_analyses.sh)
  • an outputs directory that contains the outputs from the analyses, where file sizes permit
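
The numbering convention above means the run order of a directory's main scripts can be recovered from the file names alone. The following is a minimal sketch, not part of the repository: the helper name `ordered_main_scripts` and the file-name pattern (a numeric prefix, an underscore, and a `.sh` extension) are assumptions based on the example names given in the list.

```python
import re
from pathlib import Path

# Assumed naming pattern for main scripts: one or more digits, an underscore,
# then any name ending in .sh (e.g., 01_run_setup.sh). This is inferred from
# the examples in this README, not from a documented rule.
MAIN_SCRIPT = re.compile(r"^(\d+)_.+\.sh$")


def ordered_main_scripts(directory):
    """Return the names of main scripts in `directory`, sorted by numeric prefix.

    Subdirectories (such as subscripts/ and outputs/) and files that do not
    match the assumed pattern are ignored.
    """
    scripts = []
    for path in Path(directory).iterdir():
        match = MAIN_SCRIPT.match(path.name)
        if path.is_file() and match:
            # Sort on the integer prefix so that 2 comes before 10.
            scripts.append((int(match.group(1)), path.name))
    return [name for _, name in sorted(scripts)]
```

For example, `ordered_main_scripts("05_CompPopGen")` would list that directory's main scripts in the order they are meant to be run, assuming they follow the pattern above.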

* Select data are available through GitHub where files fit within GitHub's file size limits. Genome assemblies are available on NCBI (BioProject PRJNA588796).

The archived_analyses directory contains the initial analyses of chromosome-only assemblies; these analyses are no longer current.