- RepeatModeler2 to identify repeats
- RepeatMasker to mask identified repeats
- Trim raw reads for adapters
- Map with STAR against unmasked reference
- Merge bam
- Use braker2 on merged bam supplying soft-masked genome
- Extract CDS sequence from reference given the braker gtf file using gffread.
- Obtain gilist of Drosophila melanogaster using esearch and efetch.
- Use blastx to blast CDS of genes to Drosophila melanogaster proteins.
- Obtain 'Uniprot accession numbers to gene names' file from flybase (fbgn_NAseq_Uniprot_fb_2020_03.tsv) and edit.
- Check effect of filtering BLAST output using different cut-offs (script: check_blast_output_cutoff.R)
- Filter BLAST file for minimum evalue (<1e-10) and minimum query coverage (>50), percent identity (>35), and merge 'accession to gene names mapping' file (script: add_fbgn_to_blast_strict.R).
- Used Dmerc CDS sequence extracted above, and obtained D. melanogaster CDS sequence (dmel-all-CDS-r6.35_edit.fasta).
- Created blast database with Dmerc CDS.
- Used tBLASTx to blast Dmel CDS against Dmerc CDS.
- Filter BLAST file for minimum evalue (<1e-10) and minimum query coverage (>50), percent identity (>35).
- Edited in R (script: create_tblastx_map_file.R).
- For all blast analyses, max-target seqs was set to the size of the subject database.
- Filter BLAST file as above
- Extract FBgns (from: FBgn_blast_eval1e10_qcovs50_pident35.txt) and use 'Batch Download' tool from FlyBase to add chromosome info.
- Dmerc gene IDs were then combined with FBgns and Dmel chromosomes
- Dmerc genes mapping to multiple FBgns from different chromosomal arms were excluded
- Then counted how many Dmerc genes from each scaffold map to each Dmel chromosome (output: Scaffold_to_Dmel_chromosome_top_hits.txt)
- Run nucmer on repeat masked sequences (MUMmer3.23)
- Reference: dmel-all-chromosome-r6.27.fasta.masked Query: dmerc.ctg.polished1.fa.masked_headerEdit
- Options: --coords --maxgap 500 --maxmatch --prefix Dmerc_Dmel_hardmask_nucmer_maxmatch_maxgap500
- Create coordinates file: show-coords Dmerc_Dmel_hardmask_nucmer_maxmatch_maxgap500.delta -l -c
- Consider Dmerc contig mapping to Dmel chromosome if >40% alignments to one Dmel chromosome, and all other chromosomes have <20% alignments.
- Tried same with D. albomicans, but more confusing because of chromosome fusions, I guess.
- Mapped trimmed reads against Dmerc genome assembly considering the braker.gtf gene annotations.
- Counted reads using featureCounts.
- Decided not to filter genes for a minimum count!
- Used DESeq2 with model: read counts ~ Parthenogenesis Group. Then extracted results for pairwise contrasts.