/phylloglossum_editing

Open access data and methodology for detecting RNA editing events in the chloroplast and mitochondria of the Western Australian lycophyte Phylloglossum drummondii.

Phylloglossum RNA Editing

This repository contains the code used to create figures presented in The Plant Journal article "Insights into U-to-C editing from the lycophyte Phylloglossum drummondii", as well as code for identifying RNA editing sites within chloroplast and mitochondrion transcripts using the Julia program 'pyrimid', and associated RNA editing Jupyter notebooks which operate with a Julia kernel.

Data Availability

DNAseq and RNAseq datasets are available in the Sequence Read Archive as accession PRJNA818771.

The chloroplast genome of Phylloglossum drummondii is available in GenBank as accession OR992133.

The mitochondrial genome of Phylloglossum drummondii is available in GenBank as accession PP024676.

Chloroplast genome assembly and annotation

Trim DNA reads using BBDuk from the BBtools suite. Settings: ktrim=r k=23 mink=11 hdist=1 ftm=5 tpe tbo.
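
The settings above can be placed into a full BBDuk call as follows. This is a minimal sketch, not the exact command used by the authors: it assumes paired-end reads, and the file names and the adapter reference (`adapters.fa`) are hypothetical placeholders. The script only builds and prints the command; it does not run BBDuk.

```python
import shlex

def bbduk_command(in1, in2, out1, out2, adapters):
    """Build a BBDuk adapter-trimming command using the settings listed above.

    The input/output/ref arguments are assumptions (paired-end data, a separate
    adapter FASTA); only the trimming settings come from this README.
    """
    return [
        "bbduk.sh",
        f"in1={in1}", f"in2={in2}",
        f"out1={out1}", f"out2={out2}",
        f"ref={adapters}",
        "ktrim=r", "k=23", "mink=11", "hdist=1", "ftm=5", "tpe", "tbo",
    ]

# Hypothetical file names, for illustration only
cmd = bbduk_command(
    "dna_R1.fastq.gz", "dna_R2.fastq.gz",
    "dna_trimmed_R1.fastq.gz", "dna_trimmed_R2.fastq.gz",
    "adapters.fa",
)
print(shlex.join(cmd))
```

Copy the printed line into a terminal, or pass `cmd` to `subprocess.run`, once the paths match your own data.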

Use trimmed reads to assemble the Phylloglossum drummondii chloroplast genome using NOVOPlasty. Use a related species chloroplast sequence as a seed, for example, Huperzia serrata rbcL. Config file settings: type=chloro; genome range=120000-220000bp; kmer=51.
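
One way to apply these settings is to patch a copy of the NOVOPlasty config template with a small script. The sketch below is illustrative only: the key names (`Type`, `Genome Range`, `K-mer`, `Seed Input`) are assumed to match the template shipped with your NOVOPlasty version and should be checked against it, the template excerpt is abbreviated, and the seed file name is hypothetical.

```python
import re

def set_config_value(config_text, key, value):
    """Replace the value of a 'key = value' line, leaving other lines untouched."""
    pattern = re.compile(rf"^({re.escape(key)}\s*=).*$", flags=re.MULTILINE)
    new_text, n_replaced = pattern.subn(lambda m: f"{m.group(1)} {value}", config_text)
    if n_replaced == 0:
        raise KeyError(f"key not found in config: {key}")
    return new_text

# Abbreviated stand-in for the real template (assumed key names)
TEMPLATE = """\
Type                  = mito
Genome Range          = 12000-22000
K-mer                 = 33
Seed Input            = /path/to/seed.fasta
"""

# Settings from this README; the seed file name is a hypothetical placeholder
settings = {
    "Type": "chloro",
    "Genome Range": "120000-220000",
    "K-mer": "51",
    "Seed Input": "Huperzia_serrata_rbcL.fasta",
}

config = TEMPLATE
for key, value in settings.items():
    config = set_config_value(config, key, value)
print(config)
```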

Optionally, you can verify and attempt to improve the assembly using Pilon. We found that Pilon made no improvements to the assembly options generated by NOVOPlasty.

Preliminary annotation of the chloroplast genome can be achieved using Chloë. However, Chloë is optimised for angiosperms and will not currently produce accurate annotations for lycophytes. Annotations for the Phylloglossum drummondii plastome were generated with a beta version of Chloë which included lycophyte and monilophyte reference sequences.

Mitochondrial genome assembly and annotation

Mitochondrial assembly can be achieved by combining NOVOPlasty, SPAdes and Geneious Prime.

NOVOPlasty can produce non-contiguous mitochondrial assemblies using the settings type=mito_plant; genome range=300000-550000bp; kmer=23. We supplied the complete mitochondrial genome of Phlegmariurus squarrosus as a seed input file, and our own assembly of the Phylloglossum drummondii plastome as a secondary reference in the NOVOPlasty .config file.

SPAdes was used to assemble the mitochondrial genome using the settings --cov-cutoff 50.0 --only-assembler --careful.

The SPAdes output FastG graph was visualised in Bandage and mitochondrial genes were identified by blastn using sequences from Phlegmariurus squarrosus as queries.

DNA reads were mapped to each mitochondrial assembly option using bbmap.sh from the BBtools suite, and the mapped reads (outm=reads.fq.gz) were used as input for the Geneious de novo assembly algorithm. Ultimately, the SPAdes assembly version was used as the basis for the final assembly using connections suggested by Geneious and/or NOVOPlasty to manually edit the FastG file and eliminate dead-ends. Pilon was used to verify the final assembly. The path chosen for the ‘master circle’ view of the mitochondrial genome is only one of many possible arrangements of the assembly.

Annotations from Phlegmariurus squarrosus were extracted using Geneious Prime and mapped to the Phylloglossum mitochondrial genome assembly. You can also use Geneious' "annotate from" function to quickly map gene annotations from related species onto de novo assemblies. In cases where introns contained large insertions or deletions, mapping individual exons accurately identified intron boundaries, which were later verified with RNAseq data.

tRNAscan-SE was used to check for missed tRNA gene annotations, and identified tRNAs were then checked against the PlantRNA2.0 database.

All mitochondrial genes were manually curated using RNA editing events as a reference to ensure start codon creation, stop codon creation and premature stop codon removal events were accounted for.

We subsequently mapped annotations from Phylloglossum drummondii to the mitochondrion genomes of Huperzia crispata and Phlegmariurus squarrosus to compare annotation start coordinates, end coordinates and gene counts. We have corrected several annotations in the Huperzia and Phlegmariurus squarrosus mitochondrion genomes. Please contact us if you would like access to these updated annotations.

Detection of RNA editing events

Merge trimmed RNAseq datasets using bbmerge with the settings qtrim2=t trimq=10,15,20 minq=12.

Map the merged and unmerged reads to the organelle genome assemblies using bbwrap.sh from the BBtools suite with the settings mappedonly=t ambiguous=random.

Nucleotide count files for each position in the organelle genomes were generated for each RNA-seq dataset and the DNAseq dataset using the version of Pyrimid available through this repository, with the settings -m 0 -u.

Nucleotide count files can be converted to RNA editing tables using the Jupyter notebooks present in this repository. In short, the notebooks require .fasta and .gff files for the organelle assemblies, and the output from pyrimid (a tab-separated nucleotide count file). The workflow presented in the Jupyter notebooks requires a Julia kernel. To install this, follow the instructions for installing IJulia. The notebooks will mask rRNAs and tRNAs, and perform binomial and Fisher's exact tests on pyrimidine mismatches within the nucleotide count file. Sites passing these tests will be called as RNA editing events, and the statistics will be summarised in the resulting .tsv file.
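
To give a feel for the two tests, here is a small standard-library Python sketch. It is not the notebooks' code (which runs in Julia), and several details are assumptions made for illustration: the background error rate of 0.01, the one-sided form of both tests, the 0.05 cut-off, and the read counts themselves, which are invented. The binomial test asks whether the mismatches in the RNA reads exceed what sequencing error alone would give; Fisher's exact test asks whether the mismatch fraction is higher in RNA than in DNA.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more mismatches by error alone."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Returns the probability of observing a or more counts in the top-left
    cell, given fixed row and column totals (hypergeometric tail).
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    numerator = sum(comb(row1, x) * comb(total - row1, col1 - x) for x in range(a, upper + 1))
    return numerator / comb(total, col1)

# Invented counts at one genomic T position (candidate U-to-C site)
rna_c, rna_t = 30, 70   # RNA reads showing C (mismatch) vs T (match)
dna_c, dna_t = 0, 100   # DNA reads at the same position
ERROR_RATE = 0.01       # assumed per-base error rate, not taken from the notebooks
ALPHA = 0.05            # assumed significance cut-off

p_binom = binomial_tail(rna_c, rna_c + rna_t, ERROR_RATE)
p_fisher = fisher_greater(rna_c, rna_t, dna_c, dna_t)
is_candidate = p_binom < ALPHA and p_fisher < ALPHA
print(f"binomial p={p_binom:.3g}, Fisher p={p_fisher:.3g}, candidate={is_candidate}")
```

A real analysis over thousands of positions would also need a multiple-testing correction; refer to the notebooks for the thresholds actually applied.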

De novo transcriptome assembly

Use trimmed RNA reads to build a de novo transcriptome assembly for Phylloglossum drummondii using rnaSPAdes run with default settings.

We assembled transcriptomes for Phylloglossum drummondii, and also Phlegmariurus squarrosus and Huperzia serrata. RNA-seq data for Huperzia serrata is available under accession number PRJCA000351 and Phlegmariurus squarrosus is available here.

Identification of PPR proteins

Open reading frames in the Phylloglossum drummondii, Phlegmariurus squarrosus and Huperzia serrata transcriptomes were translated in all six reading frames (three forward and three reverse) using the version of orfinder.jl available in this repository.
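
For readers unfamiliar with six-frame translation, the idea can be sketched in a few lines of Python. This is not a port of orfinder.jl: it assumes the standard genetic code, treats an open reading frame as any stop-to-stop stretch, and uses a hypothetical `min_len` cut-off; orfinder.jl may define ORFs differently.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    """Translate codon by codon; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))

def six_frame_orfs(seq, min_len=3):
    """Return (strand, frame, peptide) for each stop-to-stop stretch of at least min_len residues."""
    seq = seq.upper()
    orfs = []
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for peptide in translate(template[frame:]).split("*"):
                if len(peptide) >= min_len:
                    orfs.append((strand, frame, peptide))
    return orfs

# Toy transcript, for illustration only
for strand, frame, peptide in six_frame_orfs("ATGGCCAAATAG"):
    print(strand, frame, peptide)
```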

Use hmmsearch from HMMER v3.2.1 to search for PPR motifs in the open reading frame .fasta files generated from the transcriptome assemblies, using the DYW and DYW:KP motif hidden Markov model (HMM) profile "all_KP.hmm" present in this repository and originally described in (Gutmann et al., 2020).

Use PPRfinder_vApr19 (present in this repository) to identify PPR proteins from the open reading frames with the output of hmmsearch.

Phylogenetic analysis of DYW:KP proteins

You should find that there are four DYW:KP sequences in each of the Huperzia serrata and Phylloglossum drummondii PPR .beads files, and three sequences in the Phlegmariurus squarrosus .beads file.

Aligning these 11 DYW:KP domain sequences with MUSCLE v5.1 or MAFFT will provide an alignment file which can be converted to a codon alignment using nt2codon.jl present in this repository.
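
The conversion to a codon alignment can be pictured as threading each unaligned coding sequence back onto its aligned protein: every residue becomes its codon and every gap becomes three gap characters. The Python sketch below illustrates that idea only; it is an assumption about the general technique, not a description of how nt2codon.jl is implemented or what inputs it expects.

```python
def protein_to_codon_alignment(aligned_protein, cds):
    """Thread a coding sequence onto its gapped protein alignment row.

    aligned_protein: one row of a protein alignment, with '-' for gaps.
    cds: the matching unaligned nucleotide sequence (no stop codon),
         whose length must be 3 x the number of residues.
    """
    n_residues = sum(1 for ch in aligned_protein if ch != "-")
    if len(cds) != 3 * n_residues:
        raise ValueError("coding sequence length does not match the protein row")
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join("---" if ch == "-" else next(codons) for ch in aligned_protein)

# Toy example: the gap in the protein row becomes a three-base gap
print(protein_to_codon_alignment("M-AK", "ATGGCCAAA"))
```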

Constructing a maximum likelihood tree from this alignment using IQ-TREE 2 v2.1.4 with the built-in ModelFinder will produce a phylogeny of the DYW:KP domains.

U-to-C PPR editing factor target prediction

The longest KP1, KP2, KP3 and KP4 sequences from Phylloglossum drummondii, Huperzia serrata and Phlegmariurus squarrosus can be matched to nucleotide sequences upstream of mitochondrial U-to-C editing sites using PPRmatcher, originally described in (Royan et al., 2021).

Acknowledgements

This work was supported by Australian Research Council grant DP200102981. The authors have no conflicts of interest to declare. We are grateful to Volker Knoop and Simon Zumkeller for discussions and advice concerning lycophyte organelle gene structure and splicing.