macrosyntR

An R package for evaluating synteny conservation at the genome-wide scale. It takes a table of orthologs and genome annotation files formatted as BED to automatically infer significantly conserved linkage groups, and orders them on an Oxford grid or a chord diagram using a network-based greedy algorithm.

License: GPL v3. Preprint: bioRxiv 10.1101/2023.01.26.525673


Installation


# A stable version is available on CRAN and can be installed using :
install.packages("macrosyntR")
# Get the development version from GitHub using devtools :
# install.packages("devtools")
devtools::install_github("SamiLhll/macrosyntR", build_vignettes = TRUE)
# Building the vignette makes the installation a bit longer, but it is required so that you can open it with :
vignette("macrosyntR")

Usage

Check out the vignette for a comprehensive step-by-step tutorial illustrating how the package works using publicly available data, and how to customize the analysis.

Preparing input data :

To start comparing species, you'll need two types of files :

  • 1 - A two-column table of orthologous genes between the species to compare, as generated by rbhXpress or derived from OrthoFinder (see the vignette)
  • 2 - A BED file for each species, listing the genomic coordinates and sequence names of its orthologs (with gene names matching those in the corresponding column of file 1)
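
Because the gene names must agree between the two kinds of files, it can help to check them before loading anything. The snippet below is a minimal base R sketch of such a check. The toy data, the column names and the helper `missing_from_bed()` are hypothetical and are not part of macrosyntR; the BED layout (sequence name, start, end, gene name) is assumed from the standard BED convention.

```r
# Hypothetical toy data, for illustration only (not shipped with macrosyntR).
orthologs <- data.frame(
  sp1_gene = c("geneA1", "geneA2", "geneA3"),
  sp2_gene = c("geneB1", "geneB2", "geneB3")
)

# Assumed BED layout: sequence name, start, end, gene name.
bed_sp1 <- data.frame(
  chrom = c("chr1", "chr1", "chr2"),
  start = c(100L, 5000L, 300L),
  end   = c(900L, 6200L, 1500L),
  name  = c("geneA1", "geneA2", "geneA3")
)
bed_sp2 <- data.frame(
  chrom = c("scaf1", "scaf2", "scaf2"),
  start = c(10L, 400L, 9000L),
  end   = c(700L, 2100L, 9900L),
  name  = c("geneB1", "geneB2", "geneB4")
)

# Return the gene names of one ortholog column that have no BED entry.
missing_from_bed <- function(genes, bed) {
  setdiff(genes, bed$name)
}

missing_sp1 <- missing_from_bed(orthologs$sp1_gene, bed_sp1)
missing_sp2 <- missing_from_bed(orthologs$sp2_gene, bed_sp2)

print(missing_sp1)
print(missing_sp2)
```

In this toy example every gene of the first species is found, while `geneB3` has no coordinates in the second BED table, which is the kind of mismatch worth fixing before running the analysis.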

Get an automatically ordered and colored Oxford grid :

To illustrate the results of the package, we compare the publicly available data from the lancelet Branchiostoma floridae (Simakov et al. 2020) with the siboglinid Paraescarpia echinospica (Sun et al. 2021).
Once you have your pairs of orthologs, you can get an ordered Oxford grid with this package as follows :


library(macrosyntR)

# Load table of orthologs and integrate with genomic coordinates :
my_orthologs <- load_orthologs(orthologs_table = system.file("extdata","Bflo_vs_Pech.tab",package="macrosyntR"),
                               bedfiles = c(system.file("extdata","Bflo.bed",package="macrosyntR"),
                                            system.file("extdata","Pech.bed",package="macrosyntR")))

# Draw an oxford grid :
p1 <- plot_oxford_grid(my_orthologs,
                       sp1_label = "B.floridae",
                       sp2_label = "P.echinospica")
p1

# Automatically reorder the Oxford grid and color the detected clusters (communities):
p2 <- plot_oxford_grid(my_orthologs,
                       sp1_label = "B.floridae",
                       sp2_label = "P.echinospica",
                       reorder = TRUE,
                       color_by = "clust")
p2

# Plot the significant linkage groups :
my_macrosynteny <- compute_macrosynteny(my_orthologs)
p3 <- plot_macrosynteny(my_macrosynteny)
p3


# Call the reordering function, test significance and plot it :
my_orthologs_reordered <- reorder_macrosynteny(my_orthologs)
my_macrosynteny <- compute_macrosynteny(my_orthologs_reordered)
p4 <- plot_macrosynteny(my_macrosynteny)
p4
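
To give an intuition of what testing the significance of a linkage group can mean, here is a minimal base R sketch. It assumes the test is of the Fisher's exact test kind, applied to the number of orthologs shared by each pair of chromosomes. It is not the code used inside `compute_macrosynteny()`: the toy data, the one-sided alternative, the Benjamini-Hochberg correction and the 0.05 threshold are all choices made for this illustration.

```r
# Hypothetical toy table: one row per ortholog pair, with the chromosome
# carrying the gene in each species.
pairs <- data.frame(
  sp1_chr = c(rep("A1", 12), rep("A2", 12)),
  sp2_chr = c(rep("B1", 10), rep("B2", 2), rep("B1", 1), rep("B2", 11))
)

# Contingency table: number of ortholog pairs per chromosome combination.
counts <- table(pairs$sp1_chr, pairs$sp2_chr)
total <- sum(counts)

# One row of results per chromosome pair.
results <- expand.grid(sp1_chr = rownames(counts),
                       sp2_chr = colnames(counts),
                       stringsAsFactors = FALSE)
results$orthologs <- NA_integer_
results$pval <- NA_real_

for (i in seq_len(nrow(results))) {
  chr1 <- results$sp1_chr[i]
  chr2 <- results$sp2_chr[i]
  n_both     <- counts[chr1, chr2]              # orthologs on both chromosomes
  n_sp1_only <- sum(counts[chr1, ]) - n_both    # on chr1 but elsewhere in sp2
  n_sp2_only <- sum(counts[, chr2]) - n_both    # on chr2 but elsewhere in sp1
  n_neither  <- total - n_both - n_sp1_only - n_sp2_only
  two_by_two <- matrix(c(n_both, n_sp1_only, n_sp2_only, n_neither), nrow = 2)
  results$orthologs[i] <- n_both
  # One-sided test: are there more shared orthologs than expected by chance?
  results$pval[i] <- fisher.test(two_by_two, alternative = "greater")$p.value
}

# Correct for multiple testing and flag the enriched chromosome pairs.
results$pval_adj <- p.adjust(results$pval, method = "BH")
results$significant <- results$pval_adj < 0.05

print(results)
```

In this toy data set only the pairs A1/B1 and A2/B2 are flagged, which corresponds to the dense cells one would expect to see on an Oxford grid.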

Compute linkage groups between three species and plot on a chord diagram :

You can compute the conserved linkage groups for two or more species and display them on a chord diagram.
Here, I'm showing what it looks like when adding the data from the scallop Patinopecten yessoensis (Wang et al. 2017). The content of the orthologs_table is now derived from OrthoFinder (Emms and Kelly 2019), and the single-copy orthologs were extracted with a command line such as :


fgrep -f <path_to_your_orthofinder_run>/Orthogroups/Orthogroups_SingleCopyOrthologues.txt \
<path_to_your_orthofinder_run>/Orthogroups/Orthogroups.tsv > Single_copy_orthologs.tab

# On Linux and macOS, if the result of
file Single_copy_orthologs.tab
# is "ASCII text, with CRLF line terminators",
# then you should replace the line terminators with regular "\n", using a command such as :
tr -d '\015' < Single_copy_orthologs.tab | awk '($0 != "") {print}' > Single_copy_orthologs.tsv
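
If you would rather stay in R, the same extraction and line-ending cleanup can be sketched in base R. The function `extract_single_copy()` below is a hypothetical helper, not part of macrosyntR or OrthoFinder. It assumes that the orthogroup identifier is the first tab-separated field of each row, and, unlike `fgrep`, it matches that identifier exactly rather than anywhere in the line.

```r
# Keep only the rows of Orthogroups.tsv whose identifier is listed in the
# single-copy file, and write them with Unix ("\n") line endings.
extract_single_copy <- function(orthogroups_tsv, single_copy_txt, output_tsv) {
  # readLines() accepts LF, CRLF or CR line endings, so CRLF input needs
  # no special handling here.
  ids  <- readLines(single_copy_txt, warn = FALSE)
  rows <- readLines(orthogroups_tsv, warn = FALSE)

  # Drop empty lines.
  ids  <- ids[nzchar(ids)]
  rows <- rows[nzchar(rows)]

  # The orthogroup identifier is assumed to be the first field of each row.
  row_ids <- sub("\t.*$", "", rows)
  kept <- rows[row_ids %in% ids]

  # A binary connection prevents "\n" from being rewritten as CRLF on Windows.
  con <- file(output_tsv, open = "wb")
  on.exit(close(con))
  writeLines(kept, con, sep = "\n")
  invisible(kept)
}

# Usage (paths are placeholders to adapt to your own OrthoFinder run):
# extract_single_copy(
#   "<path_to_your_orthofinder_run>/Orthogroups/Orthogroups.tsv",
#   "<path_to_your_orthofinder_run>/Orthogroups/Orthogroups_SingleCopyOrthologues.txt",
#   "Single_copy_orthologs.tsv"
# )
```

The header row of Orthogroups.tsv is dropped as a side effect, because its first field is not an orthogroup identifier.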

Then you can draw a chord diagram displaying the conserved linkage groups as follows :

# load data 
my_orthologs_with_3_sp <- load_orthologs(orthologs_table = system.file("extdata","Single_copy_orthologs.tsv",package="macrosyntR"),
                                         bedfiles = c(system.file("extdata","Bflo.bed",package="macrosyntR"),
                                                      system.file("extdata","Pech.bed",package="macrosyntR"),
                                                      system.file("extdata","Pyes.bed",package="macrosyntR")))

# Change the chromosome names to keep only numbers
levels(my_orthologs_with_3_sp$sp1.Chr) <- stringr::str_replace(levels(my_orthologs_with_3_sp$sp1.Chr),"BFL","")
levels(my_orthologs_with_3_sp$sp2.Chr) <- stringr::str_replace(levels(my_orthologs_with_3_sp$sp2.Chr),"PEC","")
levels(my_orthologs_with_3_sp$sp3.Chr) <- stringr::str_replace(levels(my_orthologs_with_3_sp$sp3.Chr),"chr","")

# Plot an automatically ordered chord diagram colored by the linkage groups :
plot_chord_diagram(my_orthologs_with_3_sp,
                   species_labels = c("B. flo","P. ech", "P. yes"),
                   color_by = "LGs") +
  ggplot2::theme(legend.position = "none")

# The linkage groups were automatically computed but you can also get them as a table using :
my_linkage_groups <- compute_linkage_groups(my_orthologs_with_3_sp)

Additional resources

Computing orthologs as reciprocal best hits :

If you don't know how to do it, I implemented rbhXpress. It uses DIAMOND to generate an output compatible with macrosyntR. More details are available in the rbhXpress repository.

Getting help

Need help, found a bug, or want to see other features implemented?
Feel free to open an issue here or send an email to :
elhilali.sami@gmail.com

Citation

If used in your research, please cite :

  • El Hilali, S., Copley, R., "macrosyntR : Drawing automatically ordered Oxford Grids from standard genomic files in R", bioRxiv (2023). doi:10.1101/2023.01.26.525673