Syngraph
A toolkit for evolutionary analyses of linkage groups
Dependencies
Best addressed via conda
$ conda install -c conda-forge networkx pandas docopt tqdm ete3 pygraphviz
Usage
Usage: syngraph <module> [<args>...] [-D -V -h]
[Modules]
build Build graph from orthology data (e.g. BUSCO *.full_table.tsv)
infer Model rearrangements over a tree
tabulate Get table of extant and ancestral genomes
viz Visualise graph/data [Under development]
[Options]
-h, --help Show this screen.
-D, --debug Print debug information [TBI]
-v, --version Show version
[Dependencies]
---------------------------------------------------------------------------------------------
| $ conda install -c conda-forge networkx=2.4 pandas docopt tqdm ete3 pygraphviz matplotlib |
---------------------------------------------------------------------------------------------
Build a syngraph from BUSCO data, allowing for missingness
syngraph build -d directory_of_tsv_files -m -o test
Model fissions and fusions over a tree, record rearrangements using taxon_1 as a reference
syngraph infer -g test.pickle -t newick.txt -r 2 -s taxon_1 -o test
Model translocations, fissions and fusions over a tree
syngraph infer -g test.pickle -t newick.txt -r 3 -s taxon_1 -o test
Tabulate extant and inferred genomes
syngraph tabulate -g test.with_ancestors.pickle -o test
Input data
Input data should only contain markers from chromosome-scale sequences, because unscaffolded contigs will result in excess fission events being inferred.
If using BUSCO data, tsv files should be named My_taxon.*.tsv, where My_taxon is also a leaf in the newick tree. Each row should contain the BUSCO_ID, sequence, start position, and end position. These can be grepped from the *full_table.tsv file generated by BUSCO (Busco_id, Sequence, Gene_Start, Gene_End). E.g.:
0at7088 HG995313.1 5723272 5863707
1at7088 HG995286.1 19966914 20084934
2at7088 HG995296.1 11128843 11215510
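As an alternative to grep, the four columns can be pulled out with a few lines of Python. This is a minimal sketch, not part of syngraph: the function name extract_markers is hypothetical, it assumes the usual tab-separated BUSCO full table layout (Busco id, Status, Sequence, Gene Start, Gene End, ...), and keeping only rows with status Complete is an assumption rather than a syngraph requirement. The strand, score, and length values in the example are placeholders.

```python
import io


def extract_markers(full_table, keep_status=("Complete",)):
    """Return (busco_id, sequence, start, end) tuples from a BUSCO full table.

    Assumes tab-separated columns in the order
    Busco id, Status, Sequence, Gene Start, Gene End, ...
    """
    rows = []
    for line in full_table:
        line = line.rstrip("\n")
        # Skip blank lines and the commented header lines
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Missing BUSCOs have no coordinates; other statuses are dropped
        # here by assumption (adjust keep_status if needed)
        if len(fields) < 5 or fields[1] not in keep_status:
            continue
        busco_id, _status, sequence, start, end = fields[:5]
        rows.append((busco_id, sequence, int(start), int(end)))
    return rows


# Small in-memory example; with real data, pass an open file handle instead
example = io.StringIO(
    "# Busco id\tStatus\tSequence\tGene Start\tGene End\tStrand\tScore\tLength\n"
    "0at7088\tComplete\tHG995313.1\t5723272\t5863707\t+\t100.0\t500\n"
    "1at7088\tMissing\n"
    "2at7088\tComplete\tHG995296.1\t11128843\t11215510\t-\t100.0\t500\n"
)
markers = extract_markers(example)
for row in markers:
    print("\t".join(str(value) for value in row))
```

The printed rows can be redirected to a file named after the taxon, e.g. My_taxon.busco.tsv.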
Inferring rearrangements
After building a syngraph, inter-chromosomal rearrangements can be inferred with syngraph infer. This requires a newick tree relating the taxa in the analysis. Branch lengths are used by syngraph, but they only influence how the tree is traversed, so approximate branch lengths are fine.
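Because each tsv file name must match a leaf in the newick tree, it can help to compare the two before running the inference. The sketch below is not part of syngraph. It assumes the taxon name is the part of the file name before the first dot, and it reads leaf labels with a simple regular expression that only handles unquoted labels (ete3, already a dependency, would be more robust). The tree, file names, and function names are hypothetical.

```python
import re
from pathlib import Path


def newick_leaves(newick):
    """Return leaf labels of a simple newick string (unquoted labels only)."""
    # Leaf labels directly follow "(" or ","; internal labels follow ")"
    return {label.strip() for label in re.findall(r"[(,]\s*([^(),:;]+)", newick)}


def taxa_from_files(filenames):
    """Return taxon names, taken as the file name up to the first dot."""
    return {Path(name).name.split(".")[0] for name in filenames}


# Hypothetical tree and input files
tree = "((Brenthis_ino:1,taxon_1:1)n1:1,taxon_2:2)n2;"
files = ["data/Brenthis_ino.busco.tsv", "data/taxon_1.busco.tsv", "data/taxon_3.busco.tsv"]

leaves = newick_leaves(tree)
taxa = taxa_from_files(files)

print("files without a leaf in the tree:", sorted(taxa - leaves))
print("leaves without a tsv file:", sorted(leaves - taxa))
```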
The -r option sets the inference mode: 2 for fissions and fusions, and 3 for fissions, fusions, and reciprocal translocations (which is currently experimental).
The -m option sets the minimum number of markers that can be involved in a rearrangement. Setting -m 1 means that a rearrangement will be reported when a single marker 'moves' between chromosomes. By contrast, setting higher values, e.g. -m 100, means that chromosome fissions or sets of complex rearrangements will be missed. A reasonable starting point is -m 5, although this may need to be adjusted given the density of markers, size of chromosomes, and accuracy of marker orthology.
The most useful output file is *.rearrangements.tsv. This lists rearrangements inferred over the tree. The branch of the tree where a rearrangement happened is denoted by its parent and child nodes. The event is reported as fission/fusion/translocation. Multiplicity is the number of events. This is normally 1, but can be more if a chromosome has fissioned into multiple fragments. The last column is ref_seqs, and shows which chromosomes are involved in the rearrangement given an extant genome, an inferred ancestral genome, or a predefined list of marker --> chromosome relationships.
#parent child event multiplicity ref_seqs
n7 Brenthis_ino fusion 1 [['n5_2', 'n5_17'], ['n5_20']]
n5 n7 fusion 1 [['n5_6'], ['n5_19']]
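A table in this layout is straightforward to summarise in Python. The following is a minimal sketch, not part of syngraph: it assumes the five columns are tab-separated, that ref_seqs is written as a Python-style nested list (as in the example above), and the function name read_rearrangements is hypothetical.

```python
import ast
from collections import Counter


def read_rearrangements(lines):
    """Parse rows of a *.rearrangements.tsv file into dictionaries."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the "#parent child ..." header
        if not line or line.startswith("#"):
            continue
        parent, child, event, multiplicity, ref_seqs = line.split("\t", 4)
        records.append({
            "parent": parent,
            "child": child,
            "event": event,
            "multiplicity": int(multiplicity),
            # literal_eval safely turns the text into nested Python lists
            "ref_seqs": ast.literal_eval(ref_seqs),
        })
    return records


# The two example rows from above, written with tabs
example = [
    "#parent\tchild\tevent\tmultiplicity\tref_seqs\n",
    "n7\tBrenthis_ino\tfusion\t1\t[['n5_2', 'n5_17'], ['n5_20']]\n",
    "n5\tn7\tfusion\t1\t[['n5_6'], ['n5_19']]\n",
]
records = read_rearrangements(example)

# Sum multiplicity per branch and event type
counts = Counter()
for record in records:
    counts[(record["parent"], record["child"], record["event"])] += record["multiplicity"]

for (parent, child, event), n in counts.items():
    print(parent, child, event, n)
```

With a real file, pass an open file handle to read_rearrangements instead of the example list.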
Help
Syngraph is still under active development. Please open an issue if you have any questions about running the software or interpreting your results.