AgAdapt

Multimodal data fusion for maize phenotype prediction across environments

Primary language: Jupyter Notebook. License: GNU General Public License v3.0 (GPL-3.0).


Summary

AgAdapt is an algorithm for multimodal phenotype prediction that aims to use as few predictor features as possible.

A challenging problem in biology is incorporating large-scale data from multiple sources into machine learning models that predict organism traits. We use deep-learning dimensionality reduction techniques to condense large datasets into a small set of meaningful predictor variables. Models are then trained on those variables using a gradient-boosting regression approach.
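To make the second stage concrete, here is a minimal sketch of gradient-boosting regression for squared-error loss, written in plain Python with one-split decision stumps as weak learners. It is an illustration only, not the AgAdapt implementation: the data are synthetic, the three input columns are assumed to stand in for latent features already produced by a dimensionality reduction step (which is not shown), and every function name and parameter value below is hypothetical.

```python
import random

def fit_stump(X, residuals):
    """Fit a one-split regression stump that minimizes squared error."""
    mean = sum(residuals) / len(residuals)
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split: fall back to a constant prediction
        return (0, float("inf"), mean, mean)
    return best[1:]

def stump_predict(stump, row):
    j, t, left_mean, right_mean = stump
    return left_mean if row[j] <= t else right_mean

def fit_gbr(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting: each stump is fit to the current residuals."""
    base = sum(y) / len(y)          # initial prediction is the target mean
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps

def predict_gbr(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)

def mean_squared_error(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Synthetic stand-in data: 60 samples with 3 hypothetical latent features.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(60)]
y = [2.0 * row[0] - row[1] + rng.gauss(0, 0.1) for row in X]

model = fit_gbr(X, y)
baseline_mse = mean_squared_error(y, [sum(y) / len(y)] * len(y))
train_mse = mean_squared_error(y, [predict_gbr(model, row) for row in X])
print(f"baseline MSE: {baseline_mse:.4f}, boosted MSE: {train_mse:.4f}")
```

The comparison printed at the end is on the training data only, so it shows that boosting fits the residuals, not that the model generalizes. In practice a library implementation and held-out environments would be used for evaluation.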

Our AgAdapt algorithm can serve as a tool for efficient crop production and breeding.

References

General References

[1] McFarland, B.A., AlKhalifah, N., Bohn, M. et al. Maize genomes to fields (G2F): 2014–2017 field seasons: genotype, phenotype, climatic, soil, and inbred ear image datasets. BMC Res Notes 13, 71 (2020). https://doi.org/10.1186/s13104-020-4922-8

[2] C J Battey, Gabrielle C Coffing, Andrew D Kern, Visualizing population structure with variational autoencoders, G3 Genes|Genomes|Genetics, Volume 11, Issue 1, January 2021, jkaa036, https://doi.org/10.1093/g3journal/jkaa036

Software and Packages

[1] Peter J. Bradbury, Zhiwu Zhang, Dallas E. Kroon, Terry M. Casstevens, Yogesh Ramdoss, Edward S. Buckler, TASSEL: software for association mapping of complex traits in diverse samples, Bioinformatics, Volume 23, Issue 19, 1 October 2007, Pages 2633–2635, https://doi.org/10.1093/bioinformatics/btm308

[2] Jombart T (2008). adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics, 24, 1403-1405. https://doi.org/10.1093/bioinformatics/btn129

[3] Knaus, B.J. and Grünwald, N.J. (2017), vcfR: a package to manipulate and visualize variant call format data in R. Mol Ecol Resour, 17: 44-53. https://doi.org/10.1111/1755-0998.12549