The NucleoNets: Deep Polygenic Neural Network for Predicting and Identifying Yield-associated Genes in Indonesian Rice Accessions
- Contributors: Nicholas Dominic, Tjeng Wawan Cenggoro, Arif Budiarto, Bens Pardamean
- Main Affiliation: Bioinformatics & Data Science Research Center (BDSRC), Bina Nusantara University
- Programming Language: Python 3.7.1
- International Publication: (coming soon)
- Can I copy all or some of your work?
>> Yes, provided that you credit the author by name. Alternatively, simply fork my repository on GitHub.
- I can't open your .ipynb file. How can I resolve the issue?
>> Copy the URL (with the .ipynb extension) and paste it into this website.
- May I commit a change to your code?
>> You may, but only reliable changes will be accepted. I will review them as soon as possible.
ABSTRACT
As the fourth most populous country in the world, Indonesia must increase its annual rice production rate to achieve national food security by 2050. One solution comes from the nanoscopic level: a genetic variant called Single Nucleotide Polymorphism (SNP), which can be linked to significant yield-associated genes. The prior benchmark of this study used a statistical genetics model that incorporated neither SNP position information nor an attention mechanism. Hence, we developed a novel deep polygenic neural network, named the NucleoNet model, to address these limitations. In the first phase of our methodology, a Genotype–Phenotype table was established. In the second phase, an Ordinary Least Squares (OLS) model with an Elastic Net shrinkage prior was developed to perform marginal and polygenic regression, as is commonly done in Genome-wide Association Studies (GWAS). In the third phase, the NucleoNet model was constructed by combining several prominent components: positional SNP encoding, the context vector, wide models, Elastic Net regularization, and Shannon's entropy loss. In the final phase, we evaluated the model and found that the NucleoNet model achieved a Mean Squared Error (MSE) as low as 2.779, with a Symmetric Mean Absolute Percentage Error (SMAPE) of 47.156%. In our ablation study, we learned that combining the Xavier distribution for weight initialization with the Normal distribution for bias initialization yielded a more diverse set of significant SNPs throughout all 12 chromosomes. With this setting, polygenic modeling reduced the MSE by up to 32.28% compared to the OLS model and revealed 15 new rice yield-associated SNPs. Our findings confirm that the NucleoNet model outperformed the OLS model and identified SNPs important to Indonesian rice yields.
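The second phase pairs OLS with an Elastic Net penalty. The sketch below is a minimal pure-Python illustration, not the repository's code. It makes three assumptions. First, the shrinkage prior is applied in its penalized (MAP) form, following scikit-learn's `ElasticNet` objective convention. Second, genotypes are coded as allele dosages 0/1/2. Third, the names `elastic_net`, `marginal_effects`, `alpha`, and `l1_ratio` are hypothetical. The sketch fits all SNPs jointly (polygenic) by cyclic coordinate descent, then reuses the same routine one SNP at a time (marginal).

```python
def soft_threshold(value, threshold):
    """Soft-thresholding operator used by the L1 part of Elastic Net."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Hypothetical polygenic fit of y ~ X w + b by cyclic coordinate descent, minimising
    (1/2n)||y - Xw - b||^2 + alpha * (l1_ratio*||w||_1 + 0.5*(1 - l1_ratio)*||w||_2^2).
    X is a list of rows (assumed SNP dosages 0/1/2); returns (weights, intercept)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centre the data so the unpenalised intercept can be recovered at the end.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [0.0] * p
    residual = yc[:]  # invariant: residual = yc - Xc w
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic SNP carries no information
            # Correlation of feature j with the residual, adding back its own contribution.
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                w[j] = new_wj
    intercept = y_mean - sum(w[j] * x_mean[j] for j in range(p))
    return w, intercept


def marginal_effects(X, y, **kwargs):
    """Hypothetical GWAS-style marginal scan: fit each SNP on its own."""
    return [elastic_net([[row[j]] for row in X], y, **kwargs)[0][0] for j in range(len(X[0]))]
```

With `alpha=0.0` the fit reduces to ordinary least squares. Raising `alpha * l1_ratio` drives weak SNP effects exactly to zero, which is what makes the penalty useful for selecting candidate SNPs.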
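The abstract does not define "positional SNP encoding". One common way to inject position information is the sinusoidal encoding from Transformer models. The sketch below shows that scheme applied to SNP positions. Both the choice of this scheme and the toy genotype embedding (the dosage added to every dimension) are assumptions. `positional_encoding` and `encode_snps` are hypothetical names.

```python
import math


def positional_encoding(position, d_model, base=10000.0):
    """Sinusoidal encoding (assumed scheme): even dims use sin, odd dims use cos,
    PE[2i] = sin(pos / base^(2i/d)), PE[2i+1] = cos(pos / base^(2i/d))."""
    enc = []
    for k in range(d_model):
        i = k // 2
        angle = position / (base ** (2 * i / d_model))
        enc.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return enc


def encode_snps(genotypes, positions, d_model):
    """Hypothetical SNP input: broadcast each dosage over d_model dims and add
    the encoding of that SNP's position (e.g. its base-pair coordinate or index)."""
    return [[g + pe for pe in positional_encoding(p, d_model)]
            for g, p in zip(genotypes, positions)]
```

Because every dimension oscillates at a different frequency, nearby positions receive similar vectors and distant positions receive distinct ones. This is the kind of position information the earlier statistical benchmark lacked.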
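The sketch below illustrates the context vector and Shannon's entropy. It assumes dot-product attention of per-SNP hidden vectors against a query vector (`query` is hypothetical), which produces attention weights and their weighted sum, the context vector. It also assumes the entropy term is computed on those attention weights. The abstract specifies neither detail.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(hidden, query):
    """Dot-product attention (assumed form): returns (context vector, attention weights)."""
    scores = [sum(h_k * q_k for h_k, q_k in zip(h, query)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    context = [sum(w * h[k] for w, h in zip(weights, hidden)) for k in range(dim)]
    return context, weights


def shannon_entropy(probs):
    """H(p) = -sum p_i log p_i in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

Uniform attention over n SNPs gives the maximum entropy, log n. Attention concentrated on a few SNPs gives lower entropy. Adding such a term to the loss with one sign or the other would push attention toward being sharper or smoother. The abstract does not say which form the NucleoNet model uses.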
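The evaluation metrics from the final phase can be computed as follows. SMAPE has several variants, and the abstract does not say which one was used. This sketch assumes the common variant with denominator (|y| + |ŷ|) / 2, reported in percent, and treats pairs where both values are zero as contributing 0. `relative_reduction` is a hypothetical helper. It shows how a figure such as "reduced the MSE by X% compared to OLS" is usually obtained.

```python
def mse(y_true, y_pred):
    """Mean Squared Error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true, y_pred):
    """Symmetric MAPE in percent (assumed variant with (|y| + |y_hat|) / 2 denominator)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2
        if denom != 0:  # both zero: counted as a perfect prediction (assumption)
            total += abs(p - t) / denom
    return 100.0 * total / len(y_true)


def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Toy example (made-up values, not results from this study).
y_true, y_pred = [4.0, 5.0, 6.0], [3.0, 5.0, 8.0]
print(mse(y_true, y_pred))           # 5/3
print(smape(y_true, y_pred))         # 400/21 percent
print(relative_reduction(4.0, 3.0))  # 25.0
```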
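The ablation setting (Xavier-initialized weights, Normal-initialized biases) can be sketched for a single dense layer. This assumes the uniform Glorot variant with limit sqrt(6 / (fan_in + fan_out)), which gives each weight a variance of 2 / (fan_in + fan_out). It also assumes a hypothetical bias standard deviation of 0.05. The abstract names neither the Xavier variant nor the Normal parameters.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform init: U(-a, a) with a = sqrt(6 / (fan_in + fan_out)),
    so that Var(w) = a^2 / 3 = 2 / (fan_in + fan_out)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]


def normal_bias(size, rng, mean=0.0, std=0.05):
    """Biases drawn from N(mean, std^2); std=0.05 is a hypothetical choice."""
    return [rng.gauss(mean, std) for _ in range(size)]


def init_dense_layer(fan_in, fan_out, seed=0):
    """Return (weights[fan_in][fan_out], biases[fan_out]) for one dense layer."""
    rng = random.Random(seed)  # seeded so ablation runs are reproducible
    return xavier_uniform(fan_in, fan_out, rng), normal_bias(fan_out, rng)
```

In PyTorch, the same setting would roughly correspond to calling `torch.nn.init.xavier_uniform_` on a layer's weight and `torch.nn.init.normal_` on its bias.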
NICHOLAS DOMINIC
- Education: Graduate student in Computer Science (AI stream), BINUS University
- Email: nicholas.dominic@binus.ac.id / dominicnick4@gmail.com / ndominic75@icloud.com
- LinkedIn Profile: click here
Please feel free to share my work and contact me for further discussion. Have a great rest of your day!