https://mlwave.com/kaggle-ensembling-guide/
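The male parents are 30 elite inbred lines commonly used in domestic (Chinese) maize breeding; the female parents are 207 inbred lines with broad variation.
Each input point is represented as a sparse vector of length 30+207: the first 30 positions encode the male parent (1 at the position of its line, 0 elsewhere), and the last 207 positions encode the female parent (1 at the position of its line, 0 elsewhere).
The output is a vector of length 3, i.e. the three trait values.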
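A minimal sketch of the input encoding described above, assuming NumPy and 0-based line indices. The function name `encode_cross` and the constants are hypothetical, chosen only for illustration; the notes do not say how the lines are numbered.

```python
import numpy as np

N_MALE = 30     # number of male inbred lines (from the notes)
N_FEMALE = 207  # number of female inbred lines (from the notes)


def encode_cross(male_idx: int, female_idx: int) -> np.ndarray:
    """Build the 30+207 indicator vector for one male x female cross.

    Indices are assumed to be 0-based (an assumption, not from the notes).
    """
    if not 0 <= male_idx < N_MALE:
        raise ValueError(f"male_idx must be in [0, {N_MALE})")
    if not 0 <= female_idx < N_FEMALE:
        raise ValueError(f"female_idx must be in [0, {N_FEMALE})")
    x = np.zeros(N_MALE + N_FEMALE, dtype=np.float32)
    x[male_idx] = 1.0             # first 30 positions: male parent
    x[N_MALE + female_idx] = 1.0  # last 207 positions: female parent
    return x


# Example: male line 2 crossed with female line 5.
x = encode_cross(2, 5)
```

Each encoded vector has exactly two non-zero entries, one in each block; the matching target would be a length-3 vector holding the three trait values.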
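- female & male: curves and classification of the data for the three traits
- convert genotype to haplotype (there are too many genotype sites, so merge them into blocks)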
- encoding
  - 0: AA (dominant homozygous)
  - 1: Aa (heterozygous)
  - 2: aa (recessive homozygous)
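The 0/1/2 encoding listed above can be sketched as a simple lookup, assuming genotype calls arrive as two-character strings such as `"AA"` or `"Aa"`. The names `GENOTYPE_CODES` and `encode_genotypes` are hypothetical, and treating `"aA"` as heterozygous is an assumption not stated in the notes.

```python
import numpy as np

# Code = number of recessive alleles: AA -> 0, Aa -> 1, aa -> 2.
# "aA" is mapped to 1 as well (assumption: allele order does not matter).
GENOTYPE_CODES = {"AA": 0, "Aa": 1, "aA": 1, "aa": 2}


def encode_genotypes(calls):
    """Map a sequence of genotype strings to a vector of 0/1/2 codes."""
    return np.array([GENOTYPE_CODES[c] for c in calls], dtype=np.int8)


codes = encode_genotypes(["AA", "Aa", "aa", "aA"])
```

A call outside the four listed strings raises `KeyError`, which makes missing or malformed genotypes visible instead of silently coding them.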
A machine learning pipeline for quantitative phenotype prediction from genotype data
A Machine Learning Pipeline for Phenotype Prediction from Genotype Data
A Monte Carlo Markov Chain model for predicting quantitative traits from genome-wide SNP data