Reduced the dataset by filtering SNPs using sequence reduction and alternate transcripts. Constructed 3-tuples of the expressed, nonexpressed, and sibling genotypes. Clustered the 3-tuples with k-means (Lloyd's algorithm with k-means++ initialization) to identify mutation sites, and ranked the mutations by centroid distance and cluster size.
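The clustering and ranking step could look roughly like the sketch below. It is an illustration under stated assumptions, not the original pipeline: each genotype is encoded as an alternate-allele count (0, 1, or 2), the sample data is invented, and the ranking rule (smaller clusters first, then larger distance from the centroid) is only one plausible reading of "centroid distance and cluster size". All function names are hypothetical.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability
    proportional to its squared distance from the nearest chosen centroid."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(squared_distance(p, c) for c in centroids) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a centroid; fall back to a uniform draw.
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


def lloyd(points, k, rng, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update
    until the assignment stops changing (or max_iter is reached)."""
    centroids = kmeanspp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def rank_sites(points, centroids, labels):
    """Assumed ranking rule: smaller clusters first, then points farther
    from their centroid. Returns (index, label, cluster_size, distance)."""
    sizes = [labels.count(j) for j in range(len(centroids))]
    scored = [
        (i, l, sizes[l], math.sqrt(squared_distance(p, centroids[l])))
        for i, (p, l) in enumerate(zip(points, labels))
    ]
    scored.sort(key=lambda r: (r[2], -r[3]))
    return scored


# Invented toy data: (expressed, nonexpressed, sibling) alternate-allele counts.
points = [(0, 0, 0)] * 4 + [(2, 0, 1), (2, 0, 1), (2, 0, 2)]
centroids, labels = lloyd(points, k=2, rng=random.Random(0))
ranked = rank_sites(points, centroids, labels)
for index, label, size, dist in ranked:
    print(index, points[index], label, size, round(dist, 3))
```

In practice the number of clusters and the genotype encoding would depend on the dataset, and a library implementation such as scikit-learn's `KMeans` (which uses k-means++ seeding by default) would likely replace the hand-written loop.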