chr1swallace/snphap

How to generate a valid input file with “nucleotide” coding


I am running a simulation in which I need to generate haplotypes for 600 individuals. Each haplotype contains 3 SNPs, and each SNP has two possible alleles: A/T or G/C. How many letters should follow the subject identifier, 3 or 6? Suppose the haplotype of individual 1 is (A/T)(A/T)(G/C). If the answer is 3, which 3 letters should I enter: AAG, ATC, etc.? Or should I just pick them at random? If the answer is 6, should I enter ATATGC?
I am asking because I have tried every combination I could think of, none of them worked, and I am really confused. When I input 3 SNPs, the error message is: should be an id plus two fields for each locus. When I input 5 SNPs as above, it says: Data error on locus XXX, skipping this subject XXX.
I would appreciate any assistance. Thank you so much!
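
To make the question concrete, here is a minimal Python sketch of what I understand the first error message to be asking for: one subject id followed by two allele fields per locus, so 1 + 2 × 3 = 7 fields per line for 3 SNPs. Everything beyond that message is my assumption, not something confirmed by the snphap documentation: that the fields are whitespace-separated, that each allele is its own single-letter field, and that no header line is needed. The names `SNP_ALLELES`, `simulate_subject`, `format_subject`, and `build_lines` are hypothetical helpers made up for this illustration.

```python
import random

# Allele options for the three SNPs described in the question (A/T, A/T, G/C).
SNP_ALLELES = [("A", "T"), ("A", "T"), ("G", "C")]


def simulate_subject(rng):
    """Draw two alleles per SNP uniformly at random (simulation assumption)."""
    return [(rng.choice(alleles), rng.choice(alleles)) for alleles in SNP_ALLELES]


def format_subject(subject_id, genotype):
    """Build one line: the id, then two allele fields for each locus.

    Assumption: fields are whitespace-separated, one letter per field.
    """
    fields = [str(subject_id)]
    for allele_1, allele_2 in genotype:
        fields.extend([allele_1, allele_2])
    return " ".join(fields)


def build_lines(n_subjects=600, seed=0):
    """Return one formatted line per simulated subject."""
    rng = random.Random(seed)
    return [
        format_subject(subject_id, simulate_subject(rng))
        for subject_id in range(1, n_subjects + 1)
    ]


if __name__ == "__main__":
    # Print the first few lines to inspect the layout.
    for line in build_lines()[:3]:
        print(line)
```

With this layout, a subject whose three SNP genotypes are A/T, A/A, and G/C would be written as the id followed by six separate letters rather than as a single string such as ATATGC. Whether snphap needs an extra option to accept letter codes is something I have not been able to confirm.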