AI-sandbox/gnomix

Not Non-negative probabilities when training the model from scratch

jikhashkya opened this issue · 2 comments

Hi, I was trying to follow the demo and train the model from scratch on 1000G data (after subsetting to 1000 samples), and I got the following error:

Launching in training mode...
Reading vcf file...
Getting genetic map info...
Getting sample map info...
Building founders...
Splitting sample map...
Running Simulation...
Traceback (most recent call last):
  File "gnomix.py", line 392, in <module>
    simulate_splits(base_args, config, data_path) # will create the simulation_output folder
  File "gnomix.py", line 298, in simulate_splits
    return_out=False)
  File "/data/pshakya/COLORPBWT/gnomix/src/laidataset.py", line 410, in simulate
    maternal = admix(founders,founders_weight,gens[i],self.breakpoint_prob,self.num_snps,self.morgans)
  File "/data/pshakya/COLORPBWT/gnomix/src/laidataset.py", line 159, in admix
    p=breakpoint_probability)
  File "mtrand.pyx", line 931, in numpy.random.mtrand.RandomState.choice
ValueError: probabilities are not non-negative

Any ideas on how to fix this?

Hi @jikhashkya, I ran into this same issue. In my case, the cause was that using liftover to convert my genetic map from hg37 to hg38 introduced positions where the cM value did not increase monotonically once the map was sorted by physical position. I removed these sections from my genetic map file and was then able to train my model successfully.
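For anyone who wants to check their own map, here is a minimal sketch of one way to flag the offending rows. It is not part of gnomix; the function name `monotonic_mask` and the example values are hypothetical, and it assumes you have already loaded the physical positions and cM values of one chromosome into arrays. It keeps a row only if its cM value is at least as large as the last kept row when walking in physical order, so a single spuriously high cM value would cause the rows after it to be dropped; inspect the result before overwriting your map.

```python
import numpy as np

def monotonic_mask(pos_bp, pos_cm):
    """Return a boolean mask (in the input row order) of rows to keep so that
    cM never decreases when the map is sorted by physical position.

    Hypothetical helper for illustration; not gnomix code.
    """
    pos_bp = np.asarray(pos_bp)
    pos_cm = np.asarray(pos_cm, dtype=float)
    order = np.argsort(pos_bp, kind="stable")  # walk rows in physical order
    keep = np.zeros(len(pos_bp), dtype=bool)
    last_cm = -np.inf
    for idx in order:
        # Keep the row only if it does not step backwards in cM.
        if pos_cm[idx] >= last_cm:
            keep[idx] = True
            last_cm = pos_cm[idx]
    return keep

# Made-up example: the third row dips from 0.20 cM back to 0.15 cM.
example_bp = [100, 200, 300, 400, 500]
example_cm = [0.0, 0.20, 0.15, 0.30, 0.40]
example_keep = monotonic_mask(example_bp, example_cm)
n_dropped = int((~example_keep).sum())
```

Rows where the mask is `False` are the ones that break monotonicity and are candidates for removal.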

I believe this section triggers the error when you try to use a genetic map like mine: gnomix gets "interpolated values of all reference snp positions" and then calculates the distance between pairs of SNPs. Because of the structure of my map file, some of these distances were negative, which triggered the "probabilities are not non-negative" error.
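The mechanism can be reproduced outside gnomix with a small sketch. This is an illustration under assumptions, not gnomix's actual code: the map values, the SNP positions, and the way the distances are normalised into probabilities are all made up. It only shows that interpolating cM positions from a non-monotonic map yields a negative distance, and that `numpy.random.RandomState.choice` rejects a probability vector containing a negative entry.

```python
import numpy as np

# Hypothetical genetic map sorted by physical position; cM dips at 300 bp.
map_bp = np.array([100, 200, 300, 400, 500])
map_cm = np.array([0.0, 0.20, 0.15, 0.30, 0.40])

# Hypothetical reference SNP positions, interpolated onto the map.
snp_bp = np.array([100, 200, 300, 400])
snp_cm = np.interp(snp_bp, map_bp, map_cm)

# Genetic distance between consecutive SNPs; one entry is negative.
distances = np.diff(snp_cm)
has_negative = bool((distances < 0).any())

# Normalise so the vector sums to 1, leaving the sign as the only problem.
probs = distances / distances.sum()

error_message = None
try:
    np.random.RandomState(0).choice(len(probs), p=probs)
except ValueError as exc:
    error_message = str(exc)
```

After running this, `has_negative` is true and `error_message` holds the `ValueError` text raised by `choice`, which is the same kind of failure as in the traceback above.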

Hi @broomej. Thank you for your input on this. I haven't had a chance to dig deeper into the issue yet, but interestingly, the same genetic map worked fine with 500 samples and only throws the error with 1000 samples. I will definitely look into this further, and I appreciate your input.