BoPeng/simuPOP-examples

Testing a broad range of parameters with SimuPOP

ChrystelleDelord opened this issue · 3 comments

Hi Dr Peng and every SimuPOP user!

I have just started with simuPOP (and Python), so I have many, many things to learn, and I hope this is the right place to ask questions (maybe I should use the mailing list instead, but if I understood correctly, it is better to ask here?)

For my project, I need to simulate a population consisting of several subpopulations arranged in a 'dendritic' (river-network) design. Then I will be testing a wide range of parameters (different migration rates, subPopulation sizes, degree of asymmetry in migration - downstream versus upstream directed, etc) and many different combinations of these. Also, I would like to run several replicates per combination.

I am not sure what would be the most efficient way to do this (it might be a really silly question; if so, I am very sorry about that!). Should I just pass my parameters through nested for loops? Or is there a more straightforward way to go?

Any help will be welcome, thank you so much!

Best,

Chrystelle

This is an interesting question, and I will just write down some suggestions as they come to mind:

  1. First, make sure your simulation works correctly. This can be difficult to do, but sometimes there are theoretical estimates for simple corner cases, or there are well-known trends that can be used to verify the results. Also, as simuPOP is completely open, you can dump all individuals after an operation that you are not completely sure about and check them manually. I mean, the last thing you want to see is some bug in the code at the end of the journey.

  2. Make sure your code is efficient (to some degree). Try to use the optimized module (setOptions(optimized=True)), try to use the binary or mutant modules if they happen to fit your needs, profile your code and check whether you are running unnecessary loops etc., and check if you can get reasonable results from populations of size 10,000 instead of 100,000. Small savings add up for large-scale simulations.

  3. Measure the time for each simulation, then estimate the size of the total parameter space and the total simulation time. Cut corners, even drastically (reduce population size, make the parameter space coarser for some parameters, explore each parameter one by one to remove uninteresting ranges), if you cannot afford the time and/or the cost of the CPU hours.

  4. Divide your simulations into thousands of jobs and send them to a cluster system with a Python or bash script. If each simulation takes just a few minutes, try to combine some of them (e.g. combine the simulations from an inner loop) so that each "simulation" is an appropriate size to be sent to a cluster system, because there can be a high cost in maintaining a large number of small jobs on such systems.

  5. Wait and be proactive in checking partial results to make sure things are running correctly...
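To make steps 3 and 4 a bit more concrete, here is a minimal sketch in plain Python of how the parameter combinations could be enumerated without hand-written nested loops and then grouped into batches, one batch per cluster job. The parameter names (`subpop_size`, `migration_rate`, `downstream_bias`), their values, the number of replicates and the batch size are all made up for illustration, and the simulation itself is left out.

```python
import itertools

# Illustrative values only (assumptions); substitute the ranges you actually need.
PARAM_GRID = {
    "subpop_size": [50, 100, 1000],
    "migration_rate": [0.01, 0.1],
    "downstream_bias": [0.5, 0.8],  # hypothetical asymmetry parameter
}
N_REPLICATES = 10   # replicates per parameter combination
BATCH_SIZE = 20     # simulations bundled into one cluster job


def parameter_combinations(grid):
    """Yield one dict per combination of parameter values."""
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def build_tasks(grid, n_replicates):
    """Return one task per (combination, replicate), each with a distinct seed."""
    tasks = []
    for combo in parameter_combinations(grid):
        for replicate in range(n_replicates):
            seed = len(tasks) + 1  # simple unique seed per task
            tasks.append(dict(combo, replicate=replicate, seed=seed))
    return tasks


def make_batches(tasks, batch_size):
    """Split the task list into consecutive chunks of at most batch_size."""
    return [tasks[i:i + batch_size] for i in range(0, len(tasks), batch_size)]


tasks = build_tasks(PARAM_GRID, N_REPLICATES)
batches = make_batches(tasks, BATCH_SIZE)
print(f"{len(tasks)} simulations in {len(batches)} jobs")
```

Each batch could then be written to a file and handed to a worker script that loops over its tasks, seeds the random number generator with the task's seed, and runs one simulation per task. Keeping the seed in the task record makes it easier to rerun a single replicate if a partial result looks suspicious.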

Thank you so much!

I'll try this step by step, and I'll keep this thread updated in case it could be of interest to anyone else. I might also come back to ask further questions but I hope not to disturb too much!

All the best,

Chrystelle

Hi again all,

Just to keep this thread updated:
I have been testing a small-scale simulation load to get an idea of the run time of the process. Each combination of subPopSize=[30, 50, 100, 1000] (× 16 subPops in total) and migration_rate=[0.01, 0.05, 0.1, 0.2, 0.3], along with 100 independent replicates, would need less than 2 hours to run on a 4-core machine with 16 GB of RAM. In my particular case I did not need to test a broader parameter range; however, I am now using the outputs to choose a small set of subPopSizes and migration rates and add additional parameters, and so on.
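As a rough sanity check on these numbers, here is a small sketch of the arithmetic, in case it helps anyone planning a similar grid. It assumes that the 2 hours refers to the whole grid (4 sizes × 5 rates × 100 replicates); if it is read as 2 hours per combination instead, the totals scale accordingly. The third parameter with 3 values is purely hypothetical.

```python
import math


def count_runs(values_per_parameter, n_replicates):
    """Number of simulations: product of the grid dimensions times replicates."""
    return math.prod(values_per_parameter) * n_replicates


def estimate_hours(n_runs, seconds_per_run):
    """Wall-clock hours, assuming every run costs about the same."""
    return n_runs * seconds_per_run / 3600.0


# 4 subpopulation sizes x 5 migration rates x 100 replicates
n_runs = count_runs([4, 5], 100)

# Assumption: the whole grid took 2 hours (an upper bound from the post above).
seconds_per_run = 2 * 3600 / n_runs

# Hypothetical extension: a third parameter with 3 values.
extended_runs = count_runs([4, 5, 3], 100)
extended_hours = estimate_hours(extended_runs, seconds_per_run)

print(f"{n_runs} runs, about {seconds_per_run:.1f} s per run")
print(f"{extended_runs} runs would take about {extended_hours:.1f} h")
```

The per-run time is only an average here: runs with subPopSize=1000 will take longer than runs with subPopSize=30, so timing a few runs per size would give a better estimate.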

I have attached my code (a rather simple one). It might not be very well optimized, since I am quite new to Python, but it works and gives me what I was expecting. (I am simulating genetic variation at SNP markers in a dendritic-shaped metapopulation, a watershed. Ultimately I would like to add some fishing pressure to this and test different scenarios related to the species' biological characteristics.)

TestMb_200gen.txt

Best!

Chrys