Could you allow non-uniform recombination rate in slendr?
dangliu opened this issue · 10 comments
Hi Martin,
It's me again ^^'
As title, could you allow one to simulate sequences with non-uniform recombination rates? And maybe allow one to use the rates from available data, such as HapMap, like the RateMap.read_hapmap() function in msprime (https://tskit.dev/msprime/docs/stable/api.html#msprime.RateMaphttps://tskit.dev/msprime/docs/stable/api.html#msprime.RateMap)?
Thank you again.
Dang
Hi Dang,
Thanks for the suggestion. This is a very useful feature that I've had in the back of my mind since the beginning yet never got to implementing it. Mostly because all of the slendr work that we've been doing in the group was exploratory and/or theoretical so uniform rate was all that's been needed.
Thanks for the link to the msprime example. I will figure out a way to implement that in the msprime()
function. Perhaps I could simply re-use the recombination_rate =
parameter to detect either a single number (uniform rate) or the source of the recombination map following what's here. I guess the same thing would apply to the slim()
runner function -- I seem to remember a recipe in the SLiM manual that shows exactly that.
I don't think it will make it into the next slendr version I'm intending to submit to CRAN soon (and the paper) but I'll keep this in mind whenever I have time during the summer.
Just to open the black box a little bit, @dangliu: If you need this quickly and are familiar with msprime a bit, you could try to implement this into the msprime slendr script that's saved into every slendr model config directory. There's no magic. When you run msprime()
in slendr, R simply calls that msprime script under the hood.
This script is basically a version of this. If you would implement the code for recombination maps into that msprime script, I should be able to merge this into the slendr package itself fairly easily.
Just in case you'd like to do a bit of coding for a good cause. ;)
Hey folks (also pinging @archgen) -- do you have a source of an empirical recombination map that could be used for testing?
Having never done this (all of my simulation work in the past was for theoretical popgen purposes), I'm confused as to what fileformat or source of data people might use for this.
Hey folks (also pinging @archgen) -- do you have a source of an empirical recombination map that could be used for testing?
Section 6.1.2 of the SLiM manual has an example of this, with a recombination map for Drosophila melanogaster from the literature. I don't know if there is a standard file format for these, but it at least shows the file format that that particular map was published in.
Thanks for chiming in, Ben! Indeed, the SLiM manual is my only source of information on this (it's recombination map example being the one thing I used in my own work in the past). I would probably start with this.
The question is, how much diversity of file formats one might expect (I suspect a lot, given that this is dealing with bioinformatics). I guess I could simply say "the recombination map must be formatted in a way compatible with Section 6.1.2 of the SLiM manual" and be done with it. 🤔 File conversion is something that should be done by the user anyway. I'm certainly not planning to implement multiple different sources of the data.
I'll keep thinking for a bit longer before I write any code. Something I have been learning the hard way. :)
Hi Martin, I have used the GRCh37 genetic map lifted over from the HapMap Phase II map (filtered) (source: Adam's repo: https://github.com/adimitromanolakis/geneticMap-GRCh37). I've passed this to msprime using the RateMap.read_hapmap function and modified my input GRCH37 ~ HapMap map by concatenating some chromosomes, setting 0.5 recombination rate between them and making sure the input rate to msp is in cM/Mb. See here for msprime discussion https://tskit.dev/msprime/docs/stable/rate_maps.html. I haven't had a chance yet to play with your source code to pass this to slendr. Perhaps it might be easiest to offer the possibility of users generating their own custom rate maps (through the ms functionality) and taking that as input for slendr rather than trying to think about all the various file formats that people might be using (I'm not sure what other species or users might want/need). Hope this helps a little.
Nice – I was not aware that msprime had a defined file format that it supports for reading rate maps. I'll look into that and consider supporting it in SLiM as well.
Too tired to read it now but this looks like something I should take a look at so I'm "bookmarking" it here. You know, for the time when the planets align and I finally start implementing this. :)
Hi @bodkan. Noticed your comment above. FYI, I looked at the file format that msprime supports for reading recombination maps, and it seems rather complicated, so I decided not to go there for the time being. Anyway it isn't hard to write Eidos code to read a simple map in from a file; it's gotten even easier with the addition of readCSV()
, in fact. But the other comment I'd make is: now that slendr has kind of crystallized to some extent, I think you should take a step back and do some thinking about how to deal with all the features people will want. SLiM can do a wide variety of things, of course; variable recombination maps is just the beginning. Rather than wrapping each feature of SLiM in an R-ish wrapper in slendr – a task which would end up ballooning the complexity of slendr tremendously, and keep you busy for years to come! – I think you need to look at feature requests like this and instead say to yourself "How can I open things up so that people can plug in their own SLiM tweaks on top of what slendr provides?" I know you've thought about that some already; I guess what I'm saying is, you may want to draw the line more or less right here, right now, and say "Rather than providing an API for variable recombination maps, now is the time to add the hooks that will allow people to do this, and many many other things, for themselves." Food for thought? Anyhow, congrats on 0.4.0, and I'm looking forward to the paper! :->