Generate data with correlations
jknowles opened this issue · 6 comments
I started work a while ago on a much less ambitious project than wakefield: an attempt to generate random data sets on the fly with a known correlation structure. You can see the seeds of that work here: https://github.com/jknowles/datasynthR
It would be cool to include the ability to generate numeric or factor data with a known correlation structure, so that structural relationships can be built into the very realistic-looking data generated by wakefield.
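For readers unfamiliar with the idea, here is a minimal sketch of what "numeric data with a known correlation structure" could look like in base R. It is not how datasynthR or wakefield does it; the function name `r_correlated` and the `target` matrix are made up for illustration. The sketch assumes normally distributed columns and uses a Cholesky factor of the target correlation matrix; factor data is not covered.

```r
# Hypothetical helper (not part of datasynthR or wakefield):
# draw n rows of normal data whose population correlation matrix is `target`.
r_correlated <- function(n, target) {
  # chol() returns an upper-triangular U with t(U) %*% U equal to target
  U <- chol(target)
  # n rows of independent standard normal draws, one column per variable
  z <- matrix(rnorm(n * ncol(target)), nrow = n)
  # rows of z %*% U have covariance t(U) %*% U, i.e. target
  x <- z %*% U
  colnames(x) <- colnames(target)
  as.data.frame(x)
}

set.seed(42)

# Illustrative target correlation matrix (must be positive definite)
target <- matrix(c(1.0, 0.6, 0.3,
                   0.6, 1.0, 0.2,
                   0.3, 0.2, 1.0),
                 nrow = 3,
                 dimnames = list(NULL, c("x1", "x2", "x3")))

dat <- r_correlated(5000, target)

# Sample correlations should be close to, but not exactly, the target
round(cor(dat), 2)
```

The sample correlations only approximate the target, and the approximation tightens as `n` grows. Variables with non-normal margins or factor levels would need a further transformation step, which this sketch leaves out.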
It seems that you've done a lot of work on this already, and it's pretty nice. After looking at what you have, replicating it in wakefield would be needlessly redundant.
Is there a way you could continue to develop datasynthR with the end goal of either incorporating its functionality into wakefield or releasing it as a standalone package? Do you plan to make this a CRAN package? I'd like to see a relationship between the two packages like the one magrittr and dplyr have.
Note to future self...
Depending on @jknowles's response, I may want to import his package (add it to Depends:) and make a wrapper for it, maybe named r_distribution_cor, that works similarly to r_series.
@trinker I'm interested in this. I have run into a few snags with datasynthR that caused me to delay working on it while I moved on to other problems. But I could probably return to it this summer and get a CRAN-worthy version released soon enough. I'd want to check in with you about how to make the packages complementary. wakefield really solves one of the problems I was having with datasynthR: the data generated didn't feel real enough for users who cared about more than the structure (plotting, etc.).
@trinker Any news on this?
I've been revisiting datasynthR recently with a project for a client (and also exploring how wakefield works internally in the process). I imagine datasynthR will need to be refactored soon. I can't guarantee any time to be devoted to that in the coming months; it depends on whether current projects necessitate it.