Constraints for generating synthetic data
Closed this issue · 3 comments
i-akiya commented
I'd like to generate synthetic data subject to the following constraints, but I don't know how to implement them.
Constraints:
- Column A is greater than Column B in every record.
- Both Column A and Column B are positive numbers.
I know that synthpop::syn() accepts constraints via the rules and rvalues arguments, but I don't think they can express the constraints above.
Any suggestions would be appreciated.
LotteVanUtrecht commented
Hi,
I assume you are making a synthetic data set based on a real data set? If
so, when using the default synthesizing method (cart), the synthetic data
will only contain values that are in the real data. So if Column A and B
are both strictly positive in the real data, they will also be strictly
positive in the synthetic data.
The other constraint (Column A>Column B) is a little more complicated.
You're right that the rules and rvalues arguments aren't fit for purpose
here.
I would advise running synthpop::syn() first. Sometimes the
default settings will pick up constraints like these from the real data and
reproduce them in the synthetic data.
If that doesn't work, there is another approach. You can redefine your
dataset a little by adding a new Column C between Column A and Column B.
Column C is equal to Column A-Column B. This column should be strictly
positive in the real data, and so it will also be strictly positive in the
synthetic data. You can then reconstruct Column B by calculating Column
A-Column C, either by using the rules and rvalues arguments or manually
outside the synthesis. It's a little hacky, but it ought to work.
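The idea behind the Column C trick can be sketched independently of synthpop. The snippet below is only an illustration under stated assumptions: it is written in Python, the toy numbers are made up, and `resample` is a hypothetical stand-in for a synthesiser that, like cart as described above, only emits values seen in the real data. It is not synthpop code.

```python
import random

def resample(values, n, rng):
    """Hypothetical stand-in synthesiser: draws only values observed in the real data."""
    return [rng.choice(values) for _ in range(n)]

def synthesise_via_difference(real_a, real_b, n, seed=0):
    rng = random.Random(seed)
    # Column C = A - B is strictly positive whenever A > B holds in the real data.
    real_c = [a - b for a, b in zip(real_a, real_b)]
    assert all(c > 0 for c in real_c), "real data must satisfy A > B"
    syn_a = resample(real_a, n, rng)
    syn_c = resample(real_c, n, rng)  # only observed, hence positive, differences
    # Reconstruct B outside the synthesis: B = A - C, so A - B = C > 0.
    syn_b = [a - c for a, c in zip(syn_a, syn_c)]
    return syn_a, syn_b

# Made-up toy data in which A > B > 0 holds for every record.
real_a = [5, 9, 12, 8]
real_b = [2, 4, 11, 1]

syn_a, syn_b = synthesise_via_difference(real_a, real_b, n=1000)
violations = sum(1 for a, b in zip(syn_a, syn_b) if not a > b)
print(violations)
```

A > B holds by construction here, because every synthetic difference is a positive observed value. B > 0 is not guaranteed by the same argument: if A and C are synthesised too independently of each other, A - C can come out zero or negative, so it is worth checking that column after synthesis.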
Hope this helps.
Best,
Lotte
(Reply sent by email to Ippei Akiya's post of Mon, May 1, 2023, 15:34.)
gillian-raab commented
You could try this.
1. Synthesise column B first, constrained to be > 0.
2. Then either synthesise (A - B), constrained to be > 0,
   or synthesise A, constrained to be > B.
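A rough sketch of this ordering, under stated assumptions: it is written in Python, the toy numbers are made up, and the function `synthesise_sequential` with its simple random draws is a hypothetical stand-in for a real synthesiser. None of it is synthpop's API. B is drawn first from positive observed values; A is then obtained either by adding a positive observed difference or by drawing only from observed A values larger than the drawn B.

```python
import random

def synthesise_sequential(real_a, real_b, n, use_difference=True, seed=0):
    """Hypothetical sketch: synthesise B first, then A conditional on B."""
    rng = random.Random(seed)
    pos_b = [b for b in real_b if b > 0]                          # B constrained to be > 0
    diffs = [a - b for a, b in zip(real_a, real_b) if a - b > 0]  # (A - B) constrained to be > 0
    records = []
    for _ in range(n):
        b = rng.choice(pos_b)
        if use_difference:
            # Option 1: draw a positive difference and rebuild A = B + (A - B).
            a = b + rng.choice(diffs)
        else:
            # Option 2: draw A only from observed values greater than this B.
            # The donor pool is non-empty as long as the real data satisfy A > B.
            donors = [x for x in real_a if x > b]
            a = rng.choice(donors)
        records.append((a, b))
    return records

# Made-up toy data in which A > B > 0 holds for every record.
real_a = [5, 9, 12, 8]
real_b = [2, 4, 11, 1]

by_difference = synthesise_sequential(real_a, real_b, n=500, use_difference=True)
by_donor_pool = synthesise_sequential(real_a, real_b, n=500, use_difference=False)
```

Because B is fixed first and A is derived from it, both A > B and B > 0 hold for every synthetic record in this sketch.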
Hope this helps
Gillian M Raab
Research Fellow (part-time)
Scottish Centre for Administrative Data Research
i-akiya commented
Thanks Lotte and Gillian.
I tried Lotte's hacky approach of adding an intermediate Column C.
The result looks perfect with both the CART and ranger methods: Column A > Column B holds in all 76400 records.
Thank you again for your suggestions.
Ippei