sdv-dev/SDGym

add categorical data generators

Sandy4321 opened this issue · 4 comments

Problem Description

I cannot find categorical data generators on
https://docs.sdv.dev/sdgym/customization/synthesizers/sdv-synthesizers

Expected behavior

Generate categorical columns in a data table with predefined statistical dependencies between the values in the columns.
For example:
1. The label/target is "yes" with probability 0.8 when the value in column A is "abc", the value in column B is "nmjki", and the value in column C is "tgtgt".
2. The label/target is "yes" with probability 0.9 when the value in column A is "abc", the value in column B is "l;kji", and the value in column D is "ujhbn".
Otherwise, the label/target is "no".
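
For illustration, the two rules above could be written down as a small rule-based generator. This is only a sketch of the requested behavior, not something SDGym or SDV provides: the function names, the extra filler values ("xyz", "other"), and the uniform sampling of columns A through D are all assumptions made for the example.

```python
import random

# Candidate values per column. Only "abc", "nmjki", "l;kji", "tgtgt" and
# "ujhbn" come from the issue; "xyz" and "other" are hypothetical fillers.
VALUES = {
    "A": ["abc", "xyz"],
    "B": ["nmjki", "l;kji", "other"],
    "C": ["tgtgt", "other"],
    "D": ["ujhbn", "other"],
}


def yes_probability(row):
    """Return P(label == "yes") for a row, following the two rules."""
    if row["A"] == "abc" and row["B"] == "nmjki" and row["C"] == "tgtgt":
        return 0.8
    if row["A"] == "abc" and row["B"] == "l;kji" and row["D"] == "ujhbn":
        return 0.9
    return 0.0  # otherwise the label is always "no"


def generate_rows(n_rows, seed=0):
    """Generate rows as dicts; feature columns are drawn uniformly (assumption)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        row = {col: rng.choice(options) for col, options in VALUES.items()}
        row["label"] = "yes" if rng.random() < yes_probability(row) else "no"
        rows.append(row)
    return rows
```

Calling `generate_rows(1000)` returns a list of dictionaries with the keys `A`, `B`, `C`, `D`, and `label`, which can be loaded into a table if needed.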

npatki commented

Hi @Sandy4321,

The synthesizers that you see listed on the page are the ones coming from the SDV library. If you'd like to test a particular algorithm, we encourage you to write up the logic in a custom synthesizer. You can then benchmark it in the same way as any other SDV synthesizer.

Please note that the SDGym library expects synthesizers to use some kind of machine learning; that is, they should be able to learn patterns from the real data and replicate them in the synthetic data. The SDV synthesizers should be able to automatically capture general trends (e.g., label "yes" occurs with probability 0.8, and value "abc" tends to correspond with another value "nmjki"). For more information, check out this blog post.
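
As a rough sketch of what "learning patterns from the real data" can mean for purely categorical columns, the snippet below counts how often each combination of values occurs and then samples new rows in proportion to those counts. It is not how the SDV synthesizers work internally and it does not use the SDGym API; the names `train` and `sample` and the list-of-dicts data layout are assumptions for this example. Wrapping such logic as an SDGym custom synthesizer would follow the SDGym documentation on custom synthesizers.

```python
import random
from collections import Counter


def train(rows):
    """Learn the empirical joint distribution of categorical rows.

    `rows` is a non-empty list of dicts sharing the same keys (assumed layout).
    """
    columns = list(rows[0].keys())
    counts = Counter(tuple(row[col] for col in columns) for row in rows)
    return {"columns": columns, "counts": counts}


def sample(model, n_rows, seed=0):
    """Draw synthetic rows in proportion to the learned combination counts."""
    rng = random.Random(seed)
    combos = list(model["counts"].keys())
    weights = list(model["counts"].values())
    drawn = rng.choices(combos, weights=weights, k=n_rows)
    return [dict(zip(model["columns"], combo)) for combo in drawn]
```

Because this sketch only replays combinations it has already seen, it preserves dependencies such as "yes" occurring 80% of the time for a given combination, but it cannot produce unseen combinations.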

npatki commented

Hi @Sandy4321, do you still have any outstanding questions about this? It's been a few weeks since the last response, so I'm marking this question as answered.

If there's more to discuss, feel free to reply and we can always reopen the issue.

Sandy4321 commented

Can you do it yourselves, please?

npatki commented

Hi @Sandy4321, I'm not sure what you are asking. Would you like the DataCebo team to write this synthesizer for you?

Unfortunately, we are a small team and do not have the resources to write custom code for all our users. If your project requires more involvement from the SDV team, you can contact us about pursuing a business relationship.