maciejkula/spotlight

Classification: generating synthetic unbalanced data

Sandy4321 opened this issue · 0 comments

Could you share some links on generating synthetic unbalanced data for classification?
Your code generates synthetic data for recommender systems:
https://maciejkula.github.io/spotlight/datasets/synthetic.html

By this I mean data that is close to real data, with a mix of categorical and continuous feature values,
in addition to the well-known
simple generator:
https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html
weights : array-like of shape (n_classes,) or (n_classes - 1,), (default=None)
The proportions of samples assigned to each class. If None, then classes are balanced. Note that if len(weights) == n_classes - 1, then the last class weight is automatically inferred. More than n_samples samples may be returned if the sum of weights exceeds 1
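For reference, the simple approach quoted above can be sketched as follows. This is a minimal example assuming a binary problem with roughly a 90/10 split; the sample count, feature counts and seed are arbitrary illustrative choices. Passing a single weight uses the "n_classes - 1" form, so the last class weight is inferred, and flip_y is set to 0 so random label noise does not blur the requested proportions.

```python
import numpy as np
from sklearn.datasets import make_classification

# Binary problem where roughly 90% of samples belong to class 0.
# A single weight (n_classes - 1 values) lets scikit-learn infer
# the remaining class weight as 1 - 0.9.
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=4,
    n_redundant=2,
    weights=[0.9],
    flip_y=0.0,  # no random label flips, so class proportions stay close to the weights
    random_state=0,
)

counts = np.bincount(y)
print("class counts:", counts)
print("minority fraction:", counts[1] / len(y))
```

Note that all features produced this way are continuous, which is exactly the limitation raised in this issue.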

Or could your code be used for binary classification with a mix of categorical and continuous feature values,
where different groups of features have complicated dependencies?
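One way to get this kind of data without any dedicated library is to hand-roll a generator. The sketch below is hypothetical and not part of Spotlight: the function name make_mixed_imbalanced, the feature names (region, channel, x1, x2), the category-specific slopes and the noise level are all illustrative assumptions. The label depends on an interaction, because the effect of the continuous feature x1 changes sign with the categorical feature region, and the imbalance is controlled by thresholding the score at a quantile.

```python
import numpy as np
import pandas as pd


def make_mixed_imbalanced(n_samples=5000, minority_rate=0.1, seed=0):
    """Hypothetical generator: mixed categorical/continuous features, imbalanced binary label."""
    rng = np.random.default_rng(seed)

    # Categorical features with non-uniform category frequencies.
    region = rng.choice(["north", "south", "east"], size=n_samples, p=[0.5, 0.3, 0.2])
    channel = rng.choice(["web", "store"], size=n_samples)

    # Continuous features with different distributions.
    x1 = rng.normal(size=n_samples)
    x2 = rng.exponential(scale=1.0, size=n_samples)

    # Category-specific slopes for x1 create a categorical x continuous interaction.
    slope = np.select([region == "north", region == "south"], [2.0, -1.5], default=0.5)
    score = slope * x1 + 0.8 * np.log1p(x2) + np.where(channel == "web", 0.7, 0.0)
    score += rng.normal(scale=0.5, size=n_samples)  # additive noise

    # Threshold at the (1 - minority_rate) quantile so about minority_rate of rows are positive.
    threshold = np.quantile(score, 1.0 - minority_rate)
    y = (score > threshold).astype(int)

    df = pd.DataFrame({
        "region": pd.Categorical(region),
        "channel": pd.Categorical(channel),
        "x1": x1,
        "x2": x2,
    })
    return df, y


df, y = make_mixed_imbalanced()
print(df.dtypes)
print("positive rate:", y.mean())
```

Thresholding on a quantile, rather than sampling labels from a logistic model, fixes the class ratio directly; swapping in rng.binomial on a sigmoid of the score would give noisier, more realistic labels at the cost of a less predictable ratio.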