sdv-dev/SDGym

Add ability to load and inspect individual datasets

npatki opened this issue · 1 comment

npatki commented

Problem Description

SDGym currently lets you list the datasets that are available for benchmarking, but it offers no way to inspect them. Users may want to see what a dataset's columns, data types, or values look like before using it in a benchmarking run.

Expected behavior

Add a download_demo function, similar to the one in the SDV library, that returns a dataset's data and metadata so that SDGym users can inspect it.

Workaround

The SDV library is a dependency of SDGym, so as a workaround you can access the demo datasets through SDV directly.

# download_demo returns the dataset together with its metadata
from sdv.datasets.demo import download_demo

data, metadata = download_demo(
    modality='single_table',
    dataset_name='adult'
)
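
Once the workaround has loaded a dataset, the inspection described in the problem statement (columns, data types, values) can be done with plain pandas. The following is only a sketch, not part of SDGym or SDV: `summarize_columns` and `load_and_summarize` are hypothetical helper names, and it assumes that `download_demo` returns a pandas DataFrame for single-table datasets.

```python
import pandas as pd


def summarize_columns(data: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, distinct count, example value.

    Hypothetical helper for illustration; it is not an SDGym or SDV function.
    """
    rows = []
    for name in data.columns:
        column = data[name]
        non_null = column.dropna()
        rows.append({
            'column': name,
            'dtype': str(column.dtype),
            'missing': int(column.isna().sum()),
            'unique': int(column.nunique()),  # distinct non-missing values
            'example': non_null.iloc[0] if len(non_null) else None,
        })
    fields = ['column', 'dtype', 'missing', 'unique', 'example']
    return pd.DataFrame(rows, columns=fields).set_index('column')


def load_and_summarize(dataset_name='adult'):
    """Load a single-table demo dataset through SDV and summarize its columns.

    Hypothetical wrapper; requires the sdv package and network access.
    """
    # Imported here so that summarize_columns stays usable without sdv installed.
    from sdv.datasets.demo import download_demo

    data, metadata = download_demo(
        modality='single_table',
        dataset_name=dataset_name
    )
    return data, metadata, summarize_columns(data)
```

Calling `load_and_summarize('adult')` and printing the third return value gives a quick per-column overview. `data.head()` shows the first few rows, and the metadata object can be examined separately.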

npatki commented

For a related discussion, see #253