sdv-dev/SDV

Improve usage of `detect_from_dataframes` function

npatki opened this issue · 0 comments

Problem Description

The metadata's `detect_from_dataframes` function is currently meant to be called on a pre-existing instance of a metadata object. It updates that object in place and doesn't return anything.

This usage leads to redundant code and confusion: you have to instantiate the object separately for no clear reason, and it's unexpected that a detection call would not return anything. Additionally, we only expect this function to be used once, so it's not clear why it can be applied to the same object multiple times.

Expected behavior

To improve the usage of this function, we will make it a class method that returns a new instance of the metadata object.

```python
from sdv.metadata import Metadata

metadata = Metadata.detect_from_dataframes(data=my_dataframes)
```
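
For illustration only, here is a minimal sketch of the class-method pattern proposed above. `ToyMetadata`, its `tables` attribute, and the `_guess_type` helper are hypothetical stand-ins, not SDV's actual implementation, and plain dicts of column lists are used in place of real pandas DataFrames so the example runs without dependencies.

```python
from typing import Any, Dict, List


class ToyMetadata:
    """Hypothetical stand-in for a metadata class (not SDV's implementation)."""

    def __init__(self) -> None:
        # Maps table name -> {column name -> detected type label}.
        self.tables: Dict[str, Dict[str, str]] = {}

    @classmethod
    def detect_from_dataframes(
        cls, data: Dict[str, Dict[str, List[Any]]]
    ) -> "ToyMetadata":
        """Build a new instance from the data and return it."""
        metadata = cls()
        for table_name, columns in data.items():
            metadata.tables[table_name] = {
                column_name: cls._guess_type(values)
                for column_name, values in columns.items()
            }
        return metadata

    @staticmethod
    def _guess_type(values: List[Any]) -> str:
        # Toy heuristic (assumption): all-numeric columns are "numerical",
        # everything else is "categorical".
        is_numeric = bool(values) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        return "numerical" if is_numeric else "categorical"


# Stand-in for a dict of DataFrames: table name -> column name -> values.
my_tables = {
    "users": {"user_id": [1, 2, 3], "country": ["US", "FR", "US"]},
}

# One call both creates and populates the object; no separate instantiation.
metadata = ToyMetadata.detect_from_dataframes(data=my_tables)
```

Because the class method constructs the instance itself, there is no pre-existing object to call detection on a second time, which addresses the repeated-use concern raised in the problem description.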

Additional context

All detection functions will be updated to follow this paradigm. See:

  • `detect_from_dataframe` (singular) improvements, specified in #2211
  • `detect_from_ddl` improvements, specified in #2212