ZaxR/bulwark

Potential new checks to detect df size change: is_smaller, is_larger

ZaxR opened this issue · 2 comments

ZaxR commented

In cases where a function is meant to filter out results, the user might not know the new size of the df, but may know that it should be smaller.

ZaxR commented

This will require comparing the state of the original df to the output df. For example:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

def sample_func(df):
    return df[df['a'].isin([2])]

In the above case, the original df has three rows, but the output df should have one row. For is_smaller, the check would confirm that 3 > 1.
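To make the idea concrete, here is a minimal standalone sketch of what such a check could look like. It does not use bulwark's BaseDecorator; the decorator below is hypothetical and simply assumes the first positional argument is the original df.

```python
import functools

import pandas as pd


def is_smaller(func):
    """Hypothetical check: the output df must have fewer rows than the input df."""

    @functools.wraps(func)
    def wrapper(df, *args, **kwargs):
        # Record the row count up front in case func mutates df in place.
        rows_before = len(df)
        result = func(df, *args, **kwargs)
        if not len(result) < rows_before:
            raise AssertionError(
                f"Expected fewer than {rows_before} rows, got {len(result)}."
            )
        return result

    return wrapper


@is_smaller
def sample_func(df):
    return df[df["a"].isin([2])]


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
out = sample_func(df)  # passes: the output has fewer rows than the input
```

An is_larger check would be the same sketch with the comparison flipped.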

I can try to tackle this. I started looking at the BaseDecorator code, and my main question is: when we decorate a function with this check, can we assume that the first arg passed is the original df? If not, how do we tell which of the args to treat as the original df?
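One possible answer, sketched here outside of BaseDecorator: let the user name the df parameter explicitly and fall back to the first argument otherwise. The `df_arg` parameter is an assumption made for illustration, not part of bulwark's existing API. `inspect.signature` is used so the lookup works whether the df is passed positionally or by keyword.

```python
import functools
import inspect

import pandas as pd


def is_smaller(df_arg=None):
    """Hypothetical check factory; df_arg names the parameter holding the original df."""

    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Map the call's positional and keyword args onto parameter names.
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            if df_arg is not None:
                original = bound.arguments[df_arg]
            else:
                # Fallback assumption: the first parameter is the original df.
                original = next(iter(bound.arguments.values()))
            rows_before = len(original)
            result = func(*args, **kwargs)
            if not len(result) < rows_before:
                raise AssertionError(
                    f"Expected fewer than {rows_before} rows, got {len(result)}."
                )
            return result

        return wrapper

    return decorator


@is_smaller(df_arg="df")
def filter_rows(threshold, df):
    # The df is deliberately not the first parameter here.
    return df[df["a"] > threshold]


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
out = filter_rows(1, df)
```

Defaulting to the first argument keeps the common case short, while the explicit name covers functions where the df is not first.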