Diyago/Tabular-data-generation

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

omerugi opened this issue

We run a preprocessing step before using the GAN:
First, we take the tabular data, which contains strings and booleans, and convert it all to floats.
Second, we split the data into train and test sets (to keep a test sample from the original data).
Third, we split the test set again for TabGAN.
Every time we do this, we get:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
The error is raised from sklearn's validation.py.
The data we pass to the model contains no NaNs and no infinite values.
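One way to find which of the three steps introduces bad values is to check every intermediate DataFrame (after the conversion and after each split). The sketch below is an assumption and is not part of the original thread: `report_non_finite` is a hypothetical helper, and the toy columns `age`, `ratio` and `inv_ratio` are made up. It shows two common ways a conversion to floats can silently create non-finite values: `pd.to_numeric(..., errors="coerce")` turns unparseable strings into NaN, and dividing by zero in pandas produces inf instead of raising.

```python
import numpy as np
import pandas as pd


def report_non_finite(df: pd.DataFrame, name: str = "df") -> pd.Series:
    """Count NaN/+inf/-inf values per numeric column (hypothetical helper)."""
    # Booleans are included because sklearn would convert them to floats too
    numeric = df.select_dtypes(include=["number", "bool"]).astype(np.float64)
    # np.isfinite is False for NaN, +inf and -inf
    bad = (~np.isfinite(numeric)).sum()
    bad = bad[bad > 0]
    # Remaining columns (e.g. strings) would fail sklearn's float conversion
    # with a different error, so list them as well
    other = [c for c in df.columns if c not in numeric.columns]
    print(f"{name}: non-finite counts {bad.to_dict()}, non-numeric columns {other}")
    return bad


# Toy data (made up) mimicking a string-to-float conversion step
raw = pd.DataFrame({"age": ["31", "n/a", "45"], "ratio": [0.5, 1.0, 0.0]})
raw["age"] = pd.to_numeric(raw["age"], errors="coerce")  # "n/a" becomes NaN
raw["inv_ratio"] = 1 / raw["ratio"]  # 1 / 0.0 becomes inf, no exception
report_non_finite(raw, "raw")
```

Calling the helper on the full frame, then on each train/test split, narrows down where the values first appear.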

Is there a way to solve this issue?

I believe there are NaN or infinite values in the data. Try the solutions from here:
https://stackoverflow.com/questions/31323499/sklearn-error-valueerror-input-contains-nan-infinity-or-a-value-too-large-for

or just use the function from that answer:

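import numpy as np
import pandas as pd

def clean_dataset(df):
    """Drop rows containing NaN, +inf or -inf, then cast everything to float64."""
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    df = df.dropna()  # returns a copy instead of mutating the caller's DataFrame
    # keep only rows where no column holds NaN, +inf or -inf
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
    return df[indices_to_keep].astype(np.float64)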
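If dropping rows removes too much data, a different approach (not the one from the linked answer, and only a sketch under the assumption that every column is already numeric) is to turn infinities into NaN and fill them with each column's median. `impute_non_finite` and the toy columns `a` and `b` are hypothetical.

```python
import numpy as np
import pandas as pd


def impute_non_finite(df: pd.DataFrame) -> pd.DataFrame:
    """Replace NaN/+inf/-inf with each column's median (hypothetical helper).

    Assumes every column is already numeric; a column that is entirely
    non-finite has no median and stays NaN.
    """
    out = df.astype(np.float64).replace([np.inf, -np.inf], np.nan)
    # median() skips NaN by default, so it is computed from finite values only
    return out.fillna(out.median())


df = pd.DataFrame({"a": [1.0, np.inf, 3.0], "b": [np.nan, 2.0, 4.0]})
clean = impute_non_finite(df)
print(clean)
```

With the train/test split described in the question, it is safer to compute the medians on the training split only and reuse them for the test split, so that no information from the held-out rows leaks into preprocessing.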