This library contains adjusted tools for data preprocessing and working with mixed data types.

Primary language: Python. License: MIT.

AdjDataTools

Installation:

AdjDataTools can be installed directly using pip:

pip install adjdatatools

Dependencies: numpy, pandas

AdjustedScaler scales the data to a valid range to eliminate potential outliers. The range of significant values is calculated using the medcouple (MC) statistic, following the principle proposed by M. Hubert and E. Vandervieren in "An adjusted boxplot for skewed distributions", Computational Statistics & Data Analysis, vol. 52, pp. 5186-5201, August 2008.
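To make the idea of a medcouple-based range concrete, here is a minimal sketch of the adjusted boxplot fences from the cited paper, written with the standard library only. It is not the code used inside AdjustedScaler: the function names `medcouple_naive` and `adjusted_fences` are hypothetical, the medcouple is computed with a naive O(n^2) loop, and pairs in which both points equal the median are simply skipped instead of using the paper's special tie rule.

```python
import math
import statistics


def medcouple_naive(values):
    """Naive O(n^2) medcouple (hypothetical helper, simplified tie handling)."""
    xs = sorted(float(v) for v in values)
    med = statistics.median(xs)
    lower = [x for x in xs if x <= med]  # candidates for x_i
    upper = [x for x in xs if x >= med]  # candidates for x_j
    # Kernel h(x_i, x_j) measures how far x_j lies above the median
    # compared with how far x_i lies below it.
    kernel = [
        ((xj - med) - (med - xi)) / (xj - xi)
        for xi in lower
        for xj in upper
        if xj != xi  # simplification: skip pairs tied at the median
    ]
    return statistics.median(kernel) if kernel else 0.0


def adjusted_fences(values):
    """Lower and upper fences of the adjusted boxplot for skewed data."""
    xs = sorted(float(v) for v in values)
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    mc = medcouple_naive(xs)
    if mc >= 0:
        # Right-skewed (or symmetric) data: widen the upper fence.
        low = q1 - 1.5 * math.exp(-4 * mc) * iqr
        high = q3 + 1.5 * math.exp(3 * mc) * iqr
    else:
        # Left-skewed data: widen the lower fence.
        low = q1 - 1.5 * math.exp(-3 * mc) * iqr
        high = q3 + 1.5 * math.exp(4 * mc) * iqr
    return low, high


symmetric_fences = adjusted_fences([1, 2, 3, 4, 5])
skewed_fences = adjusted_fences([1, 2, 3, 4, 10])
```

For symmetric data the medcouple is zero and the fences reduce to the classic 1.5 × IQR rule; for right-skewed data the upper fence moves further out than the lower one, so fewer points in the long tail are treated as outliers.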

The structure and usage are similar to those of the *Scaler classes in sklearn.preprocessing.

Use the .fit() method to fit the scaler to the data.

Use the .transform() method to scale the data.

Use the .inverse_transform() method to reverse the transformation.
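The three-method pattern can be illustrated with a toy class. `ToyRangeScaler` below is a hypothetical stand-in, not part of AdjDataTools: it works on plain lists and uses the plain minimum and maximum as its range, whereas AdjustedScaler derives its range from the medcouple. The sketch only shows how the state learned in .fit() is reused by .transform() and undone by .inverse_transform().

```python
class ToyRangeScaler:
    """Hypothetical scaler showing the fit / transform / inverse_transform pattern."""

    def fit(self, values):
        # Learn the range once; later calls reuse these stored values.
        self.low_ = min(values)
        self.high_ = max(values)
        # Guard against a zero-width range (all values equal).
        self.scale_ = (self.high_ - self.low_) or 1.0
        return self

    def transform(self, values):
        # Map the learned range onto [0, 1].
        return [(v - self.low_) / self.scale_ for v in values]

    def inverse_transform(self, scaled):
        # Undo transform() using the same stored range.
        return [s * self.scale_ + self.low_ for s in scaled]


toy_scaler = ToyRangeScaler().fit([2.0, 4.0, 6.0])
scaled = toy_scaler.transform([2.0, 4.0, 6.0])
restored = toy_scaler.inverse_transform(scaled)
```

Because the range is stored on the object, the same fitted scaler can transform new data and convert results back to the original units.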

Parameters

  • with_centering : bool, True by default
    If True, center the data before scaling.

  • columns : list or tuple, False by default
    Names of the target features.

  • paired : list or tuple, False by default
    Names of the paired features.

  • with_sampling : bool, True by default
    If True, a sample of the dataset is used to work around memory size limitations.

  • max_items : int
    Maximum number of elements for solid processing.
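One plausible reading of how with_sampling and max_items interact is sketched below. This is an assumption, not the library's code: the helper `sample_if_large`, its `seed` argument, and the default of 1000 items are all hypothetical and chosen only for illustration.

```python
import random


def sample_if_large(values, with_sampling=True, max_items=1000, seed=0):
    """Return the data as is, or a random subset when it exceeds max_items.

    Hypothetical helper: an assumed interpretation of the two parameters.
    """
    if with_sampling and len(values) > max_items:
        # Fixed seed so the sketch is reproducible.
        return random.Random(seed).sample(values, max_items)
    return values


small = sample_if_large([1, 2, 3], max_items=10)
reduced = sample_if_large(list(range(50)), max_items=10)
full = sample_if_large(list(range(50)), with_sampling=False, max_items=10)
```

Under this reading, statistics such as the medcouple would be estimated on the reduced sample, which bounds memory use at the cost of some precision.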

Usage:

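import pandas as pd
from adjdatatools.preprocessing import AdjustedScaler

# Example data for illustration; replace with your own DataFrame
my_data_frame = pd.DataFrame({"feature": [1.0, 2.0, 3.0, 4.0, 50.0]})

new_scaler = AdjustedScaler()
new_scaler.fit(my_data_frame)  # fit the scaler to the data
scaled_data_frame = new_scaler.transform(my_data_frame)  # scale the data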