/dstools

Handy tools for data scientists

Primary language: Python

Available functions

metrics.calculate_relationship Determine whether y is positively related, negatively related, or unrelated to x

metrics.cosine_similarity Cosine similarity between two vectors

metrics.jarccard_index Calculate the Jaccard index (aka Jaccard similarity).

metrics.ks_score Calculate the Kolmogorov-Smirnov (KS) score

metrics.lift_table Create a lift table given cutoff points or a number of bins

metrics.psi Calculate the PSI given two arrays.
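
The metrics entries above are one-line summaries only. As a rough illustration of what two of them compute, here is a minimal standalone sketch in plain Python. It does not call dstools, and the function names, signatures, and edge-case handling are assumptions made for this example, not the library's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption for this sketch: two empty collections count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)
```

Orthogonal vectors give a cosine similarity of 0 and parallel vectors give 1; the Jaccard index ranges from 0 (no shared elements) to 1 (identical sets).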

sklearn_extension.BaseEstimator Base class for all estimators in scikit-learn

sklearn_extension.Binning Base class for all binning functionality

sklearn_extension.BorderlineSMOTE Over-sampling using Borderline SMOTE.

sklearn_extension.ChiSquareBinning No documentation found.

sklearn_extension.ClassifierMixin Mixin class for all classifiers in scikit-learn.

sklearn_extension.ConditionalWrapper A conditional wrapper that makes a Scikit-Learn transformer work on only part of the data

sklearn_extension.CorrelationRemover No documentation found.

sklearn_extension.EntropyBinning No documentation found.

sklearn_extension.EqualFrequencyBinning No documentation found.

sklearn_extension.EqualWidthBinning No documentation found.

sklearn_extension.IQROutlierRemover Remove outliers based on the IQR

sklearn_extension.IVBinning No documentation found.

sklearn_extension.IncrementalLogisticRegression Incremental Logistic Regression

sklearn_extension.Inspect A step that can be plugged into the pipeline to inspect the

sklearn_extension.KMeansSMOTE Apply a KMeans clustering before to over-sample using SMOTE.

sklearn_extension.KSBinning No documentation found.

sklearn_extension.NormDistOutlierRemover Remove outliers, assuming the data is independent and follows a normal distribution

sklearn_extension.NotFittedError Exception class to raise if estimator is used before fitting.

sklearn_extension.OrdinalEncoder Similar to the Scikit-Learn OrdinalEncoder but allows for arbitrary ordering in the columns

sklearn_extension.Pipeline A drop-in replacement for the Scikit-Learn Pipeline object that supports

sklearn_extension.QuantileOutlierRemover Remove outliers based on a skewness threshold

sklearn_extension.RandomOverSampler Class to perform random over-sampling.

sklearn_extension.SMOTE Class to perform over-sampling using SMOTE.

sklearn_extension.SVMSMOTE Over-sampling using SVM-SMOTE.

sklearn_extension.SparsityRemover No documentation found.

sklearn_extension.StepwiseLogisticRegression Stepwise Logistic Regression

sklearn_extension.TreeBinner No documentation found.

sklearn_extension.WoeEncoder No documentation found.

sklearn_extension.equal_frequency_binning Shortcut for equal frequency binning on a Pandas.Series, returns

sklearn_extension.equal_width_binning Shortcut for equal width binning on a Pandas.Series, returns

sklearn_extension.iv Compute the IV stats for each feature; return a list of WoE values.

sklearn_extension.return_frame A class decorator for Scikit-Learn transformers

sklearn_extension.sort_columns_logistic Sort columns according to wald_chi2

sklearn_extension.sort_columns_tree Sort columns according to feature importance in tree method

sklearn_extension.woe Return a series mapping each feature value to its WoE stats
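
The woe and iv entries above refer to weight of evidence and information value. The sketch below shows one common way to compute them for a categorical feature and a binary target. It is a standalone illustration, not the dstools implementation: the helper name `woe_and_iv` is hypothetical, and the sign convention (non-events over events) is an assumption, since conventions differ between libraries.

```python
import math
from collections import Counter


def woe_and_iv(feature, target):
    """Return ({value: woe}, iv) for a categorical feature and a 0/1 target.

    Assumed convention: woe = ln(share of non-events / share of events).
    """
    events = Counter(f for f, t in zip(feature, target) if t == 1)
    non_events = Counter(f for f, t in zip(feature, target) if t == 0)
    total_events = sum(events.values())
    total_non_events = sum(non_events.values())
    if total_events == 0 or total_non_events == 0:
        raise ValueError("target must contain both classes")

    woe, iv = {}, 0.0
    for value in set(feature):
        p_event = events[value] / total_events
        p_non_event = non_events[value] / total_non_events
        if p_event == 0 or p_non_event == 0:
            # A value seen with only one class would need log(0);
            # production code usually smooths the counts instead.
            raise ValueError(f"value {value!r} appears with only one class")
        woe[value] = math.log(p_non_event / p_event)
        iv += (p_non_event - p_event) * woe[value]
    return woe, iv
```

Each term of the information value is non-negative, so the total grows as the distribution of a feature's values differs more between the two classes.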

utils.capture_output Capture stdout and stderr as string.

utils.check_same_length A decorator that checks that all the arguments have the same length

utils.create_multilevel_index Create two-level multilevel index from given index names.

utils.find_duplicates Find duplicate elements in an iterable

utils.flatten_list Flatten a nested list regardless of the depth.

utils.get_stats Return a pstats.Stats object from a statement.

utils.groupby groupby(iterable, key=None) -> make an iterator that returns consecutive

utils.is_scalar_nan Tests if x is NaN

utils.iter_date Iterate over days

utils.limit_precision Limit the precision of a float number

utils.maybe_mkdir Create the directory if it does not exist.

utils.ngram Generate n-grams from an iterable.

utils.plot_distribution Show the plot for the specified distribution

utils.print_source_code Print the source code of an object.

utils.print_stats Print out the profiling detail from the statement sorted by *keys

utils.read_csv Read multiple CSV files and concatenate them row-wise

utils.read_excel Read multiple Excel files and concatenate them row-wise

utils.read_multiple_files No documentation found.

utils.read_sheets Read all the sheets in an Excel file and concatenate them row-wise

utils.return_default A decorator that checks the first argument; if it meets the criteria, simply return the default_value

utils.set_default A decorator that checks the first argument; if it meets the criteria, replace it with the default_value

utils.timeit A decorator that times the function and logs the information.

utils.today Return the date of today as a string.

utils.weighted_sum No documentation found.

utils.write_dict_to_excel Save a dictionary to an Excel file with each key being the sheet name
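
To make a few of the utils entries above more concrete, here is a minimal sketch of how helpers like flatten_list, ngram, and find_duplicates could behave, based only on their one-line descriptions. These are standalone re-implementations written for illustration; the signatures and return types (generators versus lists, ordering of results) are assumptions and may differ from what dstools actually provides.

```python
from collections import Counter, deque


def flatten_list(nested):
    """Yield the leaves of an arbitrarily nested list, left to right."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten_list(item)
        else:
            yield item


def ngram(iterable, n):
    """Yield consecutive n-item tuples using a sliding window."""
    if n < 1:
        raise ValueError("n must be at least 1")
    window = deque(maxlen=n)
    for item in iterable:
        window.append(item)
        if len(window) == n:
            yield tuple(window)


def find_duplicates(iterable):
    """Return the elements that occur more than once, in first-seen order."""
    counts = Counter(iterable)
    return [item for item, count in counts.items() if count > 1]


print(list(flatten_list([1, [2, [3, [4]]], 5])))
print(list(ngram("abcd", 2)))
print(find_duplicates([1, 2, 2, 3, 1]))
```

The sliding-window version of ngram works on any iterable, including generators, because it never needs the length of the input up front.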