Python utility functions for data science.
- Base
- R-like table function
- Explore
- Missing value percentage of all columns in df
- Get all unique keys in df
- Count `*` group by columns
- Customized `describe()` for numerical columns
- Describe a numerical column given the value of a categorical column
- Describe categorical columns including distribution of values
- Describe a categorical column given the value of a categorical column
- Stats: vector-based and dataframe-based stat functions
- Correlation matrix
- Entropy and mutual information
- t-test
- Chi-square test
- ANOVA
- Math
- Log and inverse-log of different bases
- Median absolute deviation (MAD)
- Preprocessing
- Cast type for multiple columns in bulk
- Fill in missing values in bulk
- One-hot encoding and label encoding for training and test
- Evaluation
- Sklearn scorer for log-scale target variable
- Visualization: dataframe-oriented plot functions
- histogram
- scatterplot
- barplot of distribution of categorical columns
- boxplot of a numerical column given values of a categorical column
- stackedbarplot of a categorical column given values of another categorical column
- pairplot of all columns
- plot all rows of dataframe
- plot explained variance ratio in PCA
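
To give a feel for the vector-based helpers listed above (R-like table, MAD, log and inverse-log, entropy), here is a minimal sketch using only the Python standard library. The function names and signatures are hypothetical and chosen for illustration; they are not the package's actual API, which may well operate on pandas objects instead.

```python
import math
from collections import Counter
from statistics import median


def table(values):
    """Count occurrences of each distinct value, similar to R's table()."""
    return dict(Counter(values))


def mad(values):
    """Median absolute deviation: median of |x - median(x)|."""
    center = median(values)
    return median(abs(x - center) for x in values)


def log_base(x, base=math.e):
    """Logarithm of x in an arbitrary base (natural log by default)."""
    return math.log(x, base)


def inverse_log(y, base=math.e):
    """Inverse of log_base: raise the base to the power y."""
    return base ** y


def entropy(values, base=2):
    """Shannon entropy of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())


if __name__ == "__main__":
    colors = ["red", "blue", "red", "green", "red"]
    print(table(colors))
    print(mad([1, 2, 3, 4, 100]))
    print(inverse_log(log_base(8, 2), 2))
    print(entropy(colors))
```

MAD is less sensitive to outliers than the standard deviation: in the sample call, the value 100 does not move the result away from the spread of the other four points.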