A library with a goal similar to pandas `DataFrame.describe`, adding some extra information.

Current functions:
- details: This attribute holds a dataframe with all the statistics. The table in chapter 2 describes each of them.
- columns_type: Returns a dictionary with the distribution of column types.
- show: This method filters `details`, showing only the columns of a given type. Its only parameter, `type`, can be one of `'all'`, `'int64'`, `'float64'`, `'object'`, `'bool'`, `'datetime64'`.
- obj_distrib: This method shows the data distribution of columns of type `'object'`. Its three parameters are explained in chapter 3.
```python
import pandas as pd
import dfview as ovw

df = pd.read_csv("my_data.csv")
describe = ovw.DataOverview(df)

# get a dictionary with the distribution of column types
cols = describe.columns_type()
print(cols)

# show the statistics filtered by column type
describe.show(type="int64")
describe.show(type="object")

# show the full set of statistics
describe.details
describe.show(type='all')

# object distribution for all columns (defaults: axis=1, include_nulls=True)
describe.obj_distrib()

# object distribution for specific columns, with axis=0 and include_nulls=False
describe.obj_distrib(columns_list=['col1', 'col2', 'col3'], axis=0, include_nulls=False)
```
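The ideas behind `columns_type` and `show` map onto plain pandas calls. The sketch below is not dfview's code; it assumes only pandas, and the helper `columns_of_type` and the sample data are hypothetical, chosen to illustrate what a type distribution and a type filter look like.

```python
import pandas as pd

# Hypothetical sample data; "city" is forced to object dtype so the
# result does not depend on pandas' default string dtype.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "height": [1.70, 1.82, 1.65, None],
    "city": pd.Series(["Lisbon", "Porto", None, "Lisbon"], dtype="object"),
    "active": [True, False, True, True],
})

# Distribution of column types as a plain dict, e.g. {"int64": 1, ...}
type_counts = {str(dtype): int(n) for dtype, n in df.dtypes.value_counts().items()}


def columns_of_type(frame: pd.DataFrame, dtype: str) -> pd.DataFrame:
    """Hypothetical helper: keep only the columns of one dtype ('all' keeps everything)."""
    if dtype == "all":
        return frame
    return frame.select_dtypes(include=[dtype])


print(type_counts)
print(columns_of_type(df, "int64").columns.tolist())
```

`select_dtypes` accepts the same dtype names listed for `show` (including `'datetime64'`), which is why the sketch passes the string straight through.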

Column | Description |
---|---|
dtype | Type of the column's data. |
count | Count of non-null values. |
null | Count of null values. |
min | Smallest value of the column's data. Applies only to int or float columns. |
mean | Mean of the data. Applies only to int or float columns. |
max | Largest value of the column's data. Applies only to int or float columns. |
std | Standard deviation of the data. Applies only to int or float columns. |
std% | Size of the standard deviation relative to the data distribution, expressed as a percentage. Applies only to int or float columns. |
25% | 25% quantile, as in the pandas `describe` method. |
50% | 50% quantile (median), as in the pandas `describe` method. |
75% | 75% quantile, as in the pandas `describe` method. |
mode | Mode of the column, i.e. the value with the most occurrences. |
n_mode | Number of occurrences of the mode. |
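To make the statistics above concrete, here is a minimal pandas-only sketch that computes most of them for a single column. It is not dfview's implementation: the helper `summarize_column` is hypothetical, `std%` is left out because its exact formula is not given here, and restricting the quantiles to numeric columns is an assumption of the sketch.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def summarize_column(s: pd.Series) -> dict:
    """Hypothetical helper computing most statistics from the table above for one column."""
    modes = s.mode(dropna=True)               # all most-frequent values, nulls ignored
    mode = modes.iloc[0] if not modes.empty else None
    summary = {
        "dtype": str(s.dtype),
        "count": int(s.count()),              # non-null values
        "null": int(s.isna().sum()),          # null values
        "mode": mode,                         # first mode if there is a tie
        "n_mode": int((s == mode).sum()) if mode is not None else 0,
    }
    # Statistics the table restricts to int or float columns; the quantiles are
    # also limited to numeric columns here (an assumption of this sketch).
    if is_numeric_dtype(s) and not is_bool_dtype(s):
        summary.update({
            "min": s.min(),
            "mean": s.mean(),
            "max": s.max(),
            "std": s.std(),                   # sample standard deviation (ddof=1)
            "25%": s.quantile(0.25),
            "50%": s.quantile(0.50),
            "75%": s.quantile(0.75),
        })
    return summary


print(summarize_column(pd.Series([4, 8, 8, 10, None])))
print(summarize_column(pd.Series(["a", "b", "b", None], dtype="object")))
```

For the numeric series, nulls are skipped by every statistic, so `count` is 4 and `null` is 1, and `s.quantile` uses linear interpolation like pandas `describe`.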

Parameter | Description |
---|---|
column_list | List of columns whose data distribution you want to see. By default, all columns are shown. |
include_nulls | Boolean that controls whether null values are included, as the first entry of the returned dataframe. Default is `True`. |
axis | Integer that sets the layout of the returned dataframe: `1` (default) places the columns on top, `0` places them on the left side. |
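The behaviour described for `obj_distrib` can be approximated with pandas `value_counts`. This is only a sketch under assumptions: the helper `object_distribution`, the `"null"` row label and the exact layout are hypothetical, and dfview's real output may differ.

```python
import pandas as pd


def object_distribution(frame, columns=None, axis=1, include_nulls=True):
    """Hypothetical approximation: value counts of object columns in one dataframe."""
    if columns is None:
        # Default: every column of dtype 'object'
        columns = frame.select_dtypes(include=["object"]).columns.tolist()
    per_column = {}
    for col in columns:
        counts = frame[col].value_counts(dropna=True)
        if include_nulls:
            # Null count goes in the first row, under an illustrative "null" label
            nulls = pd.Series({"null": int(frame[col].isna().sum())})
            counts = pd.concat([nulls, counts])
        per_column[col] = counts
    # One result column per source column (axis=1); transposed for axis=0
    table = pd.concat(per_column, axis=1, sort=False)
    return table if axis == 1 else table.T


df = pd.DataFrame({
    "city": pd.Series(["Lisbon", "Porto", None, "Lisbon"], dtype="object"),
    "color": pd.Series(["red", "red", "blue", None], dtype="object"),
    "age": [25, 32, 47, 51],
})
print(object_distribution(df))
print(object_distribution(df, columns=["city"], axis=0, include_nulls=False))
```

Values that appear in one column but not another show up as `NaN` in the combined table, because the per-column counts are aligned on their values.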