Primary language: Python. License: MIT.

xplore


xplore is a Python package built with pandas for data scientists, analysts, AI/ML engineers, and researchers. It explores the features of a dataset in one line of code, giving a quick analysis before data wrangling and feature extraction. You can also choose to generate a more detailed report on your dataset when prompted.

Getting started

Install the package

pip install xplore

Import the package into your code

from xplore.data import xplore

Read your structured dataset from its file path or URL and assign the result to a variable

data = < Read in your dataset file here >
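
As one illustration of this step, here is a minimal sketch that assumes the dataset is a CSV and is read with pandas (the package is built with pandas, and the sample output further down shows pandas DataFrame output). The inline CSV text and its column values are hypothetical stand-ins for a real file path or URL.

```python
import io

import pandas as pd

# Hypothetical stand-in for a real dataset file. In practice you would pass
# a file path or URL instead, e.g. pd.read_csv("path/to/your_dataset.csv").
csv_text = "rank,country_full,country_abrv\n1,Germany,GER\n2,Italy,ITA\n"

# pd.read_csv accepts a path, a URL, or any file-like object.
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)
```

The resulting `data` variable is a pandas DataFrame, which is the kind of object the next step passes to xplore.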

Explore your dataset by calling the xplore() function

xplore(data)

Testing xplore

After installing the package, navigate to the test.py file and run the code in it to see how xplore works.

Sample Output

------------------------------------
The fist 5 entries of your dataset are:

   rank country_full country_abrv  total_points  ...  three_year_ago_avg  three_year_ago_weighted  confederation   rank_date
0     1      Germany          GER           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
1     2        Italy          ITA           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
2     3  Switzerland          SUI           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
3     4       Sweden          SWE           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
4     5    Argentina          ARG           0.0  ...                 0.0                      0.0       CONMEBOL  1993-08-08

[5 rows x 16 columns]


------------------------------------
The last 5 entries of your dataset are:

       rank country_full country_abrv  total_points  ...  three_year_ago_avg  three_year_ago_weighted  confederation   rank_date
57788   206     Anguilla          AIA           0.0  ...                 0.0                      0.0       CONCACAF  2018-06-07
57789   206      Bahamas          BAH           0.0  ...                 0.0                      0.0       CONCACAF  2018-06-07
57790   206      Eritrea          ERI           0.0  ...                 0.0                      0.0            CAF  2018-06-07
57791   206      Somalia          SOM           0.0  ...                 0.0                      0.0            CAF  2018-06-07
57792   206        Tonga          TGA           0.0  ...                 0.0                      0.0            OFC  2018-06-07

[5 rows x 16 columns]


------------------------------------
Stats on your dataset:

<bound method NDFrame.describe of        rank country_full country_abrv  total_points  ...  three_year_ago_avg  three_year_ago_weighted  confederation   rank_date
0         1      Germany          GER           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
1         2        Italy          ITA           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
2         3  Switzerland          SUI           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
3         4       Sweden          SWE           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
4         5    Argentina          ARG           0.0  ...                 0.0                      0.0       CONMEBOL  1993-08-08
...     ...          ...          ...           ...  ...                 ...                      ...            ...         ...
57788   206     Anguilla          AIA           0.0  ...                 0.0                      0.0       CONCACAF  2018-06-07
57789   206      Bahamas          BAH           0.0  ...                 0.0                      0.0       CONCACAF  2018-06-07
57790   206      Eritrea          ERI           0.0  ...                 0.0                      0.0            CAF  2018-06-07
57791   206      Somalia          SOM           0.0  ...                 0.0                      0.0            CAF  2018-06-07
57792   206        Tonga          TGA           0.0  ...                 0.0                      0.0            OFC  2018-06-07

[57793 rows x 16 columns]>


------------------------------------
The Value types of each column are:

rank                         int64
country_full                object
country_abrv                object
total_points               float64
previous_points              int64
rank_change                  int64
cur_year_avg               float64
cur_year_avg_weighted      float64
last_year_avg              float64
last_year_avg_weighted     float64
two_year_ago_avg           float64
two_year_ago_weighted      float64
three_year_ago_avg         float64
three_year_ago_weighted    float64
confederation               object
rank_date                   object
dtype: object


------------------------------------
Info on your Dataset:

<bound method DataFrame.info of        rank country_full country_abrv  total_points  ...  three_year_ago_avg  three_year_ago_weighted  confederation   rank_date
0         1      Germany          GER           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
1         2        Italy          ITA           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
2         3  Switzerland          SUI           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
3         4       Sweden          SWE           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
4         5    Argentina          ARG           0.0  ...                 0.0                      0.0       CONMEBOL  1993-08-08
...     ...          ...          ...           ...  ...                 ...                      ...            ...         ...
57788   206     Anguilla          AIA           0.0  ...                 0.0                      0.0       CONCACAF  2018-06-07
57789   206      Bahamas          BAH           0.0  ...                 0.0                      0.0       CONCACAF  2018-06-07
57790   206      Eritrea          ERI           0.0  ...                 0.0                      0.0            CAF  2018-06-07
57791   206      Somalia          SOM           0.0  ...                 0.0                      0.0            CAF  2018-06-07
57792   206        Tonga          TGA           0.0  ...                 0.0                      0.0            OFC  2018-06-07

[57793 rows x 16 columns]>


------------------------------------
The shape of your dataset in the order of rows and columns is:

(57793, 16)


------------------------------------
The features of your dataset are:

Index(['rank', 'country_full', 'country_abrv', 'total_points',
       'previous_points', 'rank_change', 'cur_year_avg',
       'cur_year_avg_weighted', 'last_year_avg', 'last_year_avg_weighted',
       'two_year_ago_avg', 'two_year_ago_weighted', 'three_year_ago_avg',
       'three_year_ago_weighted', 'confederation', 'rank_date'],
      dtype='object')


------------------------------------
The total number of null values from individual columns of your data set are:

rank                       0
country_full               0
country_abrv               0
total_points               0
previous_points            0
rank_change                0
cur_year_avg               0
cur_year_avg_weighted      0
last_year_avg              0
last_year_avg_weighted     0
two_year_ago_avg           0
two_year_ago_weighted      0
three_year_ago_avg         0
three_year_ago_weighted    0
confederation              0
rank_date                  0
dtype: int64


------------------------------------
The number of rows in your dataset are:

57793


------------------------------------
The values in your dataset are:

[[1 'Germany' 'GER' ... 0.0 'UEFA' '1993-08-08']
 [2 'Italy' 'ITA' ... 0.0 'UEFA' '1993-08-08']
 [3 'Switzerland' 'SUI' ... 0.0 'UEFA' '1993-08-08']
 ...
 [206 'Eritrea' 'ERI' ... 0.0 'CAF' '2018-06-07']
 [206 'Somalia' 'SOM' ... 0.0 'CAF' '2018-06-07']
 [206 'Tonga' 'TGA' ... 0.0 'OFC' '2018-06-07']]


------------------------------------


Do you want to generate a detailed report on the exploration of your dataset?
[y/n]: y
Generating report...

Summarize dataset: 100%|████████████████████████████████████████████████████████████████████████████| 30/30 [03:34<00:00,  7.14s/it, Completed] 
Generate report structure: 100%|█████████████████████████████████████████████████████████████████████████████████| 1/1 [00:31<00:00, 31.42s/it] 
Render HTML: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:12<00:00, 12.07s/it] 
Export report to file: 100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  8.00it/s] 
Your Report has been generated and saved as 'output.html'
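
The sections of the sample output above appear to correspond to standard pandas DataFrame attributes and methods. The sketch below produces comparable output with pandas directly. It is not xplore's implementation, and the small inline dataset is hypothetical. Note that the "Stats" and "Info" sections in the sample show `<bound method ...>`, which is what pandas prints when `describe` or `info` is referenced without being called; calling them with parentheses, as done here, yields the summary tables instead.

```python
import io

import pandas as pd

# Hypothetical three-row dataset; the last row has a missing total_points.
csv_text = (
    "rank,country_full,total_points\n"
    "1,Germany,0.0\n"
    "2,Italy,0.0\n"
    "3,Sweden,\n"
)
data = pd.read_csv(io.StringIO(csv_text))

print(data.head())          # first 5 entries (fewer if the dataset is smaller)
print(data.tail())          # last 5 entries
print(data.describe())      # summary statistics for numeric columns
print(data.dtypes)          # value type of each column
data.info()                 # prints column, non-null, and memory info itself
print(data.shape)           # (rows, columns)
print(data.columns)         # feature names
print(data.isnull().sum())  # null count per column
print(len(data))            # number of rows
print(data.values)          # underlying values as a NumPy array
```

Running these calls one by one on your own DataFrame is a simple way to cross-check any single section of the xplore output.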

Contributing to xplore

Fork and clone this repo if you have any contributions you want to make. Push your commits to a new branch and send a PR when done. I'll review your code and merge your PR as soon as possible.

Maintainers:

Jerry Buaba | Labaran Mohammed | Benjamin Acquaah