Variable | Description |
---|---|
period_end_date | date of the provided record |
translated_when | date of modelling |
if_data_corrected | whether the record was updated |
prod_gr_id | ID of the product group |
country_id_n | ID of the country |
delivery_type_id | type of record delivery to the system |
freq_id | frequency with which records are delivered |
retailer_id | ID of the record provider |
brand_id | ID of the brand |
predict_automatch | result of the model prediction (0/1) |
class_acctual | actual class of the predicted value (0/1) |
11 columns
19697 rows
Variable | NaN count | NaN ratio | Role | Type | Note |
---|---|---|---|---|---|
period_end_date | 57 | 0.00289384 | explanatory | date | 18 days (from 30 Aug till 1 Dec) |
translated_when | 0 | 0 | explanatory | date + time | 154 days (from 1 Sep till 1 Feb) |
if_data_corrected | 0 | 0 | explanatory | categorical | |
prod_gr_id | 0 | 0 | explanatory | categorical | |
country_id_n | 1292 | 0.0655937 | explanatory | categorical | 35 entries |
delivery_type_id | 1335 | 0.0677768 | explanatory | categorical | 915 entries |
freq_id | 0 | 0 | explanatory | categorical | |
retailer_id | 0 | 0 | explanatory | categorical | 52 entries |
brand_id | 0 | 0 | explanatory | categorical | 199 entries |
predict_automatch | 329 | 0.0167031 | output | | |
class_acctual | 0 | 0 | output | | |
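
The NaN count and NaN ratio columns above can be reproduced with a few lines of pandas. The following is only a sketch: the helper name `nan_summary` and the toy frame are assumptions made for illustration, and the values in the toy frame are not taken from the real data set.

```python
import pandas as pd


def nan_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the NaN count and NaN ratio for every column of df."""
    nan_count = df.isna().sum()
    return pd.DataFrame({
        "NaN count": nan_count,
        # Ratio is the share of missing values among all rows.
        "NaN ratio": nan_count / len(df),
    })


# Toy stand-in for the real data set (values are made up for illustration).
toy = pd.DataFrame({
    "country_id_n": [106.0, None, 109.0, 113.0],
    "retailer_id": [1, 2, 3, 4],
})

summary = nan_summary(toy)
print(summary)
```

Applied to the full data frame, the same helper should yield one row per variable, which can then be compared against the table.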
- Experimental Design and Analysis, Howard J. Seltman
- short EDA intro
- Jupyter + GIT
- Jupyter + Docker
- set visualization (Venn)
- Pandas + Timeseries
- Statistics + Python
- Probability + Statistics. A First Course in Probability
- rename class_acctual to class_actual
- drop the rows whose translated_when falls after 1 Dec 2020; these rows can be selected with
  df[df['translated_when'].dt.date > pd.Timestamp('2020-12-01').date()]
- countries with IDs 106 and 109 are really poorly correlated
- cross tabulation showed
  - a split between if_data_corrected vs period_end_date, except for 1 Nov 2020
  - a split between country_id_n vs prod_gr_id, except for the countries 106, 108, 113, 116, 176
  - a split between retailer_id vs freq_id
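
The clean-up steps and the cross tabulation listed above could be sketched roughly as follows. The toy frame and its values are invented for illustration. The code also assumes that "split" means each retailer_id occurs under exactly one freq_id, which is one possible reading of the note rather than a confirmed definition.

```python
import pandas as pd

# Toy stand-in for the real data set; all values are made up.
toy = pd.DataFrame({
    "translated_when": pd.to_datetime([
        "2020-11-30 08:15",
        "2020-12-01 23:59",
        "2020-12-02 00:01",
        "2021-01-15 12:00",
    ]),
    "retailer_id": [1, 1, 2, 2],
    "freq_id": [10, 10, 20, 20],
    "class_acctual": [0, 1, 1, 0],
})

# 1. Fix the misspelled column name.
toy = toy.rename(columns={"class_acctual": "class_actual"})

# 2. Flag rows translated after the cutoff day and keep the rest.
#    normalize() drops the time of day, so the comparison is made per
#    calendar day, like a comparison on .dt.date.
cutoff = pd.Timestamp("2020-12-01")
late = toy["translated_when"].dt.normalize() > cutoff
kept = toy[~late]

# 3. Cross-tabulate two categorical columns.
table = pd.crosstab(toy["retailer_id"], toy["freq_id"])

# Assumed reading of "split": every retailer_id row of the table has
# exactly one non-zero freq_id cell.
is_split = bool((table > 0).sum(axis=1).eq(1).all())

print(kept)
print(table)
print("retailer_id vs freq_id is a split:", is_split)
```

The same `pd.crosstab` call can be pointed at the other column pairs (if_data_corrected vs period_end_date, country_id_n vs prod_gr_id) to look at the exceptions mentioned in the list.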