Data

Main info

Variable	Description
period_end_date	date of the provided record
translated_when	date of modelling
if_data_corrected	whether the record was updated
prod_gr_id	ID of the product group
country_id_n	ID of the country
delivery_type_id	type of record delivery to the system
freq_id	regularity (frequency) of record delivery
retailer_id	ID of the record provider
brand_id	ID of the brand
predict_automatch	result of the model prediction (0/1)
class_acctual	actual class of the predicted value (0/1)

Size

11 columns

19697 rows

Description

Variable	NaN count	NaN ratio	Role	Type	Note
period_end_date		0.00289384	explanatory	date	18 days (from 30 Aug till 1 Dec)
translated_when			explanatory	date + time	154 days (from 1 Sep till 1 Feb)
if_data_corrected			explanatory	categorical	0: 17085; 1: 2612
prod_gr_id			explanatory	categorical	413: 4486; 426: 11844; 427: 3367
country_id_n	1292	0.0655937	explanatory	categorical	35 entries
delivery_type_id	1335	0.0677768	explanatory	categorical	915 entries
freq_id			explanatory	categorical	1: 7763; 2: 11934
retailer_id			explanatory	categorical	52 entries
brand_id			explanatory	categorical	199 entries
predict_automatch	329	0.0167031	output		
class_acctual			output		
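The NaN count and NaN ratio columns above can be reproduced with a few lines of pandas. The following is a minimal sketch, not the code used to build the table: the helper name `nan_summary` and the toy frame are assumptions for illustration, and only the column names come from the data set.

```python
import numpy as np
import pandas as pd


def nan_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the NaN count and NaN ratio for every column of df."""
    nan_count = df.isna().sum()
    return pd.DataFrame({
        "nan_count": nan_count,
        "nan_ratio": nan_count / len(df),  # share of rows that are missing
    })


# Toy data for illustration only, not the real records
toy = pd.DataFrame({
    "country_id_n": [106, np.nan, 109, 113],
    "freq_id": [1, 2, 2, 1],
})

summary = nan_summary(toy)
print(summary)
```

Applied to the full frame, the ratio is simply the count divided by the number of rows, e.g. 1292 / 19697 for country_id_n.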

Material

Suggestions

rename class_acctual to class_actual
drop the rows whose translated_when falls after the 1st Dec

import pandas as pd
df = df[df['translated_when'] < pd.Timestamp('2020-12-02')]  # keeps rows up to and including 1 Dec 2020

Countries with ids 106 and 109 are very poorly correlated.
Cross tabulation showed:
- a split between if_data_corrected and period_end_date, except for 1st Nov 2020
- a split between country_id_n and prod_gr_id, except for the countries 106, 108, 113, 116, 176
- a split between retailer_id and freq_id
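A cross tabulation of this kind can be produced with `pd.crosstab`. The sketch below uses a toy frame and assumes that "split" means every value of the first variable occurs with exactly one value of the second; that reading, the toy values, and the names `clean_split` and `mixed` are assumptions for illustration.

```python
import pandas as pd

# Toy data for illustration only, not the real records
toy = pd.DataFrame({
    "retailer_id": [1, 1, 2, 2, 3],
    "freq_id":     [1, 1, 2, 2, 2],
})

# Rows: retailer_id, columns: freq_id, cells: number of records
table = pd.crosstab(toy["retailer_id"], toy["freq_id"])
print(table)

# Count how many freq_id values each retailer appears with
n_nonzero = (table > 0).sum(axis=1)

# Assumed meaning of "split": every retailer uses exactly one freq_id
clean_split = bool((n_nonzero == 1).all())

# Retailers that break the split (appear with more than one freq_id)
mixed = n_nonzero[n_nonzero > 1].index.tolist()
```

The same check applies to the other pairs by swapping the column names; the exceptions listed above would then show up in `mixed`.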
