Keyvan Tajbakhsh
August 12th, 2019
This project relies on identifying key differentiators that divide customers into groups that can be targeted. Information such as a customer's demographics (age, race, religion, gender, family size, ethnicity, income, education level), geography (where they live and work), psychographics (social class, lifestyle and personality characteristics) and behavioral tendencies (spending, consumption, usage and desired benefits) is taken into account when determining customer segmentation practices.
We will use unsupervised learning techniques to describe the relationship between the demographics of the company's existing customers and the general geographical population of Germany. The datasets provided need to be treated and prepared before implementing machine learning algorithms.
Our cluster analysis will inform the supervised learning stage. In this context we will train a supervised algorithm that predicts whether a customer will respond positively to the mail-order campaign (a binary classification problem). We will also create a benchmark model against which to compare our final result on the test data.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
- NumPy - A fundamental package for scientific computing with Python.
- Pandas - A library providing high-performance, easy-to-use data structures and data analysis tools.
- scikit-learn - Simple and efficient tools for data mining and data analysis.
- Matplotlib - A Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.
- Pickle - The pickle module implements binary protocols for serializing and de-serializing a Python object structure.
- Seaborn - A Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
- boto3 - Boto is the Amazon Web Services (AWS) SDK for Python. It enables Python developers to create, configure, and manage AWS services, such as EC2 and S3. Boto provides an easy to use, object-oriented API, as well as low-level access to AWS services.
- SageMaker - SageMaker Python SDK is an open source library for training and deploying machine learning models on Amazon SageMaker.
You will also need software installed to run and execute a Jupyter Notebook.
If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included.
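As a quick check before opening the notebooks, the libraries listed above can be probed from Python. This is a minimal sketch using only the standard library; the import names in `REQUIRED` are the usual ones for these packages (scikit-learn imports as `sklearn`), and the helper `missing_packages` is a hypothetical name introduced here for illustration.

```python
import importlib.util

# Import names for the libraries listed above.
# Note: scikit-learn is imported as "sklearn"; pickle ships with Python.
REQUIRED = [
    "numpy", "pandas", "sklearn", "matplotlib",
    "pickle", "seaborn", "boto3", "sagemaker",
]


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Packages such as boto3 and sagemaker are only needed for the part that runs on Amazon SageMaker, so a report that they are missing locally does not prevent running the other sections.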
The project is divided into two parts. The code is provided in the 1_Customer_Segmentation_Report.ipynb and 2_Supervised_Learning_Model.ipynb notebook files. The Linear Learner section must be executed in a notebook on the Amazon SageMaker platform: LinearLearner is a built-in SageMaker algorithm, so it can only be trained and deployed there.
In a terminal or command window, navigate to the top-level project directory Capstone_Project/ (that contains this README) and run one of the following commands:
ipython notebook 1_Customer_Segmentation_Report.ipynb
or
jupyter notebook 1_Customer_Segmentation_Report.ipynb
This will open the Jupyter Notebook software and project file in your browser.
The datasets for this project are provided by Udacity and their use is limited to this project. The project is composed of two parts and four datasets, described as follows:
1 - Customer Segmentation Report (unsupervised learning):
• Udacity_AZDIAS_052018.csv: Demographics data for the general population of Germany; 891 211 persons (rows) x 366 features (columns)
• Udacity_CUSTOMERS_052018.csv: Demographics data for customers of a mail-order company; 191 652 persons (rows) x 369 features (columns)
The general population dataset (AZDIAS) will be used to create our unsupervised model (PCA and K-means). The customers dataset will then be mapped onto this model in order to identify patterns and relations between customer groups.
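The fit-on-population, map-customers workflow described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the notebook's actual code: the arrays are synthetic stand-ins for the cleaned, numeric AZDIAS and CUSTOMERS tables, and the number of components (5) and clusters (4) are arbitrary placeholder values.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): rows are persons, columns are features.
population = rng.normal(size=(1000, 20))
customers = rng.normal(loc=0.5, size=(200, 20))

# Fit scaling, PCA and K-means on the general population only.
scaler = StandardScaler().fit(population)
pca = PCA(n_components=5, random_state=0).fit(scaler.transform(population))
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
population_labels = kmeans.fit_predict(pca.transform(scaler.transform(population)))

# Map the customers into the already-fitted model (no refitting).
customer_labels = kmeans.predict(pca.transform(scaler.transform(customers)))

# Share of each group that falls into every cluster; a large gap suggests a
# segment that is over- or under-represented among customers.
n_clusters = kmeans.n_clusters
population_share = np.bincount(population_labels, minlength=n_clusters) / len(population)
customer_share = np.bincount(customer_labels, minlength=n_clusters) / len(customers)

for cluster in range(n_clusters):
    print(f"cluster {cluster}: population {population_share[cluster]:.2f}, "
          f"customers {customer_share[cluster]:.2f}")
```

Fitting every transformation on the population and only applying it to the customers keeps both groups in the same feature space, which is what makes the per-cluster shares comparable.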
2 - Supervised Learning Model:
• Udacity_MAILOUT_052018_TRAIN.csv: Demographics data for individuals who were targets of a marketing campaign; 42 982 persons (rows) x 367 (columns)
• Udacity_MAILOUT_052018_TEST.csv: Demographics data for individuals who were targets of a marketing campaign; 42 833 persons (rows) x 366 (columns)
Before implementing our supervised model, the same preparation steps as in part one are applied to the train dataset. After that we train a binary classifier as a benchmark, then our target model. Different metrics will be used to evaluate our model.
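A benchmark of the kind described above might look like the following sketch. The choice of logistic regression is an assumption made for illustration (the README does not name the benchmark model), and the data is synthetic rather than the prepared MAILOUT training set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in (assumption) for the prepared training features.
X = rng.normal(size=(2000, 10))
# Hypothetical binary response driven by two features plus noise;
# in this synthetic data the positive class is the minority.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 1.0).astype(int)

# Hold out a stratified validation split so both classes appear in each part.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical benchmark: a simple linear classifier with balanced class weights.
benchmark = LogisticRegression(class_weight="balanced", max_iter=1000)
benchmark.fit(X_train, y_train)
y_pred = benchmark.predict(X_val)

scores = {
    "precision": precision_score(y_val, y_pred, zero_division=0),
    "recall": recall_score(y_val, y_pred, zero_division=0),
    "f1": f1_score(y_val, y_pred, zero_division=0),
}
print(scores)
```

The scores of such a benchmark give a reference point: a target model is only worth its extra complexity if it improves on them under the same split and metrics.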
Features
- KBA05_DIESEL : share of cars with Diesel-engine in the microcell
- KBA13_BJ_2009 : share of cars built in 2009 within the PLZ8
- KBA05_ANTG4 : number of >10 family houses in the cell
- KBA13_VW : share of VOLKSWAGEN within the PLZ8
- KBA05_HERST4 : share of European manufacturer (e.g. Fiat, Peugeot, Rover,...)
- KBA13_KW_110 : share of cars with an engine power between 91 and 110 KW - PLZ8
- KBA13_SITZE_5 : number of cars with 5 seats in the PLZ8
- KBA13_KW_30 : share of cars up to 30 KW engine power - PLZ8
- D19_GESAMT_OFFLINE_DATUM : actuality of the last transaction with the complete file OFFLINE
- KBA13_KMH_140_210 : share of cars with max speed between 140 and 210 km/h within the PLZ8
- KBA13_KRSHERST_FORD_OPEL : share of FORD/Opel (referred to the county average) - PLZ8
- KBA05_SEG2 : share of small and very small cars (Ford Fiesta, Ford Ka etc.) in the microcell
- KBA05_ZUL2 : share of cars built between 1994 and 2000
- ZABEOTYP : typification of energy consumers
- ORTSGR_KLS9 : size of the community
- PLZ8_ANTG1 : number of 1-2 family houses in the PLZ8
- D19_VERSAND_ONLINE_QUOTE_12 : amount of online transactions within all transactions in the segment mail-order
- KBA13_ANZAHL_PKW : number of cars in the PLZ8
- KBA13_CCM_1500 : share of cars with 1400ccm to 1499ccm within the PLZ8
- D19_BANKEN_ANZ_12 : transaction activity BANKS in the last 12 months
- KBA05_MOTRAD : share of motorcycles per household
- KBA13_KW_80 : share of cars with an engine power between 71 and 80 KW - PLZ8
- KBA13_MOTOR : most common motor size within the PLZ8
- GREEN_AVANTGARDE : Green avantgarde
- KBA05_KRSKLEIN : share of small cars (referred to the county average)
- KBA05_MAXBJ : most common age of the cars in the microcell
- KBA13_KW_121 : share of cars with an engine power more than 120 KW - PLZ8
- SEMIO_TRADV : affinity indicating in what way the person is traditional minded
- KBA05_ANTG2 : number of 3-5 family houses in the cell
- SEMIO_DOM : affinity indicating in what way the person is dominant minded
- KBA13_HALTER_50 : share of car owners between 46 and 50 within the PLZ8
- D19_VERSAND_ONLINE_DATUM : actuality of the last transaction for the segment mail-order ONLINE
- FINANZ_SPARER : financial typology: money saver
- KBA13_KMH_180 : share of cars with max speed between 110 km/h and 180km/h within the PLZ8
- PLZ8_ANTG4 : number of >10 family houses in the PLZ8
- KBA13_SEG_OBEREMITTELKLASSE : share of upper middle class cars and upper class cars (BMW5er, BMW7er etc.)
- KBA05_SEG4 : share of middle class cars (Ford Mondeo etc.) in the microcell
- KBA13_HALTER_65 : share of car owners between 61 and 65 within the PLZ8
- ANZ_HAUSHALTE_AKTIV : number of households in the building
- KBA13_KW_90 : share of cars with an engine power between 81 and 90 KW - PLZ8
- SHOPPER_TYP : shopping typology
- KBA05_KRSOBER : share of upper class cars (referred to the county average)
- FINANZ_MINIMALIST : financial typology: low financial interest
- GEBAEUDETYP : type of building (residential or commercial)
- EWDICHTE : density of inhabitants per square kilometer
- KBA13_KRSSEG_VAN : share of vans (referred to the county average) - PLZ8
- KBA13_VORB_2 : share of cars with 2 preowner - PLZ8
- LP_STATUS_GROB : social status rough
- FINANZ_VORSORGER : financial typology: be prepared
- PRAEGENDE_JUGENDJAHRE : dominating movement in the person's youth (avantgarde or mainstream)
- KBA05_ALTER4 : share of car owners older than 61 years
- KBA13_SITZE_4 : number of cars with less than 5 seats in the PLZ8
- OST_WEST_KZ : flag indicating the former GDR/FRG
- KBA05_AUTOQUOT : share of cars per household
- KBA13_BJ_2004 : share of cars built before 2004 within the PLZ8
- KBA13_KW_120 : share of cars with an engine power between 111 and 120 KW - PLZ8
- KBA05_GBZ : number of buildings in the microcell
- D19_TELKO_ONLINE_DATUM : actuality of the last transaction for the segment telecommunication ONLINE
- KBA05_KRSAQUOT : share of cars per household (referred to the county average)
- D19_BANKEN_DATUM : actuality of the last transaction for the segment banks TOTAL
- KBA05_HERST5 : share of asian manufacturer (e.g. Toyota, Kia,...)
- KBA05_MOD3 : share of Golf-class cars (in an AZ specific definition)
- KBA05_SEG5 : share of upper middle class cars and upper class cars (BMW5er, BMW7er etc.)
- KONSUMNAEHE : distance from a building to PoS (Point of Sale)
- CAMEO_DEU_2015 : CAMEO classification 2015 - detailed classification
- SEMIO_KRIT : affinity indicating in what way the person is critical minded
- AGER_TYP : best-ager typology
- KBA13_FIAT : share of FIAT within the PLZ8
- HEALTH_TYP : health typology
- ALTERSKATEGORIE_GROB : age classification through prename analysis
- KBA13_HALTER_20 : share of car owners below 21 within the PLZ8
- SEMIO_KULT : affinity indicating in what way the person is cultural minded
- KBA13_NISSAN : share of NISSAN within the PLZ8
- D19_BANKEN_OFFLINE_DATUM : actuality of the last transaction for the segment banks OFFLINE
- KBA13_HALTER_60 : share of car owners between 56 and 60 within the PLZ8
- FINANZ_UNAUFFAELLIGER : financial typology: unremarkable
- KBA05_KRSHERST1 : share of Mercedes/BMW (referred to the county average)
- KBA05_MOD2 : share of middle class cars (in an AZ specific definition)
- D19_VERSAND_ANZ_24 : transaction activity MAIL-ORDER in the last 24 months
- KBA13_KW_0_60 : share of cars up to 60 KW engine power - PLZ8
- KBA05_VORB0 : share of cars with no preowner
- KBA13_BJ_2008 : share of cars built in 2008 within the PLZ8
- KBA13_CCM_1200 : share of cars with 1000ccm to 1199ccm within the PLZ8
- KBA13_KRSHERST_BMW_BENZ : share of BMW/Mercedes Benz (referred to the county average) - PLZ8
- D19_GESAMT_ANZ_24 : transaction activity TOTAL POOL in the last 24 months
- KBA05_SEG8 : share of roadsters and convertibles in the microcell
- D19_VERSAND_OFFLINE_DATUM : actuality of the last transaction for the segment mail-order OFFLINE
- SEMIO_KAEM : affinity indicating in what way the person has a combative attitude
- W_KEIT_KIND_HH : likelihood of a child present in this household
- KBA13_MAZDA : share of MAZDA within the PLZ8
- KBA05_ANTG3 : number of 6-10 family houses in the cell
- KBA05_MOTOR : most common engine size in the microcell
- ANZ_PERSONEN : number of adult persons in the household
- KBA13_OPEL : share of OPEL within the PLZ8
- KBA13_KMH_251 : share of cars with a greater max speed than 250 km/h within the PLZ8
- KBA13_CCM_2501 : share of cars with more than 2500ccm within the PLZ8
- KBA13_VORB_1 : share of cars with 1 preowner - PLZ8
- KBA13_MERCEDES : share of MERCEDES within the PLZ8
- KBA13_VORB_3 : share of cars with 3 or more preowner - PLZ8
- ONLINE_AFFINITAET : online affinity
- PLZ8_ANTG3 : number of 6-10 family houses in the PLZ8
- D19_TELKO_ANZ_12 : transaction activity TELCO in the last 12 months
- KBA05_SEG3 : share of lower midclass cars (Ford Focus etc.) in the microcell
- KBA05_ZUL1 : share of cars built before 1994
- KBA13_SEG_UTILITIES : share of MUVs/SUVs within the PLZ8
- KBA05_HERSTTEMP : development of the most common car manufacturers in the neighbourhood
- KBA05_MAXVORB : most common preowner structure in the microcell
- KBA05_ANTG1 : number of 1-2 family houses in the cell
- KBA05_MAXAH : most common age of car owners in the microcell
- KBA13_KMH_250 : share of cars with max speed between 210 and 250 km/h within the PLZ8
- KBA13_SEG_MITTELKLASSE : share of middle class cars (Ford Mondeo etc.) in the PLZ8
- KBA13_SEG_MINIVANS : share of minivans within the PLZ8
- RELAT_AB : share of unemployed in relation to the county the community belongs to
- ANREDE_KZ : gender
- GFK_URLAUBERTYP : vacation habits
- KBA05_MOD1 : share of upper class cars (in an AZ specific definition)
- KBA13_CCM_3001 : share of cars with more than 3000ccm within the PLZ8
- KBA05_KRSVAN : share of vans (referred to the county average)
- KBA13_CCM_3000 : share of cars with 2500ccm to 2999ccm within the PLZ8
- KBA13_PEUGEOT : share of PEUGEOT within the PLZ8
- KBA13_TOYOTA : share of TOYOTA within the PLZ8
- KBA13_HALTER_35 : share of car owners between 31 and 35 within the PLZ8
- KBA13_BJ_1999 : share of cars built between 1995 and 1999 within the PLZ8
- KBA13_CCM_2500 : share of cars with 2000ccm to 2499ccm within the PLZ8
- KBA05_MAXHERST : most common car manufacturer in the microcell
- KBA13_RENAULT : share of RENAULT within the PLZ8
- KBA13_HALTER_40 : share of car owners between 36 and 40 within the PLZ8
- D19_VERSI_ANZ_24 : transaction activity INSURANCE in the last 24 months
- D19_VERSAND_ANZ_12 : transaction activity MAIL-ORDER in the last 12 months
- KBA13_HALTER_45 : share of car owners between 41 and 45 within the PLZ8
- KBA13_SEG_KLEINWAGEN : share of small and very small cars (Ford Fiesta, Ford Ka etc.) in the PLZ8
- D19_BANKEN_ANZ_24 : transaction activity BANKS in the last 24 months
- KBA05_SEG10 : share of more specific cars (vans, convertibles, all-terrains, MUVs etc.)
- KBA13_HERST_FORD_OPEL : share of Ford & Opel/Vauxhall within the PLZ8
- KKK : purchasing power
- KBA05_KW1 : share of cars with less than 59 KW engine power
- KBA05_MAXSEG : most common car segment in the microcell
- SEMIO_VERT : affinity indicating in what way the person is dreamy
- KBA05_MOD4 : share of small cars (in an AZ specific definition)
- D19_VERSAND_DATUM : actuality of the last transaction for the segment mail-order TOTAL
- BALLRAUM : distance to next urban centre
- KBA13_BMW : share of BMW within the PLZ8
- KBA13_SEG_GELAENDEWAGEN : share of all-terrain vehicles within the PLZ8
- LP_LEBENSPHASE_GROB : lifestage rough
- KBA13_BJ_2006 : share of cars built between 2005 and 2006 within the PLZ8
- KBA05_ZUL4 : share of cars built from 2003 on
- SEMIO_PFLICHT : affinity indicating in what way the person is dutiful and traditional minded
- KBA05_SEG7 : share of all-terrain vehicles and MUVs in the microcell
- MIN_GEBAEUDEJAHR : year the building was first mentioned in our database
- KBA05_ALTER1 : share of car owners less than 31 years old
- LP_LEBENSPHASE_FEIN : lifestage fine
- KBA05_BAUMAX : most common building-type within the cell
- D19_VERSI_ANZ_12 : transaction activity INSURANCE in the last 12 months
- KBA13_KW_60 : share of cars with an engine power between 51 and 60 KW - PLZ8
- ANZ_HH_TITEL : number of academic title holders in the building
- KBA13_SEG_GROSSRAUMVANS : share of big sized vans within the PLZ8
- KBA05_CCM3 : share of cars with 1800ccm to 2499 ccm
- KBA13_ALTERHALTER_45 : share of car owners between 31 and 45 within the PLZ8
- KBA13_HALTER_66 : share of car owners over 66 within the PLZ8
- MOBI_REGIO : moving patterns
- CAMEO_DEUG_2015 : CAMEO classification 2015 - Uppergroup
- KBA13_ALTERHALTER_61 : share of car owners older than 61 within the PLZ8
- ANZ_TITEL : number of professional title holders in the household
- SEMIO_REL : affinity indicating in what way the person is religious
- KBA13_CCM_1800 : share of cars with 1600ccm to 1799ccm within the PLZ8
- KBA13_HALTER_25 : share of car owners between 21 and 25 within the PLZ8
- KBA13_HERST_EUROPA : share of European cars within the PLZ8
- D19_TELKO_OFFLINE_DATUM : actuality of the last transaction for the segment telecommunication OFFLINE
- KBA13_CCM_0_1400 : share of cars with less than 1400ccm within the PLZ8
- D19_GESAMT_ANZ_12 : transaction activity TOTAL POOL in the last 12 months
- KBA13_AUDI : share of AUDI within the PLZ8
- KBA13_KRSZUL_NEU : share of newbuilt cars (referred to the county average) - PLZ8
- GEBAEUDETYP_RASTER : industrial areas
- FINANZ_ANLEGER : financial typology: investor
- KBA13_ALTERHALTER_60 : share of car owners between 46 and 60 within the PLZ8
- KBA13_FAB_ASIEN : share of other Asian Manufacturers within the PLZ8
- FINANZTYP : best describing financial type for the person
- KBA05_HERST3 : share of Ford/Opel
- KBA13_CCM_1600 : share of cars with 1500ccm to 1599ccm within the PLZ8
- FINANZ_HAUSBAUER : financial typology: main focus is the own house
- KBA13_CCM_1400 : share of cars with 1200ccm to 1399ccm within the PLZ8
- KBA13_KW_61_120 : share of cars with an engine power between 61 and 120 KW - PLZ8
- KBA13_SEG_VAN : share of vans within the PLZ8
- D19_GESAMT_ONLINE_DATUM : actuality of the last transaction with the complete file ONLINE
- D19_TELKO_DATUM : actuality of the last transaction for the segment telecommunication TOTAL
- KBA13_KW_70 : share of cars with an engine power between 61 and 70 KW - PLZ8
- SEMIO_MAT : affinity indicating in what way the person is material minded
- KBA05_MOD8 : share of vans (in an AZ specific definition)
- KBA05_CCM1 : share of cars with less than 1399ccm
- D19_BANKEN_ONLINE_DATUM : actuality of the last transaction for the segment banks ONLINE
- PLZ8_GBZ : number of buildings within the PLZ8
- KBA05_KRSHERST3 : share of Ford/Opel (referred to the county average)
- KBA05_VORB1 : share of cars with one or two preowner
- KBA05_VORB2 : share of cars with more than two preowner
- KBA05_KRSHERST2 : share of Volkswagen (referred to the county average)
- KBA05_KRSZUL : share of newbuilt cars (referred to the county average)
- KBA13_CCM_1000 : share of cars with less than 1000ccm within the PLZ8
- KBA13_HERST_ASIEN : share of Asian Manufacturers within the PLZ8
- KBA13_HERST_BMW_BENZ : share of BMW & Mercedes Benz within the PLZ8
- KBA13_KMH_140 : share of cars with max speed between 110 km/h and 140km/h within the PLZ8
- KBA13_AUTOQUOTE : share of cars per household within the PLZ8
- KBA13_FAB_SONSTIGE : share of other Manufacturers within the PLZ8
- KBA13_KRSHERST_AUDI_VW : share of Volkswagen (referred to the county average) - PLZ8
- KBA13_VORB_0 : share of cars with no preowner - PLZ8
- KBA13_KW_40 : share of cars with an engine power between 31 and 40 KW - PLZ8
- KBA05_MODTEMP : development of the most common car segment in the neighbourhood
- KBA05_SEG6 : share of upper class cars (BMW 7er etc.) in the microcell
- KBA13_SEG_SPORTWAGEN : share of sportscars within the PLZ8
- CJT_GESAMTTYP : customer journey typology
- KBA13_KMH_0_140 : share of cars with max speed 140 km/h within the PLZ8
- D19_BANKEN_ONLINE_QUOTE_12 : amount of online transactions within all transactions in the segment bank
- KBA05_CCM4 : share of cars with more than 2499ccm
- KBA05_SEG9 : share of vans in the microcell
- VERS_TYP : insurance typology
- KBA05_KW2 : share of cars with an engine power between 60 and 119 KW
- TITEL_KZ : flag whether this person holds an academic title
- KBA05_HERST1 : share of top German manufacturer (Mercedes, BMW)
- KBA13_SEG_KLEINST : share of very small cars (Ford Ka etc.) in the PLZ8
- KBA13_SEG_SONSTIGE : share of other cars within the PLZ8
- KBA13_CCM_2000 : share of cars with 1800ccm to 1999ccm within the PLZ8
- D19_GESAMT_ONLINE_QUOTE_12 : amount of online transactions within all transactions in the complete file
- KBA05_CCM2 : share of cars with 1400ccm to 1799 ccm
- KBA05_ZUL3 : share of cars built between 2001 and 2002
- LP_FAMILIE_FEIN : family type fine
- KBA13_SITZE_6 : number of cars with more than 5 seats in the PLZ8
- SEMIO_RAT : affinity indicating in what way the person is of a rational mind
- KBA05_HERST2 : share of Volkswagen-Cars (including Audi)
- KBA13_HALTER_30 : share of car owners between 26 and 30 within the PLZ8
- D19_GESAMT_DATUM : actuality of the last transaction with the complete file TOTAL
- INNENSTADT : distance to the city centre
- KBA13_KW_50 : share of cars with an engine power between 41 and 50 KW - PLZ8
- KBA13_ALTERHALTER_30 : share of car owners below 31 within the PLZ8
- KBA13_SEG_OBERKLASSE : share of upper class cars (BMW 7er etc.) in the PLZ8
- LP_FAMILIE_GROB : family type rough
- NATIONALITAET_KZ : nationality (scored by prename analysis)
- SEMIO_ERL : affinity indicating in what way the person is eventful orientated
- ALTER_HH : main age within the household
- KBA05_ALTER3 : share of car owners between 45 and 60 years of age
- KBA05_KW3 : share of cars with an engine power of more than 119 KW
- WOHNLAGE : residential-area
- HH_EINKOMMEN_SCORE : estimated household net income
- KBA13_KRSSEG_KLEIN : share of small cars (referred to the county average) - PLZ8
- KBA13_SEG_KOMPAKTKLASSE : share of lower midclass cars (Ford Focus etc.) in the PLZ8
- KBA05_ANHANG : share of trailers in the microcell
- KBA05_FRAU : share of female car owners
- D19_KONSUMTYP : consumption type
- KBA13_VORB_1_2 : share of cars with 1 or 2 preowner - PLZ8
- WOHNDAUER_2008 : length of residence
- KBA05_SEG1 : share of very small cars (Ford Ka etc.) in the microcell
- REGIOTYP : neighbourhood
- KBA13_SEG_MINIWAGEN : share of minicars within the PLZ8
- PLZ8_BAUMAX : most common building-type within the PLZ8
- RETOURTYP_BK_S : return type
- KBA13_FORD : share of FORD within the PLZ8
- KBA13_HALTER_55 : share of car owners between 51 and 55 within the PLZ8
- KBA13_HERST_AUDI_VW : share of Volkswagen & Audi within the PLZ8
- KBA13_KMH_110 : share of cars with max speed 110 km/h within the PLZ8
- PLZ8_ANTG2 : number of 3-5 family houses in the PLZ8
- KBA05_ALTER2 : share of car owners between 31 and 45 years of age
- D19_TELKO_ANZ_24 : transaction activity TELCO in the last 24 months
- KBA13_KMH_211 : share of cars with a greater max speed than 210 km/h within the PLZ8
- KBA13_KRSAQUOT : share of cars per household (referred to the county average) - PLZ8
- LP_STATUS_FEIN : social status fine
- KBA13_BJ_2000 : share of cars built between 2000 and 2003 within the PLZ8
- SEMIO_LUST : affinity indicating in what way the person is sensual minded
- KBA13_SEG_WOHNMOBILE : share of roadmobiles within the PLZ8
- SEMIO_FAM : affinity indicating in what way the person is familiar minded
- PLZ8_HHZ : number of households within the PLZ8
- KBA13_KRSSEG_OBER : share of upper class cars (referred to the county average) - PLZ8
- KBA13_HERST_SONST : share of other cars within the PLZ8
- GEBURTSJAHR : year of birth
- SEMIO_SOZ : affinity indicating in what way the person is social minded
- CJT_TYP_3: not described
- VHN: not described
- D19_GARTEN: not described
- D19_TECHNIK: not described
- CJT_TYP_5: not described
- D19_VERSICHERUNGEN: not described
- D19_BEKLEIDUNG_REST: not described
- MOBI_RASTER: not described
- D19_GARTEN_RZ: not described
- D19_KINDERARTIKEL: not described
- D19_REISEN_RZ: not described
- D19_BANKEN_LOKAL: not described
- UMFELD_JUNG: not described
- D19_BUCH_CD: not described
- KONSUMZELLE: not described
- D19_SAMMELARTIKEL_RZ: not described
- D19_RATGEBER: not described
- KBA13_ANTG3: not described
- VK_ZG11: not described
- KBA13_BAUMAX: not described
- D19_HANDWERK: not described
- VK_DHT4A: not described
- CUSTOMER_GROUP: not described
- D19_KK_KUNDENTYP: not described
- AKT_DAT_KL: not described
- BIP_FLAG: not described
- ANZ_KINDER: not described
- CJT_TYP_1: not described
- SOHO_FLAG: not described
- RT_SCHNAEPPCHEN: not described
- D19_TECHNIK_RZ: not described
- KBA13_ANTG4: not described
- D19_SCHUHE_RZ: not described
- KBA13_CCM_1400_2500: not described
- D19_BILDUNG: not described
- D19_REISEN: not described
- D19_BIO_OEKO: not described
- HH_DELTA_FLAG: not described
- CJT_KATALOGNUTZER: not described
- ALTER_KIND2: not described
- D19_RATGEBER_RZ: not described
- D19_LETZTER_KAUF_BRANCHE: not described
- D19_KONSUMTYP_MAX: not described
- D19_FREIZEIT: not described
- CAMEO_DEUINTL_2015: not described
- KOMBIALTER: not described
- D19_TELKO_REST: not described
- D19_WEIN_FEINKOST: not described
- D19_SOZIALES: not described
- D19_BANKEN_REST: not described
- VK_DISTANZ: not described
- D19_KINDERARTIKEL_RZ: not described
- D19_VOLLSORTIMENT_RZ: not described
- D19_TIERARTIKEL: not described
- CJT_TYP_2: not described
- D19_KOSMETIK_RZ: not described
- D19_FREIZEIT_RZ: not described
- CAMEO_INTL_2015: not described
- DSL_FLAG: not described
- D19_LEBENSMITTEL_RZ: not described
- D19_BANKEN_DIREKT: not described
- D19_ENERGIE_RZ: not described
- ALTER_KIND4: not described
- KBA13_GBZ: not described
- UNGLEICHENN_FLAG: not described
- CJT_TYP_6: not described
- D19_BANKEN_GROSS_RZ: not described
- D19_VERSAND_REST: not described
- D19_HAUS_DEKO_RZ: not described
- D19_VERSI_OFFLINE_DATUM: not described
- D19_KOSMETIK: not described
- D19_LEBENSMITTEL: not described
- D19_VERSI_ONLINE_QUOTE_12: not described
- KBA13_KMH_210: not described
- SOHO_KZ: not described
- D19_TELKO_REST_RZ: not described
- GEMEINDETYP: not described
- D19_DIGIT_SERV: not described
- D19_TELKO_ONLINE_QUOTE_12: not described
- D19_LOTTO: not described
- RT_UEBERGROESSE: not described
- D19_VERSICHERUNGEN_RZ: not described
- D19_HANDWERK_RZ: not described
- D19_BANKEN_REST_RZ: not described
- EXTSEL992: not described
- D19_BEKLEIDUNG_GEH_RZ: not described
- RT_KEIN_ANREIZ: not described
- VHA: not described
- KBA13_CCM_1401_2500: not described
- KK_KUNDENTYP: not described
- KBA13_ANTG2: not described
- D19_BILDUNG_RZ: not described
- D19_BEKLEIDUNG_GEH: not described
- D19_SCHUHE: not described
- D19_BUCH_RZ: not described
- STRUKTURTYP: not described
- ALTER_KIND1: not described
- D19_VOLLSORTIMENT: not described
- ALTER_KIND3: not described
- UMFELD_ALT: not described
- D19_LOTTO_RZ: not described
- VERDICHTUNGSRAUM: not described
- WACHSTUMSGEBIET_NB: not described
- FIRMENDICHTE: not described
- KBA13_ANTG1: not described
- D19_NAHRUNGSERGAENZUNG: not described
- D19_HAUS_DEKO: not described
- HAUSHALTSSTRUKTUR: not described
- D19_NAHRUNGSERGAENZUNG_RZ: not described
- D19_VERSI_ONLINE_DATUM: not described
- EINGEZOGENAM_HH_JAHR: not described
- ONLINE_PURCHASE: not described
- PRODUCT_GROUP: not described
- ANZ_STATISTISCHE_HAUSHALTE: not described
- D19_TIERARTIKEL_RZ: not described
- EINGEFUEGT_AM: not described
- D19_BEKLEIDUNG_REST_RZ: not described
- D19_TELKO_MOBILE: not described
- D19_SONSTIGE: not described
- CJT_TYP_4: not described
- D19_DIGIT_SERV_RZ: not described
- D19_BANKEN_LOKAL_RZ: not described
- ALTERSKATEGORIE_FEIN: not described
- D19_DROGERIEARTIKEL: not described
- KBA13_HHZ: not described
- D19_WEIN_FEINKOST_RZ: not described
- GEOSCORE_KLS7: not described
- D19_BANKEN_DIREKT_RZ: not described
- D19_BANKEN_GROSS: not described
- D19_SONSTIGE_RZ: not described
- D19_TELKO_MOBILE_RZ: not described
- D19_SAMMELARTIKEL: not described
- ARBEIT: not described
- D19_ENERGIE: not described
- D19_VERSAND_REST_RZ: not described
- D19_DROGERIEARTIKEL_RZ: not described
- D19_VERSI_DATUM: not described
- D19_BIO_OEKO_RZ: not described
Target Variable
- Recall: The recall is the ratio tp / (tp + fn), where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.
- Precision: The precision is the ratio tp / (tp + fp), where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative.
- F1: The F1 score is the harmonic mean of precision and recall; it reaches its best value at 1 and its worst at 0. Precision and recall contribute equally to the F1 score. The formula for the F1 score is: F1 = 2 * (precision * recall) / (precision + recall)
- Accuracy: The accuracy is the ratio (tp + tn) / (tp + tn + fp + fn), where tn is the number of true negatives; that is, the fraction of all samples classified correctly.
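The metric definitions above can be checked by hand with a small worked example. The sketch below computes them directly from confusion-matrix counts, taking accuracy as (tp + tn) divided by the total number of samples; the function name `classification_metrics` and the counts used are hypothetical, chosen only for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1 and accuracy from confusion-matrix counts."""
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts: 100 samples, of which 12 are truly positive.
metrics = classification_metrics(tp=8, fp=2, fn=4, tn=86)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is 0.94 while the recall is only about 0.67, which illustrates why accuracy alone can look flattering when positive samples are rare.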