Until 2015, hotel chain C operated four hotels. With the acquisition of new hotels, the chain's board decided to invest more in marketing. However, it was not until 2018 that the chain created a marketing department and hired a new marketing manager, A. A realized that the current customer segmentation was not adequate, as it reflected only one customer characteristic: sales origin. It did not reflect geographic characteristics, such as the country of origin, demographic characteristics, such as age, or behavioral characteristics, such as the number of stays. Without proper customer segmentation, it is difficult for A to define a strategy to reach new customers and to keep captivating current ones. The difficulty is compounded by the multiple distribution channels that hotels operate through nowadays (travel agencies, travel operators, online travel agencies (OTAs), brand websites, and metasearch websites, among others). For example, corporate customers tend to make reservations very near the arrival date, book directly with the hotel, and be willing to pay more for a better-equipped room, while customers on holiday tend to make reservations further ahead of the arrival date, book with a travel operator or OTA, and look for better price opportunities. Therefore, product creation, pricing definitions, and other marketing tasks, such as advertising, must take into consideration the targets of these efforts according to the different channels and groups of customers.
- Explore the data and identify the variables that should be used to segment customers
- Use K-Means clustering to identify customer segments
- Justify your selection of K (taking into consideration the business use)
- Use PCA to reduce dimensionality and speed up model development
- Suggest business applications for the findings
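One way to support the choice of K in the tasks above is to compare candidate values on the PCA-reduced data. The following is a minimal sketch, not the project's actual code: the helper `evaluate_k`, the synthetic data, and the use of the silhouette score alongside inertia are all assumptions made for illustration. In practice the numeric scores would be weighed against the business use, for example how many segments the marketing team can realistically address.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def evaluate_k(features, k_values, n_components=2, seed=0):
    """Return {k: (inertia, silhouette)} for K-Means fitted on PCA-reduced data.

    `evaluate_k` is a hypothetical helper, not part of the original project.
    """
    # Standardize first so that no single feature dominates PCA or the distances.
    scaled = StandardScaler().fit_transform(features)
    reduced = PCA(n_components=n_components, random_state=seed).fit_transform(scaled)
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(reduced)
        results[k] = (model.inertia_, silhouette_score(reduced, model.labels_))
    return results


# Synthetic stand-in for customer features (NOT the hotel data):
# three well-separated groups in four dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [8, 8, 0, 0], [0, 8, 8, 0]])
demo = np.vstack([c + rng.normal(size=(100, 4)) for c in centers])

scores = evaluate_k(demo, range(2, 7))
# Pick the K with the highest silhouette score as a starting candidate.
best_k = max(scores, key=lambda k: scores[k][1])
for k, (inertia, silhouette) in scores.items():
    print(f"k={k}  inertia={inertia:.1f}  silhouette={silhouette:.3f}")
```

The silhouette score is only one possible criterion; an elbow in the inertia values or the interpretability of the resulting segments could lead to a different K.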
The focus of this project is to understand current customer characteristics in terms of revenue brought to the company, geography, demography, psychographics, and consumer behavior. We will analyze the given data using current data mining methods and tools. Our plan is to find patterns in the features of the clients, allowing us to assign each client to a group. Customer segmentation identifies discrete groups of customers based on current customer data. This solution will enable the marketing department and product developers to improve their business strategy by addressing customers in a more targeted and effective manner. Following the creation of these clusters, we will provide our own insights on the results obtained, to help the hotel make strategic decisions to retain and attract customers. This will include our recommendations for the deployment of our solution, as well as any monitoring and maintenance measures necessary.
- EXECUTIVE SUMMARY
- Introduction
- Customer segmentation with CRISP-DM methodology
- 3.1. Business understanding
- 3.1.1. Business Objectives
- 3.1.2. Business Success Criteria
- 3.1.3. Situation Assessment
- 3.1.4. Determine Data Mining goals
- 3.2. Data understanding
- 3.3. Data preparation
- 3.4. Modeling
- 3.5. Evaluation
- 3.6. Deployment and maintenance plans
- CONCLUSIONS
- 4.1. Considerations for model improvement
- 4.1.1. Collect clean data
- 4.1.2. Regular data cleaning
- 4.1.3. Standardize data formats
- 4.1.4. Collect additional data
- REFERENCES
- APPENDIX (List of figures)
- Cleaned data by removing duplicates and anomalies
- Performed PCA to reduce dimensions
- Clustered data into 7 segments using K-Means
- Analyzed segments by age, region, channel, and spending
- Developed targeted marketing strategies for each segment
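The steps listed above can be sketched end to end as follows. This is an illustrative outline under stated assumptions, not the project's implementation: the column names (`Age`, `LeadTime`, `Revenue`, `Channel`), the anomaly rule, the number of PCA components, and the helpers `segment_customers` and `profile_segments` are hypothetical, and the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

NUMERIC_COLS = ["Age", "LeadTime", "Revenue"]  # hypothetical column names


def segment_customers(df, n_clusters=7, n_components=2, seed=0):
    """Clean, reduce, and cluster; returns a cleaned copy with a 'Segment' column."""
    clean = df.drop_duplicates()
    # Assumed anomaly rule (not from the source): implausible ages or negative revenue.
    clean = clean[clean["Age"].between(18, 100) & (clean["Revenue"] >= 0)].copy()
    scaled = StandardScaler().fit_transform(clean[NUMERIC_COLS])
    reduced = PCA(n_components=n_components, random_state=seed).fit_transform(scaled)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    clean["Segment"] = model.fit_predict(reduced)
    return clean


def profile_segments(segmented):
    """Per-segment size, mean age, mean revenue, and most common channel."""
    return segmented.groupby("Segment").agg(
        customers=("Age", "size"),
        mean_age=("Age", "mean"),
        mean_revenue=("Revenue", "mean"),
        top_channel=("Channel", lambda s: s.mode().iat[0]),
    )


# Synthetic rows for illustration only (NOT the hotel data).
rng = np.random.default_rng(1)
n = 500
demo = pd.DataFrame({
    "Age": rng.integers(18, 85, size=n),
    "LeadTime": rng.integers(0, 365, size=n),
    "Revenue": rng.gamma(2.0, 150.0, size=n),
    "Channel": rng.choice(["Travel Agent/Operator", "Direct", "Corporate"], size=n),
})
demo = pd.concat([demo, demo.iloc[:5]], ignore_index=True)  # inject duplicate rows
demo.loc[0, "Age"] = -3  # inject one anomalous age

segmented = segment_customers(demo)
profile = profile_segments(segmented)
print(profile)
```

A table like `profile` is the kind of per-segment view from which labels and targeted strategies could then be derived; region would be added as another column in the same way.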
The data was cleaned, reduced with PCA, and then segmented into 7 clusters using K-Means. The segments were analyzed by demographics and spending habits. Targeted marketing strategies were developed for each segment to increase revenue through customized campaigns. This data-driven approach is expected to improve customer retention and acquisition. Here is a very short summary of the conclusions for each cluster:
- VHV: Very high value older European customers booking through travel agents.
- HV: High value middle-aged European customers booking through travel agents.
- MV: Moderate value older European customers booking through travel agents.
- LV: Low value middle-aged European customers booking through travel agents.
- VLV: Very low value middle-aged European customers booking through travel agents.
- NV(A): No value very young and very old European customers booking through travel agents.
- NV(B): No value middle-aged European customers booking through travel agents.