
Winery Customer Segmentation
Transformed raw data into operational data by handling missing values and mapping data. Ran logistic regression, decision tree, and random forest models to evaluate the current customer segmentation. Built a new segmentation model based on RFM (Recency, Frequency, and Monetary) to generate targeted marketing strategies.
Customer Analytics Project
Business Objective
Dataset
Actions
Result
To understand and evaluate the current customer segmentation and generate a new segmentation model for stakeholders.
Data on 65,000+ customer-level winery purchases with 22 columns, including customer ID, order ID, customer segment, and sales amount. The dataset already defines four customer segments: Casual Visitor, Luxury Estate, High Roller, and Wine Enthusiast.
1. Extract, Transform, Load (ETL)
We used Excel and R to clean the dataset. First, we identified the missing values and filled them in using the “Date” column. We then transformed the dataset by calculating total sales and total orders for each customer and by aggregating each customer’s sales by channel. Finally, we converted categorical variables into binary variables so that we could run our models.
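The aggregation and binary-encoding steps described above can be sketched as follows. This is a minimal illustration in plain Python rather than the Excel and R workflow the project used. The column names (`customer_id`, `order_id`, `channel`, `sales`), the channel names, and the sample rows are assumptions made for the example, not the winery's actual schema or data. The missing-value step is left out because the source does not specify how the “Date” column was applied.

```python
# Hypothetical order-level rows; field names and values are illustrative only.
orders = [
    {"customer_id": "C1", "order_id": "O1", "channel": "Email", "sales": 120.0},
    {"customer_id": "C1", "order_id": "O2", "channel": "Tasting Room", "sales": 80.0},
    {"customer_id": "C2", "order_id": "O3", "channel": "Email", "sales": 45.5},
]

# Segment names taken from the dataset description.
SEGMENTS = ["Casual Visitor", "Luxury Estate", "High Roller", "Wine Enthusiast"]


def aggregate_by_customer(rows):
    """Collapse order rows into one record per customer."""
    channels = sorted({row["channel"] for row in rows})
    working = {}
    for row in rows:
        rec = working.setdefault(
            row["customer_id"],
            {
                "total_sales": 0.0,
                "order_ids": set(),
                "sales_by_channel": {c: 0.0 for c in channels},
            },
        )
        rec["total_sales"] += row["sales"]
        rec["order_ids"].add(row["order_id"])
        rec["sales_by_channel"][row["channel"]] += row["sales"]
    # Replace the set of order IDs with a distinct-order count.
    return {
        cid: {
            "total_sales": rec["total_sales"],
            "total_orders": len(rec["order_ids"]),
            "sales_by_channel": rec["sales_by_channel"],
        }
        for cid, rec in working.items()
    }


def one_hot(value, categories):
    """Encode one categorical value as 0/1 indicator columns."""
    return {f"is_{c}": int(value == c) for c in categories}


customers = aggregate_by_customer(orders)
segment_flags = one_hot("High Roller", SEGMENTS)
```

Each customer ends up with one row of totals plus per-channel sales, and each categorical value becomes a set of 0/1 columns that a regression or tree model can consume directly.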
2. Evaluate the Current Customer Segmentation
The company used an existing segmentation model to divide its customers into four groups: Casual Visitor, Luxury Estate, High Roller, and Wine Enthusiast. Before proposing a new segmentation model, we evaluated the original one by fitting logistic regression, Classification and Regression Trees (CART), and random forest models.
3. Generate a New Segmentation Model
After analyzing the winery’s current customer segmentation, we developed an RFM (Recency, Frequency, Monetary) segmentation of the customer dataset as an alternative. Looking at the individual RFM segments, we labelled and described unique segments that could benefit from specific targeted marketing strategies.
For example, High Load Customers have the highest recency, frequency, and monetary scores for their winery purchases. These customers have significantly higher order quantities than other segments, so the strategy is to focus on order size by recommending more expensive wines or wine bundles.
Compared with the winery’s previous segmentation model, the new RFM segmentation model offers clear strategies for different customer groups. These strategies are likely to be more successful than a mass marketing strategy because they are tailored to the unique preferences of each segment. The winery can therefore save money while still increasing its marketing impact.
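To make the RFM idea concrete, here is a small sketch of one common way to score customers: rank them on each dimension and cut the ranks into equal-sized bins. The source does not say how the project computed its scores, so the rank-based binning, the number of bins, the snapshot date, the field names, and the rule that maps top scores to the "High Load" label are all assumptions for illustration. The customer records are made up.

```python
from datetime import date

# Hypothetical per-customer summaries; values are illustrative only.
customers = {
    "C1": {"last_order": date(2020, 3, 1), "orders": 12, "sales": 2400.0},
    "C2": {"last_order": date(2019, 6, 15), "orders": 2, "sales": 150.0},
    "C3": {"last_order": date(2020, 1, 10), "orders": 5, "sales": 900.0},
    "C4": {"last_order": date(2018, 11, 30), "orders": 1, "sales": 60.0},
}
SNAPSHOT = date(2020, 4, 1)  # assumed analysis date


def rank_scores(values, bins=4, higher_is_better=True):
    """Assign each customer a score from 1 (worst) to `bins` (best) by rank.

    Ties are broken by input order, which is a simplification.
    """
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(ordered)
    return {cid: 1 + (i * bins) // n for i, cid in enumerate(ordered)}


def rfm_scores(customers, snapshot, bins=4):
    """Return an (R, F, M) score tuple per customer."""
    recency_days = {
        cid: (snapshot - c["last_order"]).days for cid, c in customers.items()
    }
    frequency = {cid: c["orders"] for cid, c in customers.items()}
    monetary = {cid: c["sales"] for cid, c in customers.items()}
    # Fewer days since the last order means a better recency score.
    r = rank_scores(recency_days, bins, higher_is_better=False)
    f = rank_scores(frequency, bins)
    m = rank_scores(monetary, bins)
    return {cid: (r[cid], f[cid], m[cid]) for cid in customers}


def label(score, bins=4):
    """Assumed rule: top score on all three dimensions maps to 'High Load'."""
    return "High Load" if all(s == bins for s in score) else "Other"


scores = rfm_scores(customers, SNAPSHOT)
labels = {cid: label(score) for cid, score in scores.items()}
```

In this toy data, the customer who ordered most recently, most often, and for the largest total gets the top score on all three dimensions, which is the profile the High Load segment describes.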

Significant Variables of Logistic Regression Model
- We used logistic regression models to evaluate the coefficients and significance of variables such as total sales, total orders, and email subscription for each customer segment.
- These models show how the customer segments differ. For example, the total sales variable is significant for High Roller and Casual Visitor but not for Wine Enthusiast and Luxury Estate.
- Interestingly, the total email sales variable is not significant for High Roller.


Accuracy of Logistic Regression Model
- Next, we used a confusion matrix to evaluate the accuracy of these models. We split the dataset into training and test sets at a 70/30 ratio and evaluated each model on the test set.
- The accuracy of the logistic regression models on the test set ranges from 75% to 90%, with the High Roller model reaching almost 90%.
- In addition, every model is more accurate than its baseline model.
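The evaluation steps above (a 70/30 split, a confusion matrix, and a comparison against a baseline) can be sketched in plain Python as shown below. The source does not define its baseline model, so this sketch assumes the common choice of always predicting the majority class. The labels and predictions are invented to show the arithmetic and are not the project's results. The function names are hypothetical.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def confusion_matrix(actual, predicted):
    """Count outcomes for binary labels (1 = in the segment, 0 = not)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(cm):
    """Share of predictions that were correct."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())


def baseline_accuracy(actual):
    """Accuracy of always predicting the majority class (assumed baseline)."""
    positives = sum(actual)
    return max(positives, len(actual) - positives) / len(actual)


# Illustrative test-set labels and model predictions for one segment.
actual = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 1, 1, 0, 0]

train, test = train_test_split(range(10))
cm = confusion_matrix(actual, predicted)
model_acc = accuracy(cm)
base_acc = baseline_accuracy(actual)
```

A model is only useful for a segment if `model_acc` exceeds `base_acc`. With imbalanced segments, a high raw accuracy can otherwise come from predicting "not in the segment" for everyone.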
