This is the one application where I sometimes see confusion even among seasoned analytics practitioners. There are many definitions, interpretations and approaches to segmentation, and none of them is necessarily right or wrong; they all reflect personal views based on personal experience and knowledge in this area. For some, segmentation is another name for clustering, which is an analytical method. For others, segmentation is more than just an analytical technique: it is an analytical application. At this point, it is important to stress that not all segmentation efforts are analytically based. Some segments are business-derived and often based on customer spend measures (“super-shopper”, “taster”, “early-birds”, “late comers”, “bargain hunters”, etc.). Sometimes business segments simply carry the name of the loyalty card in the customer's possession: “silver”, “gold” or “diamond”. There is great value in business-driven segments, since they usually represent some very actionable business logic; however, on their own these types of segments are often too broad to be fully useful or actionable.
It is advisable to use these business segments as the starting point before going deeper into them with more granular, data-driven analytical segmentation. This form of segmentation is a quick, unbiased and objective method of identifying natural clusters in data. The problem with this approach is that there is no guarantee that these “natural” clusters will be of any business use. So natural segmentation often involves a “trial and error” algorithmic search until at least some of the identified clusters show practical value to the business.
Marketing and Behavioral Segmentation
The analytical technique most often used for “natural” segmentation is clustering. Its main limitation is that it does not deal well with discrete and categorical variables. Typically the only way it can handle a categorical field is by transforming it into several new “dummy variables” with binary values of “1” or “0”, where “1” indicates the presence of a specific category and “0” its absence (that is, the presence of any of the other categories). Creating a new binary variable for every category of every categorical variable massively increases the dimensionality of the data, and with many variables it is hard to keep any control over what the end segments will look like. Therefore, it is advisable to segment separately on data that has a similar structure, such as gender, age, marital status and geography. Such segmentation is called marketing segmentation, and its variables are mostly discrete and categorical. The other commonly used type is behavioral segmentation, which is based on product affinities and purchasing behavior.
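To illustrate how quickly dummy variables inflate dimensionality, here is a minimal sketch in plain Python. The function name `one_hot`, the field names and the four sample customers are all hypothetical and chosen only for illustration.

```python
def one_hot(records, categorical_fields):
    """Replace each categorical field with one 0/1 dummy column per category."""
    # Collect the distinct categories observed for each categorical field.
    categories = {
        field: sorted({record[field] for record in records})
        for field in categorical_fields
    }
    encoded = []
    for record in records:
        # Keep the non-categorical fields unchanged.
        row = {k: v for k, v in record.items() if k not in categorical_fields}
        for field, values in categories.items():
            for value in values:
                # 1 marks the presence of this category, 0 any other category.
                row[f"{field}={value}"] = 1 if record[field] == value else 0
        encoded.append(row)
    return encoded


# Hypothetical sample data: one numeric field and three categorical fields.
customers = [
    {"age": 34, "gender": "F", "marital_status": "single", "region": "north"},
    {"age": 51, "gender": "M", "marital_status": "married", "region": "south"},
    {"age": 27, "gender": "F", "marital_status": "married", "region": "east"},
    {"age": 45, "gender": "M", "marital_status": "divorced", "region": "west"},
]

encoded = one_hot(customers, ["gender", "marital_status", "region"])
print(len(customers[0]), "columns before,", len(encoded[0]), "columns after")
```

Even in this tiny example, three categorical fields with 2, 3 and 4 categories turn 4 columns into 10; real fields with dozens of categories grow far faster.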
While behavioral segments offer different marketing opportunities with respect to better product marketing, cross-selling, and more effective loyalty and retention campaigns, marketing segmentation helps the company locate a specific target audience with respect to where they live, how old they are, whether they are single or married, etc.
Therefore, to really get value out of both approaches, one needs to overlay behavioral segments with marketing segments. But there are other segmentation drivers, such as value, risk and loyalty. Once we have a clear segmentation driver, the segmentation becomes more “supervised”, which makes it essentially a classification task. In practice we have two types of analytical methods: unsupervised and supervised. The difference between them is that unsupervised methods do not require a target variable, and clustering is one such method. However, if we want to segment the customer base into predefined classes, we need a target variable to guide the segmentation process, which then becomes a typical classification task.
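The difference between the two families can be sketched with two deliberately simple stand-ins: a basic k-means loop for the unsupervised case and a nearest-centroid classifier for the supervised case. These are illustrative choices, not methods prescribed here, and the data, labels and starting centroids are hypothetical.

```python
from math import dist  # available from Python 3.8


def kmeans(points, centroids, iterations=10):
    """Unsupervised: group points around centroids; no target variable is used."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


def fit_class_centroids(points, labels):
    """Supervised: the target label decides which points define each segment."""
    grouped = {}
    for p, label in zip(points, labels):
        grouped.setdefault(label, []).append(p)
    return {
        label: tuple(sum(axis) / len(group) for axis in zip(*group))
        for label, group in grouped.items()
    }


def classify(point, class_centroids):
    """Assign a new point to the predefined class with the nearest centroid."""
    return min(class_centroids, key=lambda label: dist(point, class_centroids[label]))


# Hypothetical customers described by (monthly spend, monthly visits).
points = [(20, 1), (25, 2), (30, 1), (200, 8), (220, 9), (210, 10)]

# Unsupervised: only the data and a starting guess for two centroids.
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])

# Supervised: the same data plus a target variable with predefined classes.
labels = ["low_value"] * 3 + ["high_value"] * 3
class_centroids = fit_class_centroids(points, labels)
print(classify((180, 7), class_centroids))
```

In the first call the algorithm decides what the groups are; in the second the target variable defines the groups in advance and the model only learns how to recognize them.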
The RFM method segments the customer base by recency of purchase (R), frequency of purchase (F) and monetary value (M). This method is commonly used by direct marketers and retailers; however, I have also used it in other data mining applications that were not about customers at all, such as predicting the likelihood of unplanned loss of production in the energy sector. It is important to stress that any phenomenon can be segmented on these very powerful factors: how recently something happened, how often, and at what size, scale or level.
Recency is the most powerful of the three parameters. In forecasting models, the most recent observations in a time series often carry the highest weight and are the most predictive of the next forecast value. The second most powerful is frequency, as long as its definition is limited to the last month or quarter rather than the entire life span of the customer relationship. The least powerful is monetary value. Since the total value over a period of time is directly correlated with frequency, it is advisable to use an average value instead. There are several different ways to calculate RFM groups and scores; the classic approach is as follows:
First, create five segments based on recency by dividing the data file into five exact quintiles: the contacts with the most recent transactions (i.e. the top 20% of the file) are given a recency value of 5, the next 20% a recency value of 4, and so on. Then each of those quintiles is segmented into five further quintiles based on the frequency value of each contact: the contacts with the highest transaction frequency are given a frequency value of 5, the next 20% a frequency value of 4, and so on. Finally, each of these segments is segmented into five further quintiles based on the monetary value of each contact, i.e. the total amount that all of the contact's transactions add up to. Contacts with the highest monetary values (i.e. the top 20%) are given a monetary value of 5, the next 20% a monetary value of 4, and so on. At the end of this process you will have 125 segments, with RFM groups between 111 and 555 and approximately the same number of contacts in each segment, and each contact will have an RFM score between 3 and 15.
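The nested procedure above can be sketched in plain Python as follows. The record layout (`id`, `days_since_last`, `frequency`, `monetary`) and the synthetic 125-customer file are assumptions made for illustration; recency is assumed to be stored as days since the last transaction, so fewer days means more recent.

```python
def split_quintiles(records, key, reverse):
    """Sort records best-first and split them into five equal-sized bands scored 5..1."""
    ordered = sorted(records, key=key, reverse=reverse)
    n = len(ordered)
    groups = {score: [] for score in (5, 4, 3, 2, 1)}
    for i, record in enumerate(ordered):
        groups[5 - (i * 5) // n].append(record)
    return groups


def nested_rfm(customers):
    """Classic nested RFM: recency first, then frequency within recency, then monetary."""
    result = {}
    # Fewer days since the last purchase is better, so recency sorts ascending.
    by_r = split_quintiles(customers, key=lambda c: c["days_since_last"], reverse=False)
    for r, r_group in by_r.items():
        by_f = split_quintiles(r_group, key=lambda c: c["frequency"], reverse=True)
        for f, f_group in by_f.items():
            by_m = split_quintiles(f_group, key=lambda c: c["monetary"], reverse=True)
            for m, m_group in by_m.items():
                for c in m_group:
                    result[c["id"]] = {"group": f"{r}{f}{m}", "score": r + f + m}
    return result


# Synthetic, hypothetical data: 125 customers with distinct values per factor.
customers = [
    {
        "id": i,
        "days_since_last": i,
        "frequency": (i * 7) % 125,
        "monetary": float((i * 11) % 125),
    }
    for i in range(125)
]

rfm = nested_rfm(customers)
groups = {v["group"] for v in rfm.values()}
print(len(groups), "distinct RFM groups")
print(min(v["score"] for v in rfm.values()), "to", max(v["score"] for v in rfm.values()))
```

With a file size that is not a multiple of 125, the bands can differ in size by a contact or two, which is why equal segment sizes are only approximate in practice.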
An alternative is to still calculate RFM groups and scores using quintiles, but with the independent RFM quintile approach: not just the recency but also the frequency and monetary values of each contact are ranked across the whole data file and do not depend on any of the other RFM factors or quintiles. Another approach is to use user-definable bands for each criterion (i.e. each RFM factor) to determine which recency, frequency and monetary value should be given to each contact. Even though RFM segmentation can be used on a stand-alone basis, I always tend to combine it with other demographic and affinity variables in order to get a more holistic view of each segment's make-up.
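Both alternatives can be sketched briefly. In the snippet below, `independent_quintiles` ranks each factor across the whole file, and `band_score` applies user-defined bands. The function names, the sample customers and the recency band limits of 30, 60, 90 and 180 days are hypothetical.

```python
from bisect import bisect_left


def independent_quintiles(customers, key, reverse):
    """Score every customer 5..1 on one factor, ranked across the whole file."""
    ordered = sorted(customers, key=key, reverse=reverse)
    n = len(ordered)
    return {c["id"]: 5 - (i * 5) // n for i, c in enumerate(ordered)}


def band_score(value, upper_bounds):
    """Score 5..1 from four ascending band limits; lower values score higher."""
    return 5 - bisect_left(upper_bounds, value)


# Hypothetical sample data.
customers = [
    {"id": "a", "days_since_last": 5, "frequency": 12, "monetary": 900.0},
    {"id": "b", "days_since_last": 40, "frequency": 3, "monetary": 1500.0},
    {"id": "c", "days_since_last": 75, "frequency": 8, "monetary": 200.0},
    {"id": "d", "days_since_last": 120, "frequency": 1, "monetary": 50.0},
    {"id": "e", "days_since_last": 400, "frequency": 5, "monetary": 600.0},
]

# Independent approach: each factor is ranked on its own over all customers.
r = independent_quintiles(customers, key=lambda c: c["days_since_last"], reverse=False)
f = independent_quintiles(customers, key=lambda c: c["frequency"], reverse=True)
m = independent_quintiles(customers, key=lambda c: c["monetary"], reverse=True)
groups = {cid: f"{r[cid]}{f[cid]}{m[cid]}" for cid in r}
print(groups)

# User-defined bands: assumed recency limits in days (up to 30 scores 5, and so on).
RECENCY_BANDS = [30, 60, 90, 180]
banded = {c["id"]: band_score(c["days_since_last"], RECENCY_BANDS) for c in customers}
print(banded)
```

Unlike the nested approach, independent quintiles do not guarantee equally sized RFM groups, and user-defined bands do not even guarantee equally sized bands; in exchange, a given score means the same thing for every contact.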
Customer segments are often constructed on “value” or “spend” metrics, as in the RFM approach. Such segments are hierarchical in terms of their meaning to the company: customers in the “high” spend segment are naturally more valuable to an organization than customers in the “medium” or “low” spend segments. Such a hierarchical segmentation structure is usually very dynamic, which means that customers can move from one segment to another depending on their spending behavior. This also means that companies can develop migration strategies to entice customers to move into a higher value/spend segment, or to prevent them from dropping into a lower one. Within each hierarchical segment there are typical members who sit in the segment core, with other members in the layers above and below it. Members of the upper layer are likely to migrate to the segment above with the right marketing stimuli, while members below the segment core can drop off unless they are enticed to stay.
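Movements between hierarchical segments can be tracked by comparing the segment assignments from two scoring runs. The sketch below counts segment sizes and the moves between segments; the customer ids, the segment names and the `"left"` label for customers missing from the later run are hypothetical.

```python
from collections import Counter


def segment_sizes(assignments):
    """Count how many customers sit in each segment."""
    return Counter(assignments.values())


def migration_counts(previous, current):
    """Count customers by (old segment, new segment) between two scoring runs."""
    moves = Counter()
    for customer, old_segment in previous.items():
        # Customers absent from the later run get the assumed label "left".
        new_segment = current.get(customer, "left")
        moves[(old_segment, new_segment)] += 1
    return moves


# Hypothetical assignments from two consecutive scoring runs.
previous = {"c1": "high", "c2": "high", "c3": "medium", "c4": "medium", "c5": "low"}
current = {"c1": "high", "c2": "medium", "c3": "high", "c4": "medium", "c5": "low"}

sizes = segment_sizes(current)
moves = migration_counts(previous, current)

for (old_segment, new_segment), count in sorted(moves.items()):
    print(f"{old_segment} -> {new_segment}: {count}")
```

The off-diagonal pairs, such as medium to high or high to medium, are the movements that a migration strategy tries to encourage or prevent.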
Segmentation Best Practices
As for the optimal number of segments, it is usually between 5 and 12. Too many segments are difficult to interpret and implement, and too few are too general for practical use. A good standard practice for deciding the number of segments is to divide the analysis dataset into two, a training and a validation dataset, and to perform the analysis on both separately. If the segment structure and segment descriptions of the two datasets are similar, the data mining analyst can conclude that the number of segments is correct and the segment structure is stable.
After segmentation is completed, the segments need to be described. Typically the interpretation is based on comparing the mean values of each segmentation variable, starting by comparing each segment separately with the rest of the clients. Visualization methods are another powerful way of describing segments, and sometimes classification/decision trees are used for this purpose. It is important to know that not all segments will be equally valuable or actionable, so that the right expectations are set; otherwise segmentation can become a game of chasing shadows. One annoying feature of segmentation analysis is that consumers who are close to the border between two segments may change segment every time scoring is done. In real life the differences between any two segments are not that clear-cut, and there is always a relatively large number of customers who fall in between two segments; these customers are perfect candidates for segment migration.
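The mean-based comparison of each segment with the rest of the clients can be sketched as follows. The function name `profile_segments`, the segment labels and the two variables are hypothetical, and the sketch assumes at least two segments so that “the rest” is never empty.

```python
from statistics import mean


def profile_segments(rows, segment_field, variables):
    """For each segment, pair the segment mean with the mean of all other clients."""
    profile = {}
    for segment in sorted({row[segment_field] for row in rows}):
        inside = [row for row in rows if row[segment_field] == segment]
        # Assumes at least two segments; otherwise this list would be empty.
        outside = [row for row in rows if row[segment_field] != segment]
        profile[segment] = {
            var: (mean(r[var] for r in inside), mean(r[var] for r in outside))
            for var in variables
        }
    return profile


# Hypothetical scored customers.
rows = [
    {"segment": "A", "spend": 100, "visits": 2},
    {"segment": "A", "spend": 140, "visits": 4},
    {"segment": "B", "spend": 400, "visits": 10},
    {"segment": "B", "spend": 600, "visits": 14},
    {"segment": "C", "spend": 20, "visits": 1},
    {"segment": "C", "spend": 40, "visits": 1},
]

profile = profile_segments(rows, "segment", ["spend", "visits"])
for segment, stats in profile.items():
    for var, (segment_mean, rest_mean) in stats.items():
        print(f"{segment} {var}: segment mean {segment_mean}, rest {rest_mean}")
```

Variables whose segment mean differs most from the rest are usually the ones that give a segment its description and name.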
After specific segments are implemented, it is vital to have reports in place that measure segment sizes, segment movements and trends, and these often depend on the choice of segment time frame. The time frame in customer segmentation analysis is the number of time points (typically months) used to summarize client behavior at the customer level in order to run the segmentation analysis. The length of the time frame affects the stability of the segments: too short a time frame results in unstable customer segments, and too long a time frame results in overly static ones. For example, if a bank decided to use weekly segments, most customers would change segments from week to week purely because of random fluctuation in their typical behavior.
However, if segmentation were done on a yearly basis, it would take several months for the latest change in customer behavior to move a customer from one segment to another, because past behavior would outweigh the new behavior in the analysis. The most typical choices for the length of the time frame in customer behavior segmentation are monthly and quarterly segments. Monthly segments have sometimes been found to be too unstable, while quarterly segments still react to recent changes in customer behavior and provide decent stability. A more practical time frame would be a compromise of quarterly segments with monthly scoring.
Segmentation can be used for different purposes. Micro-segmentation is a tactical, targeted, action-oriented tool for immediate use, such as day-to-day direct campaigns (cross-sell and up-sell), targeted churn prevention and acquisition.
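The compromise of quarterly segments with monthly scoring mentioned above can be sketched as a trailing three-month window that is re-scored every month. The spend figures and the thresholds of 50 per month and 150 per quarter are hypothetical and chosen only to show the stabilizing effect.

```python
def rolling_totals(monthly_values, window=3):
    """Total over the trailing window, recomputed for every month once the window is full."""
    return [
        sum(monthly_values[i - window + 1 : i + 1])
        for i in range(window - 1, len(monthly_values))
    ]


def label(values, threshold):
    """Assign a simple two-level segment using an assumed threshold."""
    return ["high" if v >= threshold else "low" for v in values]


def count_changes(labels):
    """Number of times the segment changes between consecutive scoring runs."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)


# Hypothetical monthly spend of one customer with an alternating pattern.
monthly_spend = [100, 20, 110, 30, 120, 25]

# Monthly segments: scored on a single month of behavior.
monthly_labels = label(monthly_spend, threshold=50)

# Quarterly segments with monthly scoring: trailing three-month totals.
rolling = rolling_totals(monthly_spend, window=3)
rolling_labels = label(rolling, threshold=150)

print("monthly segment changes:", count_changes(monthly_labels))
print("rolling quarter segment changes:", count_changes(rolling_labels))
```

The customer's segment flips every month on single-month data but stays put on the trailing quarter, while the segment is still refreshed monthly.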
Every business has three key objectives for its customers: to acquire as many of them as possible, to keep them for as long as possible and to grow them as much as possible. For each of the three value segments (high, medium and low), a business has a “portfolio” of objectives that includes all three: acquisition, retention and growth. They are not, however, present in equal measure. For the high-value segment, retention is the top priority (lose a few customers here and you lose significant revenue). This justifies a corresponding investment in achieving loyalty, from the choice of communication channels (live reps, even face-to-face meetings), through appropriate service levels (the higher, the more costly), to loyalty rewards. The medium-value segment has retention as a second priority. Here the company will spend more on acquisition, yet sufficient resources will still be allocated to protect against the outflow of revenue through churn. Service levels will still be excellent (just not exceptional), and the mix of communication channels may include fewer of the most interactive (and most expensive) ones, such as face-to-face meetings and outbound reps. In the low-value segment, the top priority is to stop losing money on the least profitable customers, not by deliberately losing them but by migrating them to a higher segment through encouraging usage. Needless to say, retention (especially of loss-causing customers) is a low priority. There may be little or no dedicated spend within this segment, and just basic service. In fact, self-service is encouraged as a cost-effective option.
For a segment to be useful, a variety of requirements need to be met. Firstly, data availability: data needs to be continuously available at some minimum level of quantity and quality. Segments are often determined and built on top of particular datasets, with the expectation that new data will keep arriving to be allocated to the existing segments. Secondly, accessibility: segmented clients have to be reachable, so that they can be contacted directly or indirectly through various channels. Segments also need to be distinct, as heterogeneous as possible across segments and as homogeneous as possible within a segment, so that one marketing approach can be effective for many segment members. Segments need to be substantial in terms of size as well; no company will set up a marketing campaign for segments that contain only a few members. Lastly, segments need to be measurable: their size, growth potential, value and even some qualitative criteria need to be quantifiable, otherwise it will be difficult to measure segment-derived value.