Mystery shopping that is not in pursuit of an overall customer experience objective may be interesting, and it may even succeed in motivating certain service behaviors, but it will ultimately fail to maximize return on investment.
Consider the following proposition:
“Every time a customer interacts with a brand, the customer learns something about the brand, and based on what they learn, adjusts their behavior in either profitable or unprofitable ways.”
These behavioral adjustments could be profitable: positive word of mouth, fewer complaints, use of less expensive channels, increased wallet share, loyalty, or purchase intent. Or these adjustments could be unprofitable: negative word of mouth, more complaints, or decreased wallet share, purchase intent, or loyalty.
There is power in this proposition. Understanding it is the key to managing the customer experience in a profitable way. Unlocking this power gives managers a clear objective for the customer experience in terms of what they want the customer to learn from it and how they want the customer to react to it. Ultimately, it becomes a guidepost for all aspects of customer experience management – including customer experience measurement.
In designing customer experience measurement tools, ask yourself:
- What is the overall objective of the customer experience?
- How do you want the customer to feel as a result of the experience?
- How do you want the customer to act as a result of the experience?
- Do you want the customer to have increased purchase intent?
- Do you want the customer to have increased return intent?
- Do you want the customer to have increased loyalty?
The answers to these questions will become the guideposts for designing a customer experience that achieves your objectives.
They will also serve as the basis for evaluating the customer experience against those objectives. In research terms, the answers become the dependent variable(s) of your customer experience research – the variables influenced by, or dependent on, the specific attributes of the customer experience.
For example, let’s assume the objective of your customer experience is increased return intent. As part of a mystery shopping program, ask a question designed to capture return intent – for example, “Had this been an actual visit, how would the experience during this shop have influenced your intent to return for another transaction?” This is the dependent variable.
The next step is to determine the relationship between every service behavior or attribute and the dependent variable (return intent). The strength of this relationship is a measure of the importance of each behavior or attribute in terms of driving return intent. It provides a basis from which to make informed decisions as to which behaviors or attributes deserve more investment in terms of training, incentives, and rewards.
This is what Kinesis calls Key Driver Analysis: an analysis technique designed to identify the service behaviors and attributes which are key drivers of your customer experience objectives. The result is an informed basis from which to make decisions about investments in the customer experience.
Net Promoter Score (NPS) burst onto the customer experience scene 15 years ago in a Harvard Business Review article with the confident (some might say overconfident) title “The One Number You Need to Grow.” NPS was introduced as the one survey question you need to ask in a customer survey.
Unfortunately, I’ve seen many customer experience managers include NPS in their mystery shopping programs, which is frankly a poor research practice.
The NPS methodology is relatively simple. Ask customers a “would recommend” question, “How likely are you to recommend us to a friend, relative or colleague?” on an 11-point scale from 0-10.
Next, segment respondents according to their responses to this would recommend question. Respondents who answered “9” or “10” are labeled “promoters”, those who answered “7” or “8” are identified as “passive referrers”, and finally, those who answered 0-6 are labeled “detractors”. Once this segmentation is complete, the Net Promoter Score (NPS) is calculated by subtracting the proportion of “detractors” from the proportion of “promoters.” This yields the net promoters, the proportion of promoters after the detractors have been subtracted out.
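The calculation described above can be sketched in a few lines of Python; the sample ratings are invented for illustration:

```python
# Minimal NPS calculation using the segmentation described above:
# 9-10 = promoter, 7-8 = passive, 0-6 = detractor.
def net_promoter_score(responses):
    """Return NPS: percent promoters minus percent detractors."""
    promoters = sum(1 for r in responses if r >= 9)
    detractors = sum(1 for r in responses if r <= 6)
    return 100 * (promoters - detractors) / len(responses)

ratings = [10, 9, 8, 7, 6, 10, 3, 9, 5, 8]  # illustrative responses
print(net_promoter_score(ratings))  # 4 promoters, 3 detractors -> 10.0
```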
The theory behind NPS is simple: it is used as a proxy for customer loyalty. Loyalty is a behavior, and surveys best measure attitudes, not behaviors; therefore customer experience researchers need a proxy measurement for loyalty. NPS is considered an excellent proxy under the theory that if people are willing to put their reputation at risk by referring a brand to others, they are more likely to be loyal to the brand. In contrast, those who are not willing to put their reputation at risk are less likely to be loyal.
Fads in customer experience measurement come and go, but the NPS fad has been particularly stubborn – mostly because the theory behind it is intuitive, it solves the problem of measuring loyalty within a survey, and it is simple. I personally think it was oversold. Billing it as the one number you need to grow doesn’t do justice to the complexities of managing the customer experience, nor does a single NPS number give any direction on how to improve it. An NPS score alone is just not very actionable.
While NPS is an excellent loyalty proxy and has a lot of utility in a customer experience survey, it is not an appropriate tool in a mystery shopping context. Mystery shopping is a snapshot of one experience in time, in which a mystery shopper interacts with a representative of the brand. NPS measures one’s likelihood to refer the brand to others, and that likelihood is almost never the result of a snapshot in time. Rather, it is a holistic measure of the health of the entire relationship with the brand, and as such it does not work well in a mystery shop context where the measurement is of a single interaction. NPS therefore measures things unrelated to the specific experience measured in the mystery shop: past experiences, overall branding, alignment of the brand to customer expectations, etc.
Now, I understand the intent of inserting NPS in the mystery shop. It is to identify a dependent variable from which to evaluate the efficacy of the experience. NPS is just the wrong solution for this objective.
There is a better way.
Instead of blindly using NPS in the wrong research context, focus on your business objectives. Ask yourself:
- What are our business objectives with respect to the experience mystery shopped?
- What do we want to accomplish?
- How do we want the customer to feel as a result of the experience?
- What do we want the customer to do as a result of the experience shopped?
Once you have determined what business objectives you want to achieve as a result of the customer experience, design a specific question to measure the influence of the customer experience on this business objective.
For example, assume the objective of your customer experience is increased purchase intent: you want the customer to be more motivated to purchase after the experience than before. Ask a purchase intent question designed to capture the shopper’s change in purchase intent as a result of the shop.
Now, you have a true dependent variable from which to evaluate the behaviors measured in the mystery shop. This is what we call Key Driver Analysis – identifying the behaviors which are key drivers of the desired business objective. In the example above we want to identify key drivers of purchase intent.
I like to think of different question types and analytical techniques as tools in a tool box. Each is important for its specific purpose, but few are universal tools which work in every context. NPS may be a useful tool for customer experience surveys. It is not, however, an appropriate tool for mystery shopping.
Best in class mystery shop programs clearly communicate behavioral expectations to frontline employees. There should be no surprises in mystery shopping.
Brands have personality. Brand personality is the set of characteristics associated with the positioning, products, price and service mix offered by a company. While branding is a complicated mix of product, price, positioning and place, it often falls on the frontline employees to make the brand real in the perception of customers – to animate the brand. It is, therefore, critical that employees’ service behaviors be aligned with the brand personality. Start the mystery shop program launch with a clear description of your desired brand personality.
After communication of the brand personality, the next step is to define what specific sales and service behaviors you expect from employees as ambassadors of the brand. Create a list of behavioral expectations by asking yourself the following questions:
- What specific service behaviors do we expect?
- When greeting a customer, what specific behaviors do we expect from staff?
- When meeting with customers after the greeting, what specific behaviors do we expect?
- If a phone interaction, what specific hold/transfer procedures do we expect (for example asking to be placed on hold, informing customer of the destination of the transfer)?
- Are there specific profiling questions we expect to be asked? If so, what are they?
- What closing behaviors do we expect? How do we want employees to ask for the business?
- At the conclusion of the interaction, how do we want the employee to conclude the conversation or say goodbye?
- Are there specific follow-up behaviors that we expect, such as getting contact information, suggesting another appointment, or offering to call the customer?
- What other specific behaviors do we expect?
Remember the goal is to ensure employees animate the brand. Each behavior expected should support this end.
Ultimately it is a best practice to give employees a copy of the actual questionnaire and shopper guidelines. Best in class mystery shop questionnaires are composed of a mixture of objective behavioral observations and subjective impressions and comments.
The objective observations of behaviors form the backbone of the program. They measure and motivate the specific sales and service behaviors expected from employees. These observations must be both objective and empirical, answering the question, was a specific behavior observed or not?
Rating scales are the most common means of collecting subjective impressions – measures of how the shopper felt about the experience. They add both a qualitative and quantitative perspective to the objective behaviors, and provide a basis for interpreting their importance.
While empirical behaviors are the backbone of the shop, many of Kinēsis’ clients consider open-ended comments the heart of the shop. Subjective open-ended questions should reveal valuable insight into understanding exactly how the shopper felt about the experience.
There should be no surprises in mystery shopping. Customer-facing employees should understand exactly what behaviors are being measured, and how shoppers are to interpret these behaviors in terms of completing the questionnaire.
There should be no surprises in mystery shopping. When investments in mystery shopping fail to achieve their potential, it is often because those who are accountable for the results, the front-line employees and their direct managers, were not properly introduced to the program.
Improper positioning and introduction of the program risks creating internal resistance. Front-line personnel may interpret mystery shopping as something akin to Orwell’s Big Brother – a distrustful management checking up on its employees. They may see the program solely as a means of realizing financial rewards, rather than intrinsic rewards such as becoming better at their profession, and as a result game the system by frivolously disputing shops. This internal resistance often manifests itself in excessive disputes, questioning everything, hours wasted reviewing security footage, and a game of identifying the shopper – almost always phantom shoppers (actual customers who are not mystery shopping them). All of this creates an unnecessary distraction from realizing the brand’s customer experience goals.
Key to launching a successful mystery shopping program is positive communication: of the behavioral expectations of employees, of guidance regarding internal program administration, and of instruction on how to use the results to improve performance. There should be no surprises in mystery shopping; surprises create resistance and kill buy-in.
Position mystery shopping as a win-win: a tool designed to help employees by making them better at their jobs. Employees want to succeed. They want to be good at their jobs. Leverage this desire to succeed to obtain buy-in from the frontline.
It is, therefore, critical to ensure employees throughout the organization are fully informed and have bought into the program before it is launched. Pre-launch communication should include:
- a definition of the brand
- a description of the employees’ role as ambassadors of the brand
- a list of the specific behaviors expected of employees (including a copy of the mystery shop questionnaire)
- answers to procedural questions about how to communicate program-related issues
- training on how to read mystery shopping reports
- instruction on how to use the information effectively, including how to set goals for improvement
Proper launching of a mystery shop program is critical to its success. Starting on the right foot positions mystery shopping in the minds of customer-facing personnel as a positive tool to help them become better at their jobs – one that offers real benefits both in terms of rewards as a result of the shop and intrinsically, as it reinforces sales and service behaviors that will benefit them throughout their careers.
Communication is key – again, there should be no surprises in a mystery shop program.
This is perhaps the most common question I’m asked by clients, old and new alike. There seems to be a common misconception among both clients and providers that some one number, say 90%, is a “good” mystery shop score. Beware of anyone who throws out a specific number without any consideration of the context – they are either ignorant, glib, or both. Like most things in life, the answer to this question is much more complex.
Most mystery shopping programs score shops according to some scoring methodology to distill the mystery shop results down into a single number. Scoring methodologies vary, but the most common methodology is to assign points earned for each behavior measured and divide the total points earned by the total points possible, yielding a percent of points earned relative to points possible.
It amazes me how many mystery shop providers I’ve heard pull a number out of the air – again, say 90% – and quote it as the benchmark with no thought given to the context. The reality is more complex: context is key. What constitutes a good score varies dramatically from client to client and program to program, and depends on the specifics of the evaluation. One program may be an easy evaluation, measuring easy behaviors, where a score must be near perfect to be considered “good”; another may be a difficult evaluation measuring more difficult behaviors, where a good score will be well below perfect. The best practice in determining what constitutes a good mystery shop score is to consider the distribution of your shop scores as a whole, determine the percentile rank of each shop (the proportion of shops that fall below a given score), and set an appropriate cutoff point. For example, if management decides the 60th percentile is an appropriate standard (6 out of 10 shops fall below it), and a shop score of 86% is in the 60th percentile, then a shop score of 86% is a “good” shop score.
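The percentile-rank approach can be sketched in a few lines; the shop scores below and the 60th-percentile standard are illustrative assumptions:

```python
# A "good" score is defined relative to the distribution of your own
# shop scores, not an arbitrary industry number. Data are illustrative.
def percentile_rank(scores, score):
    """Percent of shops that fall below a given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

shop_scores = [72, 78, 80, 83, 85, 86, 88, 90, 93, 97]

# Suppose management sets the 60th percentile as the standard: a shop
# is "good" when at least 60% of shops score below it.
for score in (86, 90):
    rank = percentile_rank(shop_scores, score)
    verdict = "good" if rank >= 60 else "below standard"
    print(f"score {score}%: {rank:.0f}th percentile -> {verdict}")
```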
Again, context is key. What constitutes a good score varies dramatically from client-to-client, program-to-program and is based on the specifics of the evaluation. Discount the advice of anyone in the industry who glibly throws out a number stating it’s a good score, without considering the context.
Most mystery shopping programs score shops according to some scoring methodology to distill the mystery shop results down into a single number. Scoring methodologies vary, but the most common methodology is to assign points earned for each behavior measured and divide the total points earned by the total points possible, yielding a percentage of points earned relative to points possible.
Drive Desired Behaviors
Some behaviors are more important than others. As a result, best in class mystery shop programs weight behaviors by assigning more points possible to those deemed more important. Best practices in mystery shop weighting begin by assigning weights according to management standards (behaviors deemed more important, such as certain sales or customer education behaviors), or according to the strength of their relationship to a desired outcome such as purchase intent or loyalty. Service behaviors with stronger relationships to the desired outcome receive greater weight.
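As a hypothetical illustration, here is how a weighted shop score might be computed; the behavior names, point weights, and observations are invented for the example:

```python
# Weighted shop score: each behavior carries points possible in
# proportion to its importance; the score is points earned over
# points possible. All names and values below are illustrative.
behaviors = {
    # behavior: (points_possible, observed)
    "greeted_promptly":   (5,  True),
    "used_customer_name": (5,  False),
    "cross_sell_attempt": (15, True),   # weighted heavily: a key driver
    "thanked_customer":   (5,  True),
}

earned = sum(pts for pts, observed in behaviors.values() if observed)
possible = sum(pts for pts, _ in behaviors.values())
score = 100 * earned / possible
print(f"{score:.1f}%")  # 25 of 30 points -> 83.3%
```

Note that missing the heavily weighted cross-sell behavior would cost far more of the score than missing a 5-point courtesy behavior, which is exactly how weighting steers attention toward the behaviors that matter most.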
One tool to identify behavioral relationships to desired outcomes is Key Driver Analysis. See the attached post for a discussion of Key Driver Analysis.
Don’t Average Averages
It is a best practice in mystery shopping to calculate the score for each business unit independently (employee, store, region, division, corporate), rather than averaging business unit scores together (such as calculating a region’s score by averaging the individual stores or even shop scores for the region). Averaging averages will only yield a mathematically correct score if all shops have exactly the same points possible, and if all business units have exactly the same number of shops. However, if the shop has any skip logic, where some questions are only answered if specific conditions exist, different shops will have different points possible, and it is a mistake to average them together. Averaging them together gives shops with skipped questions disproportionate weight. Rather, points earned should be divided by points possible for each business unit independently. Just remember – don’t average averages!
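A small worked example shows why averaging averages goes wrong; the store names and point values are invented, with one shop subject to skip logic:

```python
# Why averaging averages fails when shops have different points
# possible (e.g., due to skip logic). Data are illustrative.
# Each shop: (points_earned, points_possible)
region_shops = {
    "store_a": [(18, 20), (9, 10)],  # second shop skipped 10 points
    "store_b": [(40, 50)],
}

# Correct: pool points earned and points possible across the region.
earned = sum(e for shops in region_shops.values() for e, _ in shops)
possible = sum(p for shops in region_shops.values() for _, p in shops)
correct = 100 * earned / possible

# Incorrect: average each shop's percentage score.
shop_pcts = [100 * e / p
             for shops in region_shops.values() for e, p in shops]
average_of_averages = sum(shop_pcts) / len(shop_pcts)

print(f"correct: {correct:.2f}%")              # 67/80 -> 83.75%
print(f"average of averages: {average_of_averages:.2f}%")
```

The shop with skip logic contributes only 10 points possible, yet averaging averages gives it the same weight as a full 50-point shop, inflating the region’s score.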
Work Toward a Distribution of Shops
When all is said and done, a best in class mystery shop scoring methodology will produce a distribution of shop scores, particularly on the low end of the range.
Mystery shop programs with tight distributions around the average shop score offer little opportunity to identify areas for improvement. All the shops end up being very similar to each other, making it difficult to identify problem areas and improve employee behaviors. Distributions with scores skewed to the low end, make it much easier to identify poor shops and offer opportunities for improvement via employee coaching. If questionnaire design and scoring create scores with tight distributions, consider a redesign.
Most mystery shopping programs score shops according to some scoring methodology. In designing a mystery shop score methodology best in class programs focus on driving desired behaviors, do not average averages and work toward a distribution of shops.
Best in class mystery shop programs provide managers a means of applying coaching, training, incentives, and other motivational tools directly on the sales and service behaviors that matter most in terms of driving the desired customer experience outcome. One tool to identify which sales and service behaviors are most important is Key Driver Analysis.
Key Driver Analysis determines the relationship between specific behaviors and a desired outcome. For most brands and industries, the desired outcomes are purchase intent or return intent (customer loyalty). This analytical tool helps managers identify and reinforce sales and service behaviors which drive sales or loyalty – behaviors that matter.
As with all research, it is a best practice to anticipate the analysis when designing a mystery shop program. In anticipating the analytical needs of Key Driver Analysis identify what specific desired outcome you want from the customer as a result of the experience.
- Do you want the customer to purchase something?
- Do you want them to return for another purchase?
The answers to these questions anticipate the analysis and build in the mechanisms for Key Driver Analysis to identify which behaviors are most important in driving the desired outcome – which behaviors matter most.
Next, ask shoppers how, had they been an actual customer, the experience would have influenced their return intent. Group shops by positive and negative return intent to identify how mystery shops with positive return intent differ from those with negative. This yields a ranking of the importance of each behavior by the strength of its relationship to return intent.
Additionally, pair the return intent rating with a follow-up question asking why the shopper rated their return intent as they did. The responses should be grouped and classified into similar themes, then grouped by the return intent rating described above. This analysis produces a qualitative determination of which sales and service practices drive return intent.
Finally, Key Driver Analysis provides a means to identify which behaviors have the highest potential return on investment in terms of driving return intent. This is achieved by comparing the importance of each behavior (as defined above) with its performance (the frequency with which it is observed). Mapping this comparison in a quadrant chart provides a means of identifying behaviors with relatively high importance and low performance – the behaviors with the highest potential return on investment in terms of driving return intent.
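A rough sketch of the importance-performance comparison behind such a quadrant chart might look like this; the behavior names and values are invented, and using the averages as quadrant cutoffs is just one possible convention:

```python
# Hypothetical importance-performance classification: importance is the
# strength of each behavior's relationship to return intent; performance
# is the frequency with which it is observed. Values are illustrative.
behaviors = {
    # behavior: (importance, performance)
    "greeted_promptly":   (0.55, 0.95),
    "explained_benefits": (0.70, 0.40),  # high importance, low performance
    "offered_brochure":   (0.15, 0.30),
    "thanked_customer":   (0.25, 0.90),
}

# Split the quadrants at the average importance and performance.
imp_cut = sum(i for i, _ in behaviors.values()) / len(behaviors)
perf_cut = sum(p for _, p in behaviors.values()) / len(behaviors)

# Behaviors with above-average importance but below-average performance
# offer the highest potential ROI from coaching and incentives.
high_roi = [name for name, (imp, perf) in behaviors.items()
            if imp > imp_cut and perf < perf_cut]
print(high_roi)
```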
Behaviors with the highest potential return on investment can then be fed back into the mystery shop scoring methodology, informing weighting decisions by assigning more weight to those behaviors.
Employing Key Driver Analysis gives managers a means of focusing training, coaching, incentives, and other motivational tools directly on the sales and service behaviors that will produce the largest return on investment. See the attached post for further discussion of mystery shop scoring.
Plan for Change
Finally, given that mystery shopping measures employee behaviors against service standards, it is a best practice to calibrate and align those standards with customer expectations. This is achieved by maintaining a feedback loop: customer expectations uncovered through customer surveys inform updates to the service standards, and mystery shopping then measures and reinforces those standards. Such a feedback loop between customer surveys and mystery shopping ensures the behaviors measured stay aligned with customer expectations.
Even a well-designed and well-administered mystery shopping program requires periodic adjustment. Performance scores eventually flatten out or cluster together, diminishing the value of the program as a tool for rewarding top performers and continuously improving quality. Periodic reviews should be built into the program design so it stays relevant and useful, and so the bar can be repeatedly raised on service quality and employee performance.
Truth be told, mystery shop data collection is largely a commodity: all mystery shop providers have access to the same pool of shoppers and use similar technology to collect shop data. The source of differentiation is the extent to which a provider can help you take meaningful action on the results.
Hire a provider that can be a partner. Large companies often employ an excruciating bidding process that rarely identifies the best vendor for their needs. They issue lengthy RFPs for mystery shopping that are meant to weed out the weakest contenders, but by asking bidders to commit to overly detailed and inappropriate specifications, they effectively eliminate more sophisticated companies at the same time. The typical RFP process creates an environment in which mystery shopping vendors over-promise in order to make the first cut, thus setting themselves up for failure if they win the account. In addition, it treats mystery shopping research as a commodity, regarding it as a bulk purchase of data rather than a high-value quality improvement tool. Companies have more success when they research the market carefully and identify the providers that have the knowledge and commitment to help them build a truly valuable program.
It is the employees who animate the brand, and it is imperative that employee sales and service behaviors be aligned with the brand promise. Actions speak louder than words. Brands spend millions of dollars on external messaging to define an emotional connection with the customer. However, when customers perceive a disconnect between an employee representing the brand and that external messaging, they will almost certainly experience brand ambiguity. The result severely undermines these investments, not only for the customer in question but for their entire social network. In today’s increasingly connected world, one bad experience can be shared hundreds if not thousands of times over. Mystery shopping is an excellent tool to align sales and service behaviors to the brand.
Mystery shopping programs, when administered in accordance with certain mystery shopping best practices, identify the sales and service behaviors that matter most – those which drive purchase intent and customer loyalty.
Call to Action Analysis
A best practice in mystery shop design is to build in call to action elements designed to identify key sales and service behaviors which correlate to a desired customer experience outcome. This Key Driver Analysis determines the relationship between specific behaviors and a desired outcome. For most brands and industries, the desired outcomes are purchase intent or return intent (customer loyalty). This approach helps brands identify and reinforce sales and service behaviors which drive sales or loyalty – behaviors that matter.
Earlier we suggested that anticipating the analysis in questionnaire design is a mystery shop best practice. Here is how the three main design elements discussed provide input into call to action analysis.
Shoppers are asked how, had they been an actual customer, the experience would have influenced their return intent. Cross-tabulating positive and negative return intent identifies how the responses of mystery shoppers who reported a positive influence on return intent vary from those who reported a negative influence. This yields a ranking of the importance of each behavior by the strength of its relationship to return intent.
In addition, this rating is paired with a follow-up question asking why the shopper rated their return intent as they did. The responses are grouped and classified into similar themes, and cross-tabulated by the return intent rating described above. This analysis produces a qualitative determination of which sales and service practices drive return intent.
The final step in the analysis is identifying which behaviors have the highest potential ROI in terms of driving return intent. This is achieved by comparing the importance of each behavior (as defined above) with its performance (the frequency with which it is observed). Mapping this comparison in a quadrant chart, like the one below, provides a means of identifying behaviors with relatively high importance and low performance – those which will yield the highest potential ROI in terms of driving return intent.
This analysis helps brands focus training, coaching, incentives, and other motivational tools directly on the sales and service behaviors that will produce the largest return on investment – behaviors that matter.
Part of Balanced Scorecard
A best practice in mystery shopping is to integrate customer experience metrics from both sides of the brand-customer interface as part of an incentive plan. The exact nature of the compensation plan should depend on broader company culture and objectives. In our experience, a best practice is a balanced scorecard approach which incorporates customer experience metrics along with financial, internal business process (cycle time, productivity, employee satisfaction, etc.), and innovation and learning metrics.
Within these four broad categories of measurement, Kinēsis recommends managers select the specific metrics (such as ROI, mystery shop scores, customer satisfaction, and cycle time) which will best measure performance relative to company goals. Discipline should be used, however: too many metrics are difficult to absorb. Rather, a few metrics of key significance to the organization should be collected and tracked in a balanced scorecard.
Best in class mystery shop programs identify employees in need of coaching. Event-triggered reports should identify employees who failed to perform targeted behaviors. For example, if it is important for a brand to track cross- and up-selling attempts in a mystery shop, a Coaching Report should be designed to flag any employees who failed to cross- or up-sell. Managers simply consult this report to identify which employees are in need of coaching with respect to these key behaviors – behaviors that matter.
Decisions regarding the number of shops are primarily driven by budgetary resources available and the level of statistical reliability required.
Reliability at Individual or Store Level
The most appropriate measure of reliability at the individual or store level is maximum possible shop distortion (MPSD). Given that shops are snapshots of specific moments in time, it is possible for unique events to influence the outcome of any one shop. It is possible, therefore, that the experience observed by the mystery shopper is not representative of what normally happens. Consider the following examples: a retail location is shopped hours after it was held up, or a bank teller is shopped on the day after her child was up sick all night, or a server at a restaurant just had an extremely bad day. In each of these cases, it is possible these external events impacted employee performance and the customer experience.
How do we know if the experience is typical or not?
Maximum possible shop distortion is the maximum influence any one unique event can have on the set of shops for an individual or location.
With one shop of a given location, we do not know whether the experience is typical; we have only one data point, so the MPSD is 100%. With two shops, the MPSD is 50%: if the two shops disagree, we do not know which is normal and which is the outlier. With three shops, we potentially have two shops pointing to the outlier (MPSD 33%). The MPSD continues to decline with each additional shop.
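Since the MPSD with n shops is simply 1/n (any single atypical shop can distort the picture by at most one nth), the decline is easy to tabulate:

```python
# Maximum possible shop distortion: with n shops of a location, one
# atypical shop can account for at most 1/n of the observations.
def mpsd(n_shops):
    return 1 / n_shops

for n in range(1, 6):
    print(f"{n} shop(s): MPSD = {mpsd(n):.0%}")
# 1 -> 100%, 2 -> 50%, 3 -> 33%, 4 -> 25%, 5 -> 20%
```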
As this graph illustrates, maximum possible shop distortion begins to flatten out relative to the incremental program cost as we approach 3 to 4 shops per store. This is where ROI in terms of improved reliability is maximized.