Mystery shop programs measure human interactions: interactions with other humans and, increasingly, interactions with automated machines. Given that humans are on one or both sides of the equation, it is not surprising that the customer experience varies.
When designing a mystery shop program, a central decision is the number of shops to deploy. This decision depends on several factors, including desired reliability, the number of customer interactions, and the budgetary resources available for the program. However, one additional and very important consideration, which frankly doesn’t get much attention, is the amount of variation expected in the customer experience to be measured.
The level of variation matters because consistent customer experience processes require fewer mystery shops than those with a high degree of variation. To illustrate this, consider the following:
Assume a customer experience process is 100% consistent, with zero variation from experience to experience. Such a process would require only one shop to accurately describe the experience as a whole. Now consider a customer experience process with the maximum possible variation, where any given behavior is as likely to be observed as not. Such a process would require far more than one shop. In fact, under this worst-case assumption, roughly 400 shops would be required to achieve a margin of error of plus or minus five percent at a 95% confidence level.
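The 400-shop figure is consistent with the standard sample-size formula for a proportion, n = z² × p × (1 − p) / e², if we assume the worst case of a 50/50 split (p = 0.5) and round the 95% confidence z-value to 2. The sketch below is only an illustration of that arithmetic. The function name `required_shops` and its parameters are made up for this example, and it ignores any finite-population correction.

```python
import math

def required_shops(p=0.5, margin=0.05, z=1.96):
    """Shops needed to estimate a proportion p within +/- margin.

    Uses n = z^2 * p * (1 - p) / margin^2. p = 0.5 is the worst case
    (most variation); p = 0 or p = 1 means no variation at all.
    """
    n = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Round away floating-point noise before taking the ceiling,
    # and never return fewer than one shop.
    return max(1, math.ceil(round(n, 6)))

print(required_shops(p=0.5, z=2.0))  # worst case, z rounded to 2
print(required_shops(p=0.5))         # worst case, z = 1.96
print(required_shops(p=0.9))         # a more consistent process
print(required_shops(p=0.0))         # a perfectly consistent process
```

The more consistent the process (p closer to 0 or 1), the fewer shops the formula calls for, which is the point of the illustration above.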
Obviously, the variation of most customer experience processes resides somewhere between these two extremes. So how do managers determine the level of variation in their process? The answer to this question will probably be more qualitative than quantitative. Ask yourself:
- Do you have a set of standardized customer experience expectations?
- Are these expectations clearly communicated to employees?
- Other than mystery shopping, do you have any processes in place to monitor the customer experience? If so, are the results of these monitoring tools consistent from month-to-month or quarter-to-quarter?
To make it easy, I always ask new clients to give a qualitative estimate of the level of variation in their customer experience: high, medium, or low. The answer to this question is then considered, along with the level of statistical reliability desired and the budgetary resources available for the program, in determining the appropriate number of shops.
So, ask yourself: how much variation can we expect in our customer experience?
Most mystery shopping programs score shops according to some scoring methodology to distill the mystery shop results down into a single number. Scoring methodologies vary, but the most common methodology is to assign points earned for each behavior measured and divide the total points earned by the total points possible, yielding a percentage of points earned relative to points possible.
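As a minimal sketch of this most common methodology, the function below divides total points earned by total points possible. The behavior names and point values are invented for illustration only.

```python
def shop_score(results):
    """Percent of points earned for one shop.

    results maps behavior -> (points_earned, points_possible).
    """
    earned = sum(e for e, _ in results.values())
    possible = sum(p for _, p in results.values())
    if possible == 0:
        raise ValueError("shop has no points possible")
    return 100.0 * earned / possible

# Hypothetical shop: 25 of 40 points earned.
example_shop = {
    "greeted_customer": (10, 10),
    "asked_needs_questions": (0, 10),
    "offered_additional_product": (10, 10),
    "thanked_customer": (5, 10),
}
print(shop_score(example_shop))  # 62.5
```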
Drive Desired Behaviors
Some behaviors are more important than others. As a result, best in class mystery shop programs weight behaviors by assigning more points possible to those deemed more important. Best practices in mystery shop weighting assign weights either according to management standards (behaviors deemed more important, such as certain sales or customer education behaviors) or according to the strength of each behavior’s relationship to a desired outcome such as purchase intent or loyalty. Service behaviors with stronger relationships to the desired outcome receive more weight.
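One way to picture weighting is to split a fixed pool of points across behaviors in proportion to their importance. The sketch below assumes a 100-point shop and made-up weights; in practice the weights would come from management standards or from the behavior's relationship to the desired outcome.

```python
def allocate_points(weights, total_points=100):
    """Split total_points across behaviors in proportion to their weights."""
    weight_sum = sum(weights.values())
    return {b: total_points * w / weight_sum for b, w in weights.items()}

def weighted_score(observed, points_possible):
    """observed maps behavior -> True/False; returns percent of points earned."""
    earned = sum(points_possible[b] for b, seen in observed.items() if seen)
    possible = sum(points_possible[b] for b in observed)
    return 100.0 * earned / possible

# Hypothetical weights: the cross-sell attempt is judged three times as
# important as the greeting or the thank-you.
weights = {"greeting": 1, "cross_sell_attempt": 3, "thank_you": 1}
points = allocate_points(weights)  # greeting 20, cross-sell 60, thank-you 20

observed = {"greeting": True, "cross_sell_attempt": False, "thank_you": True}
print(weighted_score(observed, points))  # 40.0
```

With equal weights, the same shop would have earned two of three behaviors. Under the weighted scheme, missing the most important behavior costs the shop most of its points.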
One tool to identify behavioral relationships to desired outcomes is Key Driver Analysis. See the attached post for a discussion of Key Driver Analysis.
Don’t Average Averages
It is a best practice in mystery shopping to calculate the score for each business unit independently (employee, store, region, division, corporate), rather than averaging business unit scores together (such as calculating a region’s score by averaging the individual stores or even shop scores for the region). Averaging averages will only yield a mathematically correct score if all shops have exactly the same points possible, and if all business units have exactly the same number of shops. However, if the shop has any skip logic, where some questions are only answered if specific conditions exist, different shops will have different points possible, and it is a mistake to average them together. Averaging them together gives shops with skipped questions disproportionate weight. Rather, points earned should be divided by points possible for each business unit independently. Just remember – don’t average averages!
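A small worked example makes the difference concrete. The store names and point totals below are invented: one store has three full shops, and the other has a single shop where skip logic reduced the points possible.

```python
# Each shop is (points_earned, points_possible). Skip logic means
# points possible can differ from shop to shop.
stores = {
    "store_a": [(40, 50), (45, 50), (50, 50)],  # three full shops
    "store_b": [(10, 20)],                      # one shop with skipped questions
}

def pooled_score(shops):
    """Total points earned divided by total points possible, as a percent."""
    earned = sum(e for e, _ in shops)
    possible = sum(p for _, p in shops)
    return 100.0 * earned / possible

store_scores = {name: pooled_score(shops) for name, shops in stores.items()}
all_shops = [shop for shops in stores.values() for shop in shops]

# Correct: score the region from its own points earned and points possible.
region_pooled = pooled_score(all_shops)

# Mistake: average the two store scores as if they carried equal weight.
region_avg_of_avgs = sum(store_scores.values()) / len(store_scores)

print(store_scores)                 # store_a 90.0, store_b 50.0
print(round(region_pooled, 1))      # 85.3
print(region_avg_of_avgs)           # 70.0
```

Here the single short shop at store_b counts as much as all three shops at store_a when the store scores are averaged, pulling the region from about 85% down to 70%.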
Work Toward a Distribution of Shops
When all is said and done, a best in class mystery shop scoring methodology will produce a wide distribution of shop scores, particularly on the low end of the distribution.
Mystery shop programs with tight distributions around the average shop score offer little opportunity to identify areas for improvement. All the shops end up being very similar to each other, making it difficult to identify problem areas and improve employee behaviors. Distributions with scores skewed to the low end make it much easier to identify poor shops and offer opportunities for improvement via employee coaching. If questionnaire design and scoring create scores with tight distributions, consider a redesign.
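A quick way to check whether scores are bunched together is to look at their spread and at the share of shops falling below some cutoff. The sketch below uses Python's standard `statistics` module. The 70-point cutoff and both score lists are arbitrary assumptions for illustration, not recommended thresholds.

```python
import statistics

def describe_distribution(scores, low_cutoff=70.0):
    """Summarize how spread out a set of shop scores is."""
    return {
        "mean": statistics.mean(scores),
        "std_dev": statistics.pstdev(scores),
        "share_below_cutoff": sum(s < low_cutoff for s in scores) / len(scores),
    }

tight = [94, 95, 96, 95, 95]    # every shop looks the same
wide = [55, 70, 85, 95, 100]    # poor shops stand out

tight_summary = describe_distribution(tight)
wide_summary = describe_distribution(wide)
print(tight_summary)
print(wide_summary)
```

A very small standard deviation with no shops below the cutoff, as in the first list, is the kind of result that might prompt a second look at the questionnaire and scoring.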
Most mystery shopping programs score shops according to some scoring methodology. In designing a mystery shop scoring methodology, best in class programs focus on driving desired behaviors, do not average averages, and work toward a distribution of shops.
Best in class mystery shop programs provide managers a means of applying coaching, training, incentives, and other motivational tools directly on the sales and service behaviors that matter most in terms of driving the desired customer experience outcome. One tool to identify which sales and service behaviors are most important is Key Driver Analysis.
Key Driver Analysis determines the relationship between specific behaviors and a desired outcome. For most brands and industries, the desired outcomes are purchase intent or return intent (customer loyalty). This analytical tool helps managers identify and reinforce sales and service behaviors which drive sales or loyalty – behaviors that matter.
As with all research, it is a best practice to anticipate the analysis when designing a mystery shop program. In anticipating the analytical needs of Key Driver Analysis, identify what specific desired outcome you want from the customer as a result of the experience.
- Do you want the customer to purchase something?
- Do you want them to return for another purchase?
Answering these questions up front lets you anticipate the analysis and build mechanisms into the program for Key Driver Analysis to identify which behaviors are more important in driving this desired outcome – which behaviors matter most.
Next, ask shoppers how the experience would have influenced their return intent had they been an actual customer. Group shops by positive and negative return intent to identify how mystery shops with positive return intent differ from those with negative return intent. This yields a ranking of the importance of each behavior by the strength of its relationship to return intent.
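The grouping described above can be sketched in a few lines. This example measures the strength of the relationship as the gap between how often a behavior is observed in positive-intent shops and in negative-intent shops. That gap is only one simple choice of measure, assumed here for illustration, and the shop data and field names are invented.

```python
def rank_behaviors(shops):
    """Rank behaviors by the gap in observation rate between shops with
    positive and negative return intent.

    shops: list of dicts with 'return_intent' ('positive' or 'negative')
    and 'behaviors' (behavior -> True/False).
    """
    positive = [s["behaviors"] for s in shops if s["return_intent"] == "positive"]
    negative = [s["behaviors"] for s in shops if s["return_intent"] == "negative"]
    if not positive or not negative:
        raise ValueError("need shops in both return-intent groups")

    def rate(group, behavior):
        # Share of shops in the group where the behavior was observed.
        return sum(b[behavior] for b in group) / len(group)

    behaviors = shops[0]["behaviors"].keys()
    gaps = {b: rate(positive, b) - rate(negative, b) for b in behaviors}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)

shops = [
    {"return_intent": "positive", "behaviors": {"greeting": True, "needs_questions": True}},
    {"return_intent": "positive", "behaviors": {"greeting": True, "needs_questions": True}},
    {"return_intent": "negative", "behaviors": {"greeting": True, "needs_questions": False}},
    {"return_intent": "negative", "behaviors": {"greeting": False, "needs_questions": False}},
]
ranking = rank_behaviors(shops)
print(ranking)  # needs_questions ranks ahead of greeting
```

In this toy data, asking needs questions separates the two groups completely, so it ranks as the more important behavior. With real data, a far larger number of shops would be needed before reading much into the gaps.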
Additionally, pair the return intent rating with a follow-up question asking why the shopper rated their return intent as they did. The responses to this question should be classified into similar themes and grouped by the return intent rating described above. This analysis produces a qualitative determination of what sales and service practices drive return intent.
Finally, Key Driver Analysis produces a means to identify which behaviors have the highest potential for return on investment in terms of driving return intent. This is achieved by comparing the importance of each behavior (as defined above) and its performance (the frequency with which it is observed). Mapping this comparison in a quadrant chart provides a means for identifying behaviors with relatively high importance and low performance – behaviors which will yield the highest potential for return on investment in terms of driving return intent.
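The quadrant comparison can be sketched as a simple classification. In this example the dividing lines are the medians of the importance and performance values, which is an assumption made for illustration; the behavior names and numbers are also invented.

```python
import statistics

def quadrant(importance, performance, importance_cut, performance_cut):
    """Label a behavior by which side of each cut point it falls on."""
    imp = "high importance" if importance >= importance_cut else "low importance"
    perf = "high performance" if performance >= performance_cut else "low performance"
    return f"{imp} / {perf}"

# behavior -> (importance, performance); all values are hypothetical.
behaviors = {
    "needs_questions": (0.60, 0.35),
    "cross_sell_attempt": (0.45, 0.80),
    "greeting": (0.10, 0.95),
    "thank_you": (0.05, 0.40),
}

imp_cut = statistics.median(i for i, _ in behaviors.values())
perf_cut = statistics.median(p for _, p in behaviors.values())

chart = {
    name: quadrant(i, p, imp_cut, perf_cut)
    for name, (i, p) in behaviors.items()
}
priorities = [
    name for name, label in chart.items()
    if label == "high importance / low performance"
]
print(chart)
print(priorities)  # ['needs_questions']
```

Behaviors landing in the high importance, low performance quadrant are the ones the text identifies as having the highest potential return.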
Behaviors with the highest potential for return on investment can then be fed back into the mystery shop scoring methodology, informing decisions about how to weight specific mystery shop questions: more weight goes to behaviors with the highest potential for return on investment.
Employing Key Driver Analysis gives managers a means of focusing training, coaching, incentives, and other motivational tools directly on the sales and service behaviors that will produce the largest return on investment. See the attached post for further discussion of mystery shop scoring.
Obtain Buy-In From the Front-Line
When mystery shopping initiatives fail to meet their potential, it is often because the people who are accountable for the results — front-line employees, supervisors, store managers, and regional managers — were never properly introduced to the program. As a result, there may be internal resistance, creating an unnecessary distraction from the achievement of the company’s service improvement goals. A mystery shopping best practice is to ensure employees throughout the organization are fully informed and have bought into the mystery shopping program before it is launched. Pre-launch efforts should include: the specific behaviors expected of customer facing employees, a copy of the mystery shop questionnaire, training on how to read mystery shopping reports, how to use the information effectively, and how to set goals for improvement.
Provide Adequate Internal Administration
A best practice in mystery shop program design is to anticipate the amount of administration necessary to run a successful mystery shopping program. It requires a strong administrator to keep the company focused and engaged, and to make sure that recalcitrant field managers are not able to undermine the program before it stabilizes and begins to realize its potential value.
Provide a Fair & Firm Dispute Process
Disputed shops are part of the process. Mystery shops are just a snapshot in time, measuring complex service interactions. As a result, there may be extenuating circumstances that need to be addressed, or questions about the quality of the mystery shopper’s performance, that require a fair and firm process for disputing shop scores. Fairness is critical to employee buy-in and morale. Firmness is required to keep the number of shop disputes in check and cut down on frivolous score disputes.
The specifics of the dispute process will depend on each brand’s culture and values. Here are some ways a fair and firm best in class mystery shop dispute process can be designed:
Arbitration: Most brands have a program manager or group of program managers who act as arbiters of disputes, ordering reshops or adjusting the points on an individual shop as they see fit. The arbiter of disputes must be both fair and firm; otherwise, employees and other managers will quickly start gaming the system, bogging the process down with frivolous disputes.
Fixed Number of Challenges: Other brands give each business unit (or store) a fixed number of challenges in which they can ask for an additional shop. Managers responsible for that business unit can request a reshop for any reason. However, when the fixed number of disputes is exhausted they lose the ability to request a reshop. This approach is fair (each business unit has the same number of disputes), it reduces the administrative burden on a centralized arbiter, and reduces the potential for massive gaming of the system as there is a limited number of disputes.
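The fixed-challenge approach is easy to track mechanically. The class below is a hypothetical sketch, with a made-up name and an arbitrary allowance of two challenges per business unit.

```python
class ChallengeBudget:
    """Track how many reshop challenges each business unit has used."""

    def __init__(self, challenges_per_unit):
        self.challenges_per_unit = challenges_per_unit
        self.used = {}

    def remaining(self, unit):
        return self.challenges_per_unit - self.used.get(unit, 0)

    def request_reshop(self, unit):
        """Consume a challenge and return True if any remain, else False."""
        if self.remaining(unit) <= 0:
            return False
        self.used[unit] = self.used.get(unit, 0) + 1
        return True

budget = ChallengeBudget(challenges_per_unit=2)
first = budget.request_reshop("store_12")   # granted
second = budget.request_reshop("store_12")  # granted
third = budget.request_reshop("store_12")   # refused: allowance exhausted
print(first, second, third, budget.remaining("store_34"))
```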
Call to Action Analysis
A best practice in mystery shop design is to build in call to action elements designed to identify key sales and service behaviors which correlate to a desired customer experience outcome. This Key Driver Analysis determines the relationship between specific behaviors and a desired outcome. For most brands and industries, the desired outcomes are purchase intent or return intent (customer loyalty). This approach helps brands identify and reinforce sales and service behaviors which drive sales or loyalty – behaviors that matter.
Earlier we suggested that anticipating the analysis in questionnaire design is a mystery shop best practice. Here is how the three main design elements discussed provide input into call to action analysis.
Shoppers are asked how the experience would have influenced their return intent had they been an actual customer. Cross-tabulating the results by positive and negative return intent identifies how the responses of mystery shoppers who reported a positive influence on return intent vary from those who reported a negative influence. This yields a ranking of the importance of each behavior by the strength of its relationship to return intent.
In addition, this rating is paired with a follow-up question asking why the shopper rated their return intent as they did. The responses to this question are classified into similar themes and cross-tabulated by the return intent rating described above. This analysis produces a qualitative determination of what sales and service practices drive return intent.
The final step in the analysis is identifying which behaviors have the highest potential for ROI in terms of driving return intent. This is achieved by comparing the importance of each behavior (as defined above) and its performance (the frequency with which it is observed). Mapping this comparison in a quadrant chart, like the one below, provides a means for identifying behaviors with relatively high importance and low performance, which will yield the highest potential for ROI in terms of driving return intent.
This analysis helps brands focus training, coaching, incentives, and other motivational tools directly on the sales and service behaviors that will produce the largest return on investment – behaviors that matter.
Part of Balanced Scorecard
A best practice in mystery shopping is to integrate customer experience metrics from both sides of the brand-customer interface as part of an incentive plan. The exact nature of the compensation plan should depend on broader company culture and objectives. In our experience, a best practice is a balanced scorecard approach which incorporates customer experience metrics along with financial metrics, internal business process metrics (cycle time, productivity, employee satisfaction, etc.), and innovation and learning metrics.
Within these four broad categories of measurement, Kinēsis recommends managers select the specific metrics (such as ROI, mystery shop scores, customer satisfaction, and cycle time) which will best measure performance relative to company goals. Discipline should be used, however: too many metrics can be difficult to absorb. Rather, a few metrics of key significance to the organization should be collected and tracked in a balanced scorecard.
Best in class mystery shop programs identify employees in need of coaching. Event-triggered reports should identify employees who failed to perform targeted behaviors. For example, if it is important for a brand to track cross- and up-selling attempts in a mystery shop, a Coaching Report should be designed to flag any employees who failed to cross- or up-sell. Managers simply consult this report to identify which employees are in need of coaching with respect to these key behaviors – behaviors that matter.
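An event-triggered coaching report of this kind amounts to a filter over shop results. The sketch below assumes each shop records the employee and a True/False result per behavior; the field names, employee IDs, and behaviors are all invented. A behavior that was never measured on a shop (for example, skipped by skip logic) is not counted as a failure.

```python
def coaching_report(shops, targeted_behaviors):
    """Return employee -> sorted list of targeted behaviors they failed."""
    missed_by_employee = {}
    for shop in shops:
        for behavior in targeted_behaviors:
            # Flag only behaviors that were measured and not observed.
            if shop["behaviors"].get(behavior) is False:
                missed_by_employee.setdefault(shop["employee"], set()).add(behavior)
    return {emp: sorted(missed) for emp, missed in missed_by_employee.items()}

shops = [
    {"employee": "emp_101", "behaviors": {"cross_sell": True, "up_sell": True}},
    {"employee": "emp_102", "behaviors": {"cross_sell": False, "up_sell": True}},
    {"employee": "emp_103", "behaviors": {"cross_sell": False}},  # up_sell skipped
    {"employee": "emp_102", "behaviors": {"cross_sell": True, "up_sell": False}},
]
report = coaching_report(shops, ["cross_sell", "up_sell"])
print(report)
```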