Most mystery shopping programs apply a scoring methodology that distills the results of each mystery shop into a single number.
Scoring methodologies vary, but the most common methodology is to assign points earned for each behavior measured and divide the total points earned by the total points possible, yielding a percent of points earned relative to points possible. It is a best practice in mystery shopping to calculate the score for each business unit independently (employee, store, region, division, corporate).
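As a minimal sketch of the points-earned-over-points-possible calculation described above, the following Python divides total points earned by total points possible for one shop. The function name and the point values are hypothetical, chosen only for illustration.

```python
def shop_score(points_earned, points_possible):
    """Return the percent of points earned relative to points possible."""
    total_possible = sum(points_possible)
    if total_possible == 0:
        raise ValueError("shop has no scorable behaviors")
    return 100.0 * sum(points_earned) / total_possible


# Hypothetical shop: three behaviors worth 10, 10, and 5 points.
earned = [10, 0, 5]
possible = [10, 10, 5]
print(shop_score(earned, possible))  # 15 of 25 points -> 60.0
```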
Not All Behaviors Are Equal
Some behaviors are more important than others. As a result, best-in-class mystery shop programs weight behaviors by assigning more points possible to those deemed more important. Weights can be assigned in two ways: according to management standards (behaviors management deems more important, such as certain sales or customer education behaviors), or according to each behavior’s importance to a desired outcome such as purchase intent or loyalty. In the second approach, service behaviors with stronger relationships to the desired outcome, identified through Key Driver Analysis, receive greater weight. Again, see the subsequent discussion of Key Driver Analysis.
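To illustrate how weighting changes a score, here is a small sketch that compares a weighted score against an unweighted one for the same shop. The behavior names and point values are assumptions made up for this example, not weights from any actual program.

```python
# Hypothetical weights: points possible for each behavior.
# "asked_for_sale" is assumed to matter most, so it carries the most points.
WEIGHTS = {"greeting": 5, "needs_assessment": 10, "asked_for_sale": 20}


def weighted_score(observed, weights=WEIGHTS):
    """Percent of weighted points earned; unobserved behaviors earn zero."""
    possible = sum(weights.values())
    earned = sum(
        points
        for behavior, points in weights.items()
        if observed.get(behavior, False)
    )
    return 100.0 * earned / possible


def unweighted_score(observed, weights=WEIGHTS):
    """Percent of behaviors observed, treating every behavior as equal."""
    hits = sum(1 for behavior in weights if observed.get(behavior, False))
    return 100.0 * hits / len(weights)


shop = {"greeting": True, "needs_assessment": True, "asked_for_sale": False}
print(round(weighted_score(shop), 1))    # 15 of 35 points -> 42.9
print(round(unweighted_score(shop), 1))  # 2 of 3 behaviors -> 66.7
```

Missing the most heavily weighted behavior pulls the weighted score well below the unweighted one, which is the intended effect of weighting.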
Don’t Average Averages!
It is a mistake to calculate business unit scores by averaging lower-level scores together (such as calculating a region’s score by averaging the individual store scores, or even the shop scores, for the region). This will only yield a mathematically correct score if all shops have exactly the same points possible, and if all business units have exactly the same number of shops. However, if the questionnaire has any skip logic, where some questions are only answered if specific conditions exist, different shops will have different points possible, and it is a mistake to average them together. Averaging them together gives shops with skipped questions, and therefore fewer points possible, disproportionate weight. Rather, total points earned should be divided by total points possible for each business unit independently. Just remember: don’t average averages!
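The difference is easy to see with two shops that have different points possible. The sketch below uses made-up numbers: one full shop worth 20 points and one shop where skip logic left only 10 points possible.

```python
def pooled_score(shops):
    """Total points earned divided by total points possible (recommended)."""
    total_earned = sum(earned for earned, _ in shops)
    total_possible = sum(possible for _, possible in shops)
    return 100.0 * total_earned / total_possible


def average_of_averages(shops):
    """Mean of the individual shop percentages (the approach to avoid)."""
    return sum(100.0 * earned / possible for earned, possible in shops) / len(shops)


# Hypothetical region with two shops, as (points earned, points possible).
# The second shop skipped questions, so it has fewer points possible.
region = [(18, 20), (5, 10)]

print(average_of_averages(region))     # (90% + 50%) / 2 -> 70.0
print(round(pooled_score(region), 1))  # 23 of 30 points -> 76.7
```

The shorter shop counts for half of the averaged score even though it accounts for only a third of the points possible.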
What Is A Good Score?
This is perhaps the most common question asked by mystery shop clients, and one for which there is no simple answer. It amazes me how many mystery shop providers I’ve heard pull a number out of the air, say 90%, and quote that as the benchmark with no thought given to the context of the question. The reality is much more complex. Context is key. What constitutes a good score varies dramatically from client to client and program to program, based on the specifics of the evaluation. One program may be an easy evaluation measuring easy behaviors, where a score must be near perfect to be considered “good”; another may be a difficult evaluation measuring more difficult behaviors, in which case a good score will be well below perfect.
The best practice in determining what constitutes a good mystery shop score is to consider the distribution of your shop scores as a whole, determine the percentile rank of each shop (the proportion of shops that fall below a given score), and set an appropriate cutoff point. For example, if management decides the 60th percentile is an appropriate standard (6 out of 10 shops score below it), and a shop score of 86% falls at the 60th percentile, then 86% is the threshold for a “good” shop score.
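The percentile-rank approach can be sketched in a few lines, using the definition given above (the proportion of shops that fall below a given score). The ten shop scores are invented so that 86% lands at the 60th percentile, mirroring the example.

```python
def percentile_rank(scores, score):
    """Percent of shops whose score falls strictly below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100.0 * below / len(scores)


# Hypothetical shop scores (in percent) for one program.
scores = [62, 70, 74, 78, 81, 84, 86, 90, 93, 97]

print(percentile_rank(scores, 86))  # 6 of 10 shops are below 86 -> 60.0
```

Other definitions of percentile rank exist (for example, counting ties as half); this sketch follows only the "proportion below" definition used in the text.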
Work Toward a Distribution
When all is said and done, a best-in-class mystery shop scoring methodology will produce a wide distribution of shop scores, particularly on the low end. Mystery shop programs with tight distributions around the average shop score offer little opportunity to identify areas for improvement. All the shops end up being very similar to each other, making it difficult to identify problem areas and improve employee behaviors. Distributions with scores skewed to the low end make it much easier to identify poor shops and offer opportunities for improvement via employee coaching. If questionnaire design and scoring create scores with tight distributions, consider a redesign.
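One rough way to check how tight a distribution is would be to summarize the spread of shop scores, as in the sketch below. The two score lists are invented, and the choice of standard deviation as the measure of spread is an assumption for illustration; the text does not prescribe a particular statistic or threshold.

```python
from statistics import mean, pstdev


def score_spread(scores):
    """Summarize how widely a set of shop scores is distributed."""
    return {
        "mean": mean(scores),
        "stdev": pstdev(scores),  # population standard deviation
        "min": min(scores),
        "max": max(scores),
    }


# Hypothetical programs: one tightly clustered, one with a long low-end tail.
tight = [88, 89, 90, 90, 91, 92]
spread = [55, 70, 82, 90, 94, 97]

for name, scores in (("tight", tight), ("spread", spread)):
    print(name, score_spread(scores))
```

In the tight program every shop looks much like the others, while the second program clearly separates the weakest shops from the rest.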