Improve Your Estimations with the Equivalent Bet Test
“The illusion that we understand the past fosters overconfidence in our ability to predict the future.”
― Daniel Kahneman, Thinking, Fast and Slow
A friend recently asked me to teach him the basics of estimating values for use in a risk analysis. I described the fundamentals in a previous blog post covering Doug Hubbard’s Measurement Challenge, but to quickly recap: estimates are best provided as ranges that articulate the uncertainty about the measurement. Think of the range as wrapping an estimate in error bars. An essential second step is asking estimators how confident they are that the true value falls within their range; the range paired with that stated confidence is a confidence interval.
Back to my friend: after a quick primer, I asked him to estimate the length of a Chevy Suburban with a 90% confidence interval. If the true length, which is easily Googleable, fell within his range, I’d buy him lunch. He grinned at me and said, “Ok, Tony – the length of a Chevy Suburban is between 1 foot and 50 feet. Now buy me a burrito.” Besides the obvious error I made in choosing the wrong incentive, I didn’t believe the range he gave me reflected his actual uncertainty. A 90% confidence interval, in this context, means the estimator is wrong 10% of the time in the long run. His range was more like a 99.99999% confidence interval: with a range that absurdly wide, he is virtually never wrong.
I challenged him to give a better estimate – one that truly reflected a 90% confidence interval – but with a free burrito in the balance, he wasn’t budging.
If only there were a way for me to test his estimate. Is there a way to make sure an estimator isn’t giving impossibly wide ranges just so they’re always right? Conversely, can I also test for ranges that are too narrow? Enter the Equivalent Bet Test.
The Equivalent Bet Test
Readers of Hubbard’s How to Measure Anything series or Freund and Jones’s Measuring and Managing Information Risk: A FAIR Approach will be familiar with the Equivalent Bet Test. It is a mental aid that helps experts give better estimates in a variety of applications, including risk analysis, and it’s one of several tools in a risk analyst’s toolbox for keeping subject matter experts’ overconfidence in check. Being overconfident means one’s estimates are wrong more often than one thinks. The inverse, underconfidence, is also observed but is less common: the estimates are right more often than the individual thinks. Controlling for these cognitive biases is called calibration. An estimator is calibrated when they routinely give estimates with a 90% confidence interval and, in the long run, are correct 90% of the time.
Under- and overconfident experts can significantly degrade the accuracy of a risk analysis. Therefore, risk analysts must use elicitation aids such as calibration quizzes, constant feedback on the accuracy of prior estimates, and equivalent bets, all of which move the estimator closer to calibration.
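To make the idea of calibration concrete, here is a minimal sketch in Python. The questions, ranges, and answers are made up for illustration, not taken from a real elicitation; it simply scores a batch of 90% confidence interval estimates once the true values are known.

```python
# Minimal sketch: scoring a batch of 90% confidence interval estimates.
# Questions, ranges, and answers are illustrative, not real elicitation data.

estimates = [
    # (question, low, high) -- each range is the estimator's 90% confidence interval
    ("Length of a Chevy Suburban (feet)", 15, 20),
    ("Boiling point of water at sea level (deg F)", 200, 220),
    ("Year the first iPhone was released", 2005, 2008),
]
true_values = [18.8, 212, 2007]  # looked up after the estimates were made

hits = sum(low <= truth <= high
           for (_, low, high), truth in zip(estimates, true_values))
hit_rate = hits / len(estimates)

print(f"Hit rate: {hit_rate:.0%} (a calibrated estimator hits ~90% in the long run)")
```

In practice you would want dozens of questions before judging a hit rate: sitting well below 90% over the long run signals overconfidence (ranges too tight), while sitting well above it signals underconfidence (ranges too wide).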
The technique was developed by decision science pioneers Carl Spetzler and Carl-Axel Staël von Holstein and introduced in their seminal 1975 paper Probability Encoding in Decision Analysis, where they called it the Probability Wheel. The Probability Wheel, along with the Interval Technique and the Equivalent Urn Test, is one of several methods the paper describes for validating probability estimates from experts.
Doug Hubbard re-introduced the technique as the Equivalent Bet Test in his 2007 book How to Measure Anything, and it is one of the easiest tools a risk analyst can use to test experts for under- and overconfidence. It’s best used as a teaching aid; it requires a little setup but serves as an invaluable exercise for getting estimators closer to their stated confidence interval. Once estimators learn the game and why it is so effective, they can play it in their heads whenever they give an estimate.
How to Play
First, set up the game by putting down house money. The exact amount doesn’t matter, as long as it’s enough that someone would care about winning or losing it. For this example, we are going to play with $20. The facilitator also needs a specially constructed game wheel, seen in Figure 1. The game wheel is the exact opposite of what one would see on The Price is Right: there’s a 90% chance of winning and only a 10% chance of losing. I made an Equivalent Bet Test game wheel – and it spins! It’s freely available for download here.
Here are the game mechanics:
The estimator puts $20 down to play the game; the house also puts down $20.
The facilitator asks the estimator to provide an estimate in the form of a range of numbers, with a 90% confidence interval (the estimator is 90% confident that the true number falls somewhere in the range).
Now, the facilitator presents a twist! Which game would you like to play?
Game 1: Stick with the estimate. If the true answer falls within the range provided, you win the house’s $20.
Game 2: Spin the wheel. 90% of the wheel is colored blue. Land in blue, win $20.
Present a third option: ambivalence. The estimator recognizes that both games have an equal chance of winning $20, so there is no preference.
Which game the estimator chooses reveals a great deal about how confident they are in the estimate. The point of the Equivalent Bet Test is to check whether the estimator is truly 90% confident in their range.
If Game 1 is chosen, the estimator believes the estimate has a higher chance of winning than the wheel. That means the estimator is more than 90% confident, and the range is too wide.
If Game 2 is chosen, the estimator believes the wheel has the greater chance of winning, so the estimator is less than 90% confident. That means the range is too tight.
The perfect balance is when the estimator doesn’t care which game they play: in the estimator’s mind each game has an equal chance of winning, and therefore both have a 90% chance of winning.
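For readers who like to see the logic run, here is a minimal Monte Carlo sketch in Python comparing the two games; the containment probabilities are assumptions chosen for illustration. Only when the estimator’s ranges really do contain the true value 90% of the time do the two bets win equally often, which is exactly the ambivalence the test is looking for.

```python
import random

def simulate(containment_prob, trials=100_000, wheel_win_prob=0.90):
    """Win rates for the two games, given how often the estimator's
    ranges actually contain the true value (unknown in real life)."""
    estimate_wins = sum(random.random() < containment_prob for _ in range(trials))
    wheel_wins = sum(random.random() < wheel_win_prob for _ in range(trials))
    return estimate_wins / trials, wheel_wins / trials

for p in (0.99, 0.90, 0.70):  # overly wide, calibrated, and overly tight ranges
    est, wheel = simulate(p)
    print(f"ranges contain the truth {p:.0%} of the time -> "
          f"estimate game wins {est:.1%}, wheel wins {wheel:.1%}")
```

A consistent preference for either game is a signal that the stated 90% and the actual containment rate have drifted apart.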
Why it Works
Insight into why this game helps estimators achieve calibration comes from the world of bookmakers: people who set odds and take bets on sporting and other events for a living. Recall that calibration, in this context, is a measure of the validity of one’s probability assessments. For example, if an expert estimates the probabilities of different types of cyber-attacks occurring using 90% confidence intervals, that individual would be considered calibrated if, in the long run, 90% of those forecasts are accurate. (For a great overview of calibration, see the paper Calibration of Probabilities: The State of the Art to 1980 by Lichtenstein, Fischhoff and Phillips.) Study after study shows that humans are not good estimators of probabilities, and most are overconfident in their estimates (see the Further Reading list at the end for a partial selection).
When bookmakers make a bad forecast, they lose something – money. Sometimes, they lose a lot of money. If they make enough bad forecasts, in the long run, they are out of business, or even worse. This is the secret sauce – bookmakers receive constant, consistent feedback on the quality of their prior forecasts and have a built-in incentive, money, to improve continually. Bookmakers wait a few days to learn the outcome of a horserace and adjust accordingly. Cyber risk managers are missing this feedback loop – data breach and other incident forecasts are years or decades in the future. Compounding the problem, horserace forecasts are binary: win or lose, within a fixed timeframe. Cyber risk forecasts are not. The timeline is not fixed; “winning” and “losing” are shades of grey and dependent on other influencing factors, like detection capabilities.
It turns out we can simulate the losses a bookmaker experiences with games like the Equivalent Bet Test, urn problems, and general trivia questions designed to gauge calibration. These games trigger loss aversion, and with feedback and consistent practice our probability estimates improve. When we go back to real life and make cyber forecasts, those skills carry forward.
Integrating the Equivalent Bet Test into your Risk Program
I’ve found that it’s most effective to present the Equivalent Bet Test as a training aid when teaching people the basics of estimation. I explain the game, its rules, and the outcomes: putting money down to play, asking for an estimate, offering a choice between the games, spinning a real wheel, and the big finale – what the estimator’s choice of game reveals about their cognitive biases.
Estimators then need to ask themselves a simple question each time they make an estimate: “If my own money were at stake, which bet would I take: my estimate, which should have a 90% chance of being right, or a spin of a wheel with a 90% chance of winning?” Think critically about each choice, then adjust the range until you are truly ambivalent between the two. At that point, in the estimator’s mind, both games have an equal chance of winning or losing.
Hopefully, this gives risk analysts one more tool in their toolbox for improving the estimates they elicit from subject matter experts. Combined with other aids, such as calibration quizzes, the Equivalent Bet Test can measurably improve the quality of risk forecasts.
Resources
Downloads
Equivalent Bet Test Game Wheel PowerPoint file
Further Reading
Calibration, Estimation and Cognitive Biases
Calibration of Probabilities: The State of the Art to 1980 by Lichtenstein, Fischhoff and Phillips
How to Measure Anything by Douglas Hubbard (section 2)
Thinking, Fast and Slow by Daniel Kahneman (part 3)
Probability Encoding in Decision Analysis by Carl S. Spetzler and Carl-Axel S. Staël von Holstein
Why bookmakers are well-calibrated
On the Efficiency and Equity of Betting Markets by Jack Dowie
The Oxford Handbook of the Economics of Gambling, edited by Leighton Vaughan Williams and Donald S. Siegel (the whole book is interesting, but the “Motivation, Behavior and Decision-Making in Betting Markets” section covers research in this area)
Conditional distribution analyses of probabilistic forecasts by J. Frank Yates and Shawn P. Curley
An empirical study of the impact of complexity on participation in horserace betting by Johnnie E.V. Johnson and Alistair C. Bruce