Let’s consider the following problem:
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.
Which is more probable?
1. Linda is a bank teller.
2. Linda is a bank teller and is active in the feminist movement.
If you’re like me, you picked the second option. The majority of people do. And we’re all wrong, at least from the standpoint of probability and statistics. We’ve all been suckered by the Conjunction Fallacy.
The probability of two or more events happening together can never exceed the probability of either event happening on its own. So the probability that Linda is a bank teller and a feminist can’t be greater than the probability that she’s just a bank teller. We should have picked option one.
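To make the rule concrete, here’s a minimal sketch in Python. The probabilities are invented purely for illustration; we know nothing about the real values.

```python
# Conjunction rule: P(A and B) can never exceed P(A) or P(B).
# The numbers below are illustrative assumptions, not data about Linda.

p_teller = 0.05                  # assumed P(Linda is a bank teller)
p_feminist_given_teller = 0.80   # assumed P(feminist | bank teller), set high

# P(A and B) = P(A) * P(B | A), and P(B | A) <= 1,
# so the conjunction can be at most P(A).
p_teller_and_feminist = p_teller * p_feminist_given_teller

print(p_teller)               # 0.05
print(p_teller_and_feminist)  # 0.04 (up to float rounding)
assert p_teller_and_feminist <= p_teller
```

However strongly the biography suggests feminism, the conjunction can only ever tie the single event; it can never beat it.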
The Linda problem is cited as a classic example of cognitive bias. Most behavioural economists explain behavioural biases using Dual Process Theory. The theory proposes that we use two systems to make decisions – system one and system two.
The Thinking behind System One and Two
System one is fast, instinctive, spontaneous and largely subconscious. System two is slow, deliberative and calculating.
We get the Linda problem wrong because we follow our gut instincts (system one) and pay too much attention to Linda’s brief biography, which reads exactly like what we’d expect of a feminist. Hence, we fall prey to the Representativeness Heuristic and quickly pick option two.
Instead, we should have used system two to think about the problem a little longer. We would then have realised that the question involves the rules of probability and that we should ignore Linda’s biography.
But this explanation of the Linda problem has never made sense to me. Humans value consistency, which is why related beliefs and actions tend to cluster together. For example, it’s unlikely that a Sea Shepherd volunteer believes that coal-powered electricity is a good idea; the two positions contradict each other when it comes to protecting the environment. That’s why I find it so hard to ignore Linda’s biographical information.
Apparently, I’m not the only person to find this explanation less than satisfactory. Here’s what Gerd Gigerenzer had to say about system one and system two in an interview with the Harvard Business Review:
What is system one and system two? It’s a list of dichotomies. Heuristic versus calculated rationality, unconscious versus conscious, error-prone versus always right, and so on. Usually, science starts with these vague dichotomies and works out a precise model. This is the only case I know where one progresses in the other direction. We have had, and still have, precise models of heuristics, like one over N. And at the same time, we have precise models for so-called rational decision making, which are quite different: Bayesian, Neyman-Pearson, and so on. What the system one, system two story does, it lumps all of these things into two black boxes, and it’s happy just saying it’s system one, it’s system two. It can predict nothing. It can explain after the fact almost everything. I do not consider this progress.
The alignment of heuristic and unconscious is not true. Every heuristic can be used consciously or unconsciously. The alignment between heuristic and error-prone is also not true. So, what we need is to go back to precise models and ask ourselves, when is one over N a good idea, and when not[1]? System one, system two doesn’t even ask this. It assumes that heuristics are always bad, or always second best.
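For readers who haven’t met it, 1/N is the heuristic of dividing an investment equally across N assets, with no estimation at all. Here’s a hypothetical sketch; the asset names and amount are my own illustrative choices, not anything from Gigerenzer:

```python
# A hypothetical sketch of the 1/N heuristic: allocate a portfolio equally
# across N assets. Simple, but a precise and falsifiable model.

def one_over_n(assets, total):
    """Allocate `total` equally across the given assets."""
    weight = total / len(assets)
    return {asset: weight for asset in assets}

# Illustrative inputs only.
print(one_over_n(["equities", "bonds", "property", "cash"], 100_000))
# {'equities': 25000.0, 'bonds': 25000.0, 'property': 25000.0, 'cash': 25000.0}
```

The point of the sketch is Gigerenzer’s: the heuristic is a precise model that can be tested and rejected, not a black box.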
Unsurprisingly, Gigerenzer also struggles with the system one and system two explanation of the Linda problem. His critique of the model centres on two key issues:
• Narrow norms
• Black box descriptions
Narrow Norms
As Gigerenzer points out, the system one and system two model assumes that system two is a) rational (i.e. follows the rules of probability) and b) always correct. This is a narrow norm.
In other words, there’s only one correct answer to the Linda problem: the one given by the conjunction rule, which we reach using system two. Any other answer is wrong and attributable to system one.
Gigerenzer raises three issues with this interpretation. Firstly, most people will agree that it’s logical to use probability to model well-defined, repeated events. But that doesn’t mean they will apply probability theory to a single, unrepeatable event, such as the Linda problem.
The Linda problem creates a context (the biographical description of Linda) that makes it reasonable not to conform to the conjunction rule (i.e. option two is consistent with Linda’s past behaviour even if it’s not consistent with the rules of probability).
Secondly, the narrow norm (the conjunction rule) ignores the content of the situation. A purely statistical answer to the Linda problem focuses only on the words “probable” and “and”. That’s all that’s needed to answer the problem using the conjunction rule. In contrast, Gigerenzer[2] asserts that:
Content-blind norms are appropriate for textbook problems in probability theory, where the context is only decorative, but they are not appropriate either for evaluating human judgement or as a research tool to uncover the underlying process.
Thirdly, there’s also the possibility that most people’s understanding of the terms “probable” and “and” differs from their statistical or logical meanings. For example, here are the synonyms of the word “probable” listed in the Oxford Dictionary:
likely, most likely, odds-on, expected, to be expected, anticipated, predictable, foreseeable, ten to one, presumed, potential, credible, quite possible, possible, feasible
Several of these synonyms also fit the interpretation that Linda is a bank teller and a feminist, given the biographical information provided. It’s certainly possible, feasible or credible that Linda is a feminist given her background.
And when you think of the word “and”, is “a Boolean operator which gives the value one if and only if all the operands are one, and otherwise has a value of zero” really the first thing that comes to mind?
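Probably not, yet that definition is exactly how logical conjunction behaves in code. A trivial illustration:

```python
# Logical AND is strict: the conjunction holds only when both parts hold.
for is_teller in (True, False):
    for is_feminist in (True, False):
        print(is_teller, is_feminist, is_teller and is_feminist)

# Only the (True, True) row prints True. Fewer ways to be true is why
# the conjunction can never be more probable than either part alone.
```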
Gigerenzer is making the point that what most people consider to be rational thinking isn’t always the same thing as statistical thinking. If that’s the case, we can’t simply categorise every decision that’s not based on statistical reasoning as biased, irrational or lazy and lump it into system one.
Black Box Descriptions
As Gigerenzer points out, dual process theory categorises but doesn’t explain. Why are there two systems? What are the cognitive processes that underlie system one and system two? Are the two systems supported by our understanding of neuroscience? What triggers us to use system one instead of system two? Does our predisposition to use system one vs. system two change depending on how information is presented?
Gigerenzer asserts that there are currently few satisfying answers to these questions[3]:
The heuristics in the heuristics-and-biases program are too vague to count as explanations. They are labels with the virtue of Rorschach inkblots: A researcher can read into them what he or she wishes. The reluctance to specify precise and falsifiable process models, to clarify the antecedent conditions that elicit various heuristics, and to work out the relationship between heuristics have been repeatedly pointed out…
… I am concerned with understanding the processes and do not believe that counting studies in which people do or do not conform to norms leads to much. If one knows the process, one can design any number of studies wherein people will or will not do well.
A useful model has to do more than simply categorise. It has to make falsifiable predictions; otherwise it can’t be rejected or used as the basis of a theory.
Concluding that the Linda problem is an example of system one defaulting to the representativeness heuristic (because most people don’t apply probability theory to solve the problem) leaves a lot unanswered.
Conclusion
My interpretation of Gigerenzer’s critique of system one and system two is that our understanding of decision-making needs to be more nuanced. There are times when keeping it quick and simple makes sense. At other times, we should go to the effort of applying probability and statistics. Simply assuming that any departure from statistical reasoning is irrational obscures this nuance. And it doesn’t help us get any closer to understanding why people make decisions the way that they do.
Sounds like we need a framework to help us figure out when to use heuristics and when to use a more complex model. This is the topic of the fourth and final post in this series.
[1] The previous post considers the 1/N heuristic in detail.
[2] Gigerenzer, G., On Narrow Norms and Vague Heuristics: A Reply to Kahneman and Tversky (1996): http://library.mpib-berlin.mpg.de/ft/gg/gg_on%20narrow_1996.pdf
[3] Gigerenzer, G., On Narrow Norms and Vague Heuristics: A Reply to Kahneman and Tversky (1996): http://library.mpib-berlin.mpg.de/ft/gg/gg_on%20narrow_1996.pdf