Associate Professor of Organization & Management Özgecan Koçak is interested in collective decision-making: how groups and organizations aggregate their knowledge of, and opinions about, a topic and reach decisions on all kinds of issues, from vaccines and nuclear power to strategy formulation and the future of AI. Disagreements are bound to arise in decision-making contexts. In many cases, these are not rooted in different values but arise because we focus on different facts, different causal assumptions, or different predictions about the future. Stripping away the noise and heat from debate, says Koçak, can help us pinpoint where disagreements begin and find better, more targeted ways to settle them. It can also help us understand what we’re really arguing about—and what really matters. She and her colleagues Phanish Puranam of Institut Européen d’Administration des Affaires (INSEAD) and Nghi Truong of Sasin School of Management have built a tool to help decision-makers do just that.

Here Koçak explains chains of reasoning, the tool that parses them, and why it matters.

In your work, you say we use “codes” to construct arguments and that codes also underpin disagreements. What do you mean by that?

When we take a position or a stance, we use reason to reach that position or to justify it to others. Our reasoning uses certain cultural codes that reflect our understanding of the world around us. These break down into three principal categories: communication codes, causal codes, and evaluative codes.

Communication codes are what we use to define things. Take the word “sustainability.” Simple example, but without further clarification, it can refer to long-term projections about a business or sensible use of environmental resources. So there’s an obvious potential misunderstanding right there. Causal codes are what we use to understand how the world works, why things happen the way they do or what the outcome is if you take a certain course of action. So a difference in causal codes might be me believing that a vaccine will protect me from a disease, and someone else believing that it won’t. And evaluative codes are the way we judge actions or outcomes—good, bad; desirable or not. With most types of disagreement or conflict, you can trace the origin of divergence back to differences in these three codes.

Can you share an example?

Let’s say you and I discuss whether to invest in a company. If you prize profitability while I’m fixated on market share, that’s a disagreement based on evaluative codes. Now let’s say we agree on growth being our priority, but we can’t see eye to eye on how to make that growth happen. It may be that I think we need to diversify our product portfolio while you think we should market our existing products more aggressively. Now we have a difference in causal codes. And then maybe to you, growth means geographic expansion, while to me it’s to do with employee size. And this is an example of different communication codes.

So this is about pinpointing the code or the combination of codes that underlie disagreements? Why is this so important?

If you can diagnose the type or combinations of codes that drive disagreement, you can work towards more effective resolution.

In our investment scenario, if you know it’s a communication problem, ask more questions. Ensure you’re on the same page about what growth means. A causal code issue? Invest in research on what drives growth, perhaps using data from your organization’s past or by observing your rivals. If it’s evaluative codes—you value X, I value Y—you will need to resolve differences through bargaining and negotiation.

Take something like the regulation of artificial intelligence (AI). Experts will weigh in on this with a plethora of different opinions, citing diverse reasons. If you can parse the reasoning chains they use, you begin to understand why they disagree and precisely what they are disagreeing on. Is it a definition issue? Is it causal—do they disagree on how AI systems develop? Is it evaluative—do they clash on whether task encroachment is a problem? Diagnosing the source of disagreement can enable you to facilitate a more productive discussion—whether it’s policymakers commissioning more research or creating space for more debate.

You’ve developed a tool to diagnose debate. How does it work?

Essentially the tool does two things. It defines categories and then it does the actual analysis of texts, separating premises used in chains of reasoning into the right categories.

The definition part is creating what my colleagues and I call a “grammar of premises.” So here we categorize the different premises that people use to build an argument—whether it’s based on communication, causal or evaluative codes: how they define things, how they think about things and what they value. And we add two more categories: facts and forecasts. This looks at how much of their argument is built on what’s happened or is happening, and what they think is going to happen in the future.
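The five categories of this grammar could be represented, in a loose sketch, as tagged premises chained into an argument. The class and field names below are illustrative, not the researchers’ own:

```python
from dataclasses import dataclass
from enum import Enum

class PremiseType(Enum):
    # The three cultural codes plus the two added categories.
    COMMUNICATION = "definition of a term"
    CAUSAL = "belief about how the world works"
    EVALUATIVE = "judgment of what is desirable"
    FACT = "claim about what has happened or is happening"
    FORECAST = "prediction about the future"

@dataclass
class Premise:
    text: str
    kind: PremiseType

# A toy reasoning chain behind the position "we should regulate AI":
chain = [
    Premise("AI means systems that learn from data", PremiseType.COMMUNICATION),
    Premise("Unregulated AI leads to labor displacement", PremiseType.CAUSAL),
    Premise("Mass labor displacement is undesirable", PremiseType.EVALUATIVE),
    Premise("AI adoption is accelerating", PremiseType.FACT),
    Premise("Adoption will continue to grow", PremiseType.FORECAST),
]

# Disagreement between two speakers can then be located by comparing
# which kinds of premises differ between their chains.
kinds = [p.kind.name for p in chain]
print(kinds)
```

Comparing two such chains premise type by premise type is what lets a diagnosis say, for example, that a dispute is definitional rather than causal.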

Step two is the actual analysis. And here we built a tool that we call PEEL (Premise Extraction using an Ensemble of LLMs) that parses texts to extract and classify the premises of an argument, pieces together the reasoning chains a speaker uses, and then compares arguments across people to identify the root causes of disagreements.
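The interview does not show PEEL’s actual pipeline, but the ensemble idea it names can be sketched minimally: several models each label a premise, and a majority vote settles the category. The `classify_premise` stub below is a hypothetical, keyword-based stand-in for real LLM calls, used only so the sketch runs:

```python
from collections import Counter

def classify_premise(premise: str, model: str) -> str:
    """Hypothetical stand-in for prompting one LLM in the ensemble.

    A real implementation would ask each model to assign the premise
    to one of the five grammar categories; this toy heuristic just
    keys off surface words so the example is self-contained.
    """
    text = premise.lower()
    if "will" in text:
        return "forecast"
    if "means" in text:
        return "communication"
    if "because" in text or "leads to" in text:
        return "causal"
    if "should" in text or "undesirable" in text:
        return "evaluative"
    return "fact"

def ensemble_classify(premise: str, models: list[str]) -> str:
    """Majority vote over the labels returned by each model."""
    votes = Counter(classify_premise(premise, m) for m in models)
    return votes.most_common(1)[0][0]

premise = "AI will develop goals of its own"
label = ensemble_classify(premise, ["model-a", "model-b", "model-c"])
print(label)  # forecast
```

Using an ensemble rather than a single model is a common way to smooth over the labeling noise of any one LLM; disagreement among the models’ votes can also flag premises whose category is genuinely ambiguous.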

How did you put PEEL to the test?

We took transcripts from the Lex Fridman Podcast, where he interviews scores of experts on AI. It’s a pretty rich database, with speakers from industry, media, research and business. As a corpus for analysis, it poses a challenge because these are natural conversations that run for hours. In natural communication, people do not present their arguments in a linear way. They may also neglect to make some of their premises explicit, sometimes because they assume the listener shares the same codes.

To trial PEEL, we picked discussions around the topic of AI risk because this is a very important question that ought to be debated publicly. Here, we found a range of very different opinions, especially around the topics of existential risks attached to AI and the risk of labor displacement. So in our analysis of disagreements among guest speakers, we focus on these two topics, using our tool to understand why people holding “optimistic” or “pessimistic” views on these issues disagree.

On the topic of AI and job displacement, PEEL reveals that the heart of the debate, the root cause, is essentially the question of revolution versus evolution. It’s rooted in whether speakers see AI as a disruption like the steam engine, which ultimately engendered more jobs that only human beings could do, or as something qualitatively different.

The existential risk disagreement is more rooted in causal assumptions about how AI systems behave or might behave. Here you see what we call the “boomer” camp making optimistic forecasts, whereas the “doomer” camp uses more causal arguments to predict that AI could develop goals of its own that don’t align with human goals.

In other words, you use the tool to home right in on what people are really arguing about?

Yes, it allows us to cut through the noise and isolate the root cause or root issue that matters to people. For instance, knowing that in the case of existential risk, the debate is really about AI developing independent objectives, policymakers can be more vigilant about stress-testing AI’s capabilities and its tendencies, making sure we have early warning indicators to watch. If even the experts disagree on what’s possible here, that’s where we need to be focusing attention, right? 

We hope that the tool will be of use to policymakers who are tasked with regulating the defining problems of our world, and the public who should be placing demands on policymakers to do so.

Where else do you see your tool being deployed?

I see PEEL as an aid to decision-makers in any type of context, scenario or debate. I’m trialing its use in strategy mapping with business students.

There’s potential for it to be applied to strategy documents or annual reports to parse out the most critical organizational goals and the different theories about how to reach them. That could bring clarity around priorities: how they rank, where disagreement might arise, and why decision-makers disagree. I think there’s potential for PEEL to work as a mediation tool in any setting.

Meanwhile, Defne Apul, an environmental scientist and colleague, is looking to integrate PEEL into an educational tool for high-school students. Students will use available data to propose solutions to water management problems and then use PEEL to think through how various stakeholders might react to their solutions. Again, the idea is to use it as a tool to understand debate rationally and facilitate a route towards constructive resolution.

Constructive resolution sounds attractive in these polarized times.

A tool like this brings neutrality through a deliberate, analytical treatment of debate and dissension. This is cold thinking, if you like. Instead of reaching for quick understanding or ceding to the heat of partisan argumentation, this is a tool that sheds cool light and doesn’t add fire to debate.

Unlike social media platforms, which encourage people to react to issues on the basis of emotions, tools like this can support thoughtful deliberation. They can help people form an independent opinion rather than just jump on a bandwagon. My hope is that, by stripping debate down to its core components, and classifying those components to establish root causes, tools like PEEL may help us make some headway on wicked problems—those issues that are so complex, so interconnected and challenging that they feel impossible to solve. The problems that cause most debate, most heat and most noise.

Of course, PEEL isn’t a panacea. But it can help us clarify what the real issues are—what it is we’re really arguing about—to facilitate constructive policy-making, design and debate.
