
Tackling Explainability and Interpretability in Language Models


The very nature of language models—large language models (LLMs), foundation models, and frontier models—confounds the notions of explainability and transparency. Even smaller open source language models have too many parameters, hyperparameters, weights, and features for data scientists to consistently identify which of them is responsible for a particular output.

Imagine, then, the difficulty facing a modestly sized enterprise attempting to do the same thing without data scientists, a large IT department, or the wherewithal to fine-tune (let alone train) its own models.

Consequently, organizations are in a precarious position. They have not trained language models, know little about their inner workings, and yet are still expected to be able to explain their results—especially when they pertain to mission-critical business processes, customers, and regulations.

“No one thing is going to be able to solve this problem,” admitted Scott Stephenson, Deepgram CEO. “And, I don’t think if you did everything that’s available today, you could solve this problem. This is still an active area of research and development. No one solution or school of thought has it figured out.”

However, by availing themselves of a number of different approaches, organizations can drastically increase transparency into why language models produce specific outputs. The most prominent of these techniques include context engineering, thinking or reasoning models, chain of thought, retrieval-augmented generation (RAG), and multi-agent architectures.
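
To make the traceability angle concrete, the following is a minimal sketch of the RAG pattern in Python. The search_index and generate_answer objects are hypothetical stand-ins for whatever retrieval service and language model an organization actually uses; the point is that every answer can be tied back to the specific passages that grounded it.

    # Minimal RAG sketch: retrieval adds traceability because each answer can cite
    # the passages that grounded it. search_index and generate_answer are
    # hypothetical placeholders, not a specific vendor API.
    def answer_with_sources(question, search_index, generate_answer, k=3):
        # 1. Retrieve the k most relevant passages from the organization's own corpus.
        passages = search_index.search(question, top_k=k)

        # 2. Build a prompt that restricts the model to the retrieved context.
        context = "\n\n".join(f"[{i + 1}] {p.text}" for i, p in enumerate(passages))
        prompt = (
            "Answer using only the numbered sources below and cite them as [n].\n\n"
            f"Sources:\n{context}\n\nQuestion: {question}"
        )

        # 3. Return the answer together with the sources it drew on, so a reviewer
        #    can trace the output back to concrete documents.
        return generate_answer(prompt), passages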

Nevertheless, the task at hand is far from trivial, particularly for those employing models from managed service providers, as most organizations do.

“Not only do you have the black box of how does the model function, but you also have the black box of whatever this company has wrapped the LLM in,” noted Blair Sammons, director of solutions, cloud and AI at Promevo.

EXPLAINABILITY OR ACCOUNTABILITY?

The need to explain the outputs of probabilistic machine learning models has long been integral to statistical AI adoption. When deep learning and deep neural networks were the vanguard of this technology, explainability and interpretability were facilitated in several ways. Techniques like Local Interpretable Model-agnostic Explanations (LIME), Shapley values, Individual Conditional Expectation (ICE), and others enabled users to understand the numeric contribution of individual features to model outputs, which bolstered explanations of those outputs. According to Ebrahim Alareqi, principal machine learning engineer at Incorta, “With LIME and Shapley values, they were good at highlighting which features ended up being used in a limited number of parameters. As we got to LLMs, it’s a whole different thing because now we’ve moved beyond a billion parameters.”
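
As a point of comparison, that earlier style of feature-level explanation looked roughly like the sketch below, which assumes the open source shap and scikit-learn packages and a small synthetic tabular dataset.

    import shap
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    # A small synthetic tabular problem: with only six features, attributions are readable.
    X, y = make_classification(n_samples=500, n_features=6, random_state=0)
    model = GradientBoostingClassifier().fit(X, y)

    # Model-agnostic explainer: perturbs inputs and attributes each prediction to features.
    explainer = shap.Explainer(model.predict_proba, X[:100])
    shap_values = explainer(X[:5])

    # One numeric contribution per feature, per prediction, per class; legible at this
    # scale, but with billions of LLM parameters there is no comparable per-weight story.
    print(shap_values.values[0])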

The sheer size and intricacy of LLMs have changed what explainable AI means. The term is transitioning from one in which each output of a model can be attributed to a specific weight, parameter, or feature to one in which there is transparency about why a model responded as it did. Explainability, then, is rapidly giving way to more pragmatic notions of correctness and traceability. “If the model always does the right thing, do you really care how it did it?” Stephenson posited. “You really care when it does the wrong thing. That’s why you open it up and see why it did something. You reduce the need for explainability the higher the quality of the actions of the model are.”

REASONING MODELS

The shift in priorities from explainability to transparent accountability of language models is typified by reasoning models, also known as thinking models, which Alareqi described as a modern means of understanding model results. According to Jorge Silva, director of AI and machine learning development at SAS, “A reasoning model is one that not only answers your prompt, but provides the steps in a human-like fashion that it took to get there.” Many believe that when such models, which include Qwen, OpenAI’s gpt-oss, and DeepSeek-R1, detail the specific steps that were “thought about” or performed to reach a conclusion, this provides concrete traceability. This way, humans can effectively “replicate the steps,” Silva pointed out.

Examples include everything from employing these models to solve quadratic equations to explaining which product is predicted to sell better in the coming quarter—as well as why. These capabilities are predicated on the chain of thought form of prompt augmentation. According to Alareqi, when chain of thought is ingrained within reasoning models, it “is the way to look [at] what is inside the model.” However, these models are far from flawless. “There’s some research that shows reasoning models are more prone to hallucinations,” Silva commented. “Because we’re asking for more information to be returned without giving more inputs, it needs to extrapolate more.”
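
As a concrete illustration, here is a minimal sketch of prompting a model for its intermediate steps, assuming the OpenAI Python SDK pointed at an OpenAI-compatible endpoint; the model name is a placeholder rather than a recommendation.

    from openai import OpenAI

    client = OpenAI()  # assumes an API key is configured in the environment

    response = client.chat.completions.create(
        model="your-reasoning-model",  # placeholder model name
        messages=[{
            "role": "user",
            "content": (
                "Solve x^2 - 5x + 6 = 0. Show each step of your reasoning "
                "before stating the final roots."
            ),
        }],
    )

    # The reply should contain the worked steps as well as the conclusion (the roots
    # are x = 2 and x = 3), which is what lets a human replicate and audit the answer.
    print(response.choices[0].message.content)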

CHAIN OF THOUGHT

The chain of thought technique now embedded in reasoning models initially emerged around the same time as the more popular RAG method, and its adoption may have been hampered by the upfront effort involved. “Before, you would have to write this chain of thought for the model,” Alareqi remarked. “With the newer models, we don’t have to do this because it’s inherent in them.” For organizations that needed to explain the steps behind, say, predicting customer lifetime value or deciding which loans to approve, writing out the chain of thought meant sending more tokens to the model, and therefore higher costs.

However, because thinking models have chain of thought embedded within them, there are lower costs for inputs and outputs. According to Stephenson, a caveat for employing these reasoning capabilities to facilitate transparency for model responses is, “You don’t allow the chain of thought to be too general. It needs to be very specific. So, the model doesn’t say, ‘Compared to all the competitors’; it says, ‘Compared to competitor one, and here’s the name, and competitor two, and here’s the name, and competitor one has these qualities, etc.’”
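
That caveat can be enforced directly in the prompt. Below is a minimal sketch, with hypothetical competitor names and wording chosen purely for illustration; in practice it would be passed as the system message of a call like the one sketched earlier.

    # Constrain the chain of thought to name concrete entities rather than
    # generalities. Competitor names here are hypothetical placeholders.
    COMPETITORS = ["Competitor One, Inc.", "Competitor Two, Inc."]

    system_prompt = (
        "When you reason through a comparison, be specific. Never write "
        "'compared to all the competitors.' Name each competitor explicitly "
        f"({', '.join(COMPETITORS)}), and for each one list the qualities being "
        "compared, one per step, before stating your conclusion."
    )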
