Guarding Against Bias When Training Language Models
REINFORCEMENT LEARNING
The next layer for reducing the impact of bias on language models also concerns how they are trained. However, it's implemented after models have undergone their initial training.
Reinforcement learning is often posited as a third type of machine learning alongside the more popular supervised and unsupervised learning techniques. Unlike supervised learning (which relies on labeled training data) and unsupervised learning (which relies on unlabeled data for techniques such as clustering), reinforcement learning requires no training data. Instead, an agent dynamically interacts with an environment and is scored on its performance. That score steers the model toward the behaviors, or actions, that earn a higher score.
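To make that loop concrete, the following minimal sketch shows an agent interacting with a toy environment, receiving a score for each action, and gravitating toward the actions that earn higher scores. The environment, action names, and reward values are hypothetical, purely for illustration.

```python
# Minimal sketch of the reinforcement learning loop described above: the agent
# tries actions, the environment scores them, and the scores steer the agent
# toward higher-reward behavior. All values here are illustrative assumptions.
import random

ACTIONS = ["action_a", "action_b", "action_c"]
TRUE_REWARDS = {"action_a": 0.2, "action_b": 0.8, "action_c": 0.5}  # hidden from the agent

estimates = {a: 0.0 for a in ACTIONS}  # the agent's running value estimate per action
counts = {a: 0 for a in ACTIONS}

for step in range(1000):
    # Explore occasionally; otherwise exploit the best-known action.
    if random.random() < 0.1:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: estimates[a])

    # The environment scores the chosen action (a noisy reward).
    reward = TRUE_REWARDS[action] + random.gauss(0, 0.1)

    # The score nudges the agent's estimates toward higher-reward behavior.
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print(max(ACTIONS, key=lambda a: estimates[a]))  # typically converges to "action_b"
```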
Organizations can use reinforcement learning in addition to generating synthetic data to balance the training datasets for language models. However, Allen mentioned he's "skeptical those [techniques] will solve the problem" and instead positioned reinforcement learning as an alternative to synthetic data techniques. "Reinforcement learning is going to play an increasingly important role in addressing bias that's been discovered," Allen opined. "With reinforcement learning, if the model outputs results that we feel are biased, we can score that model down, so it's essentially losing points in that game."
Although reinforcement learning doesn't involve training data, it is often employed in the model-building process. In some use cases, it's applied after models have been trained. "LLMs are not just training a large model on a huge amount of data and calling it done," Allen said. "Reinforcement learning plays an integral role in shaping the model."
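A hedged sketch of the scoring idea Allen describes: when a generated response is judged biased, the reward used during reinforcement learning is pushed down, so the model effectively "loses points" for that behavior. The helpfulness score, bias probability, and penalty weight below are hypothetical placeholders, not any specific product's API.

```python
# Sketch of a reward that penalizes likely-biased output. The inputs would come
# from (hypothetical) quality and bias-detection models; values are illustrative.
def reward(helpfulness_score: float, bias_probability: float,
           bias_penalty: float = 2.0) -> float:
    """Combine a quality score with a penalty for likely-biased output;
    a heavily penalized response drives the overall score negative."""
    return helpfulness_score - bias_penalty * bias_probability

# An otherwise good answer flagged as probably biased scores poorly,
# while an unflagged one keeps its score.
print(reward(helpfulness_score=0.9, bias_probability=0.8))   # ≈ -0.7
print(reward(helpfulness_score=0.9, bias_probability=0.05))  # ≈  0.8
```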
LINGUISTIC TECHNIQUES
One of the final layers for reducing bias in language models entails monitoring their outputs. With this approach, organizations rely on what Osborne termed "linguistic techniques" to determine when models are producing biased results. This layer is applied after the model has been trained, but before the results are disseminated to users.
This method is particularly effective when working with LLMs, since organizations don’t know exactly what datasets—and what bias was in them—were used to train these models. “Instead of trying to control the variables that feel uncontrollable, we can look for things that we deem as unacceptable behavior or model responses, and stop it there before it hits the user,” Osborne explained.
Several AI governance solutions rely on a number of different techniques to provide this same advantage. These out-of-the-box solutions can be readily purchased, although they're somewhat less applicable to training or fine-tuning models. Osborne recommended employing linguistic models that specialize in various forms of natural language processing, scrutinizing the parts of speech and linguistic structure of model outputs (and inputs).
According to Osborne, “Linguistic models are able to look at unstructured data at a nuanced, granular level. This gives us an opportunity to understand what’s happening in the unstructured responses or in the prompts users are generating and make a decent prediction about whether or not that will be biased or toxic.” Organizations can think of these resources as bias-detection models. Certain analytics solutions contain tools for organizations to construct these models (and, ostensibly, their own language models for applying them). These tools are targeted toward self-service users but may require the expertise of accomplished data scientists to reap their full value.
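As a rough illustration of this layer, the sketch below screens both the user's prompt and the model's response with a bias-detection model before anything is returned. The `score_bias` function stands in for whichever linguistic or NLP classifier an organization adopts; the threshold and fallback messages are assumptions made for the example.

```python
# Sketch of an output-monitoring layer: a bias-detection model checks the
# prompt and the generated response before anything reaches the user.
from typing import Callable

def guarded_generate(prompt: str,
                     generate: Callable[[str], str],
                     score_bias: Callable[[str], float],
                     threshold: float = 0.7) -> str:
    """Generate a response, but withhold it if either the prompt or the
    response looks biased or toxic to the detection model."""
    if score_bias(prompt) >= threshold:
        return "This request can't be answered as phrased."

    response = generate(prompt)
    if score_bias(response) >= threshold:
        return "The generated response was withheld by the bias filter."

    return response
```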
USER BEHAVIOR MONITORING
The last layer for mitigating the effects of bias when deploying language models involves tracking user behavior. With this approach, organizations are tasked with monitoring and discerning the intentions of users to ensure they're not trying to induce models to produce biased outputs. Again, this layer is less applicable to training or fine-tuning models than it is to deploying them (or LLMs), since it's predicated on analyzing users' interactions with the models.
It may also serve as an alternative to some of the other mitigation measures outlined above, although it’s certainly complementary to them. “Instead of focusing so heavily on inherent bad behaviors that exist in the model, maybe we track inherent bad behaviors that exist on the part of the user,” Osborne said. User behavior monitoring should always be part of the model operations process and one of the final considerations before recalibrating models to ensure they’re working as planned.
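One simple way to operationalize this, sketched below under assumed thresholds, is to track how often each user's prompts are flagged by the bias detector and surface users whose flag rate suggests they're deliberately trying to induce biased outputs.

```python
# Sketch of user behavior monitoring: track the share of each user's prompts
# flagged by the bias detector. Thresholds are illustrative assumptions.
from collections import defaultdict

class UserMonitor:
    """Tracks what share of each user's prompts the bias detector flags."""

    def __init__(self, flag_rate_threshold: float = 0.3, min_prompts: int = 10):
        self.totals = defaultdict(int)
        self.flags = defaultdict(int)
        self.flag_rate_threshold = flag_rate_threshold
        self.min_prompts = min_prompts

    def record(self, user_id: str, prompt_was_flagged: bool) -> None:
        self.totals[user_id] += 1
        if prompt_was_flagged:
            self.flags[user_id] += 1

    def suspicious_users(self) -> list[str]:
        """Users with enough history and a high share of flagged prompts."""
        return [
            user for user, total in self.totals.items()
            if total >= self.min_prompts
            and self.flags[user] / total >= self.flag_rate_threshold
        ]
```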
NEGATING MODEL BIAS
By analyzing training and fine-tuning datasets for bias, implementing data quality measures, generating training data for underrepresented domains, employing reinforcement learning and bias-detection models, and monitoring user behavior, organizations can reduce the effects of language model bias.
Unfortunately, they won’t be able to completely eliminate it.