Guarding Against Bias When Training Language Models
Machine learning models return biased results when the datasets used to train them contain bias. Social bias, skewed model results, and outputs that fail to represent the full scope of a domain-specific business problem are among the risks of employing this technology. Language models, one of the most popular forms of advanced machine learning, are no different in this respect.
To counter the effects of bias, the shrewd enterprise must do more than guard against it when training language models. Users must also minimize bias when fine-tuning models—which frequently entails adjusting the weights and parameters of open source models with an organization’s own data.
Such models include Meta’s Llama iterations and those found on frameworks such as Hugging Face. The most common way organizations encounter machine learning bias, however, is through using large language models (LLMs) such as ChatGPT or Claude. The parameters of these models cannot be adjusted, and there is no way of ascertaining exactly what data was used to train them. “It’s a black box,” commented Mary Osborne, SAS senior product manager of natural language processing. “We have no insight into what went in there, which is part of what makes these models so difficult to mitigate.”
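For organizations that do take the open source route, the mechanics of fine-tuning with their own data frequently resemble the sketch below, which uses the Hugging Face Transformers library; the model checkpoint, corpus file, and training settings are placeholders rather than recommendations, and any real project would add evaluation and bias checks around them.

```python
# A minimal sketch of fine-tuning an open source model on an organization's own
# data with Hugging Face Transformers. The model name, data file, and training
# settings are placeholders, not recommendations.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE_MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder open source checkpoint

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # some tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Organization-specific text, e.g., one document per line in a local file.
raw = load_dataset("text", data_files={"train": "internal_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Everything the model sees in that training file shapes the biases it carries forward, which is why curating the dataset matters as much as the training loop itself.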
When training or fine-tuning models or employing LLMs, users can only minimize—not eliminate—biased model outputs.
According to Laserfiche CTO Michael Allen, “Research has shown, and you can do the research, that very prominent models do exhibit bias. There’s certainly bias because the data being fed into LLMs is on the internet, and that data isn’t fully representative of the world these businesses are operating in.”
Consequently, organizations must discern the types of bias affecting language models, assess how they might impact their own use cases, and employ measures to counteract them.
PRETRAINED MODELS
Bias may be easiest to counteract when organizations build language models from scratch. With this approach, they have full control over the datasets models are trained on and can adjust them appropriately to minimize bias. This way, “you can do your due diligence to understand where bias lies,” Osborne said. However, few tasks are more arduous, resource-intensive, or time-consuming than building and training a language model from scratch. Most organizations don’t have teams of data scientists for such an undertaking, which is why they rarely invest in building and training their own models and instead rely on pretrained models.
Nevertheless, Osborne commented, “The lack of transparency as to what those models were trained on creates headaches when trying to mitigate things you don’t fully understand.”
What organizations can control, however, are the datasets used to fine-tune a model, which provide opportunities to mitigate existing bias while making a pretrained model domain-specific.
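Acting on that control can be as simple as examining how the fine-tuning data is distributed and downsampling overrepresented groups before training, as in the sketch below; it assumes records are tagged with a category of interest, and the file and column names are illustrative.

```python
# A sketch of one way to reduce skew in a fine-tuning dataset before training:
# count how often each category appears, then downsample overrepresented
# categories. The file name, column name, and sampling policy are illustrative.
import pandas as pd

df = pd.read_csv("finetune_corpus.csv")   # hypothetical file with a "category" column
counts = df["category"].value_counts()
print(counts)                             # inspect representation before changing anything

target = counts.min()                     # simplest policy: match the rarest category
balanced = (df.groupby("category", group_keys=False)
              .apply(lambda g: g.sample(n=min(len(g), target), random_state=0)))
balanced.to_csv("finetune_corpus_balanced.csv", index=False)
```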
TYPES OF BIAS
There are two chief types of bias in language models: social bias and bias arising from underrepresented aspects of a business problem. Social bias pertains to societal factors or classifications such as gender, ethnicity, and age. With this type of bias, models produce outputs that tend to favor one age group while excluding others, or that generate more male names than female names. The second type of bias stems from training data whose proportions fail to fully represent a business problem.
Perhaps there’s a surplus of training data about one brand of mobile communication devices and a shortage of data about others. “We can see bias cropping up in discussing technologies, products, countries,” Allen said. “Anything with a political dimension to it, anything with a competitive dimension to it, there may be a bias there.”
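A lightweight audit along these lines, sketched below with a made-up brand list and corpus file, simply counts how often each item of interest appears in the fine-tuning data so that gaps can be identified before training.

```python
# A sketch of a quick representation audit: count mentions of each brand (or any
# other category of interest) in the fine-tuning corpus to spot gaps before
# training. The brand list and file name are illustrative.
import re
from collections import Counter

brands = ["BrandA", "BrandB", "BrandC"]   # hypothetical product names
counts = Counter({b: 0 for b in brands})

with open("internal_corpus.txt", encoding="utf-8") as f:
    for line in f:
        for b in brands:
            counts[b] += len(re.findall(re.escape(b), line, flags=re.IGNORECASE))

total = sum(counts.values()) or 1
for brand, n in counts.most_common():
    print(f"{brand}: {n} mentions ({n / total:.1%} of brand mentions)")
```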
The more pressing concern for enterprise users of language models is this latter type of bias, in which models may not be exposed to all of the data needed to fully understand a business problem. This consideration is paramount when attempting to make models more domain-specific. “That’s coming from not doing enough homework upfront when we expose the model to data we have control over, or representative artifacts that exist in the pretrained data that the large language model has seen,” Osborne explained.