The human brain processes an inordinate amount of information every day. Scientists have tried to quantify it, with some estimates putting it at 100,000 words and others at 34GB of data. Whatever the study, all agree it is a massive amount. And as an evolved species, humans have found ways to quickly process much of that information without even registering that they are doing it.
This is part of the reason unconscious bias – the deeply rooted prejudices each of us holds – exists. To quickly process and categorize information, our brains make snap judgments that can be skewed or, worse, reinforce stereotypes. Biases around gender, age, physical appearance and even names are just a few that many people hold, and they can play out in many ways, including words, language and actions.
And in a world where society increasingly relies on generative AI tools for information, it is imperative that those building these tools remain cognizant of the biases that can be baked into large language models and work to ensure those models do not perpetuate stereotypes or reinforce social inequalities.
You might be scratching your head, wondering how you can do that when each person’s unconscious bias is unique to them and, by definition, not something they are actively aware of.
Recognizing that it exists is the first step. But then, you should immediately look at your data and data curation processes. A recent global study commissioned by Progress found that 66% of organizations experience data bias today and 78% are concerned data bias will become a bigger issue as AI/ML use increases.
Careful curation of training data is essential to ensure that it represents the diverse user base and use cases generative AI will encounter. Collecting data from a wide range of sources across demographics, geography and other relevant factors allows you to incorporate that diversity into the training data.
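To make that concrete, here is a minimal sketch of one such curation check: it compares the demographic makeup of a training set against target proportions and flags under-represented groups. The column name, target shares and the 20% tolerance are illustrative assumptions, not a standard recipe.

```python
# Minimal sketch: flag demographic groups that are under-represented
# in a training set relative to target proportions. The "region"
# column, targets and tolerance are illustrative assumptions.
import pandas as pd

# Hypothetical training data with one demographic attribute
train = pd.DataFrame({
    "text": ["sample a", "sample b", "sample c", "sample d", "sample e"],
    "region": ["NA", "NA", "NA", "EU", "APAC"],
})

# Assumed target share of each group in the curated dataset
targets = {"NA": 0.4, "EU": 0.3, "APAC": 0.3}

actual = train["region"].value_counts(normalize=True)
for group, target in targets.items():
    share = actual.get(group, 0.0)
    if share < 0.8 * target:  # more than 20% below target
        print(f"{group}: {share:.0%} of data vs {target:.0%} target – under-represented")
```

In practice you would run checks like this across every attribute relevant to your use case and rebalance or source additional data wherever groups fall short.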
Employing a diverse development team will also help you mitigate unconscious bias in your toolset. People of different genders, from different cultures, who speak different languages and who have different identities and backgrounds will bring a broader worldview.
Automated tools can help detect and mitigate bias in the training data as well. Some companies use machine learning algorithms to identify and remove biased language from text datasets, while others employ statistical techniques to recognize and address disparities in demographic representation. There are open-source alternatives as well. One example is Fairlearn, an open-source, community-driven project. Not only can you install the Fairlearn package to assess and mitigate “fairness” issues, but the project also provides a number of resources to help you think about “fairness as sociotechnical.”
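As an illustration of what an assessment with Fairlearn can look like (a sketch of one workflow, not the only one), the snippet below trains a simple classifier on synthetic data and uses Fairlearn's MetricFrame to break accuracy and selection rate down by a hypothetical sensitive attribute:

```python
# Sketch of a Fairlearn fairness assessment: compare a model's
# accuracy and selection rate across groups of a sensitive attribute.
# The synthetic data and "gender" attribute are illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, selection_rate, demographic_parity_difference

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                          # hypothetical features
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)   # hypothetical labels
gender = rng.choice(["A", "B"], size=200)               # sensitive attribute

model = LogisticRegression().fit(X, y)
y_pred = model.predict(X)

# Break metrics down by group to surface disparities
frame = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y,
    y_pred=y_pred,
    sensitive_features=gender,
)
print(frame.by_group)
print("Demographic parity difference:",
      demographic_parity_difference(y, y_pred, sensitive_features=gender))
```

If the per-group metrics diverge, Fairlearn's mitigation techniques, such as its reductions-based training algorithms, can be used to retrain the model under an explicit fairness constraint.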
And while you should care about mitigating unconscious bias in AI because it is the right thing to do, very soon, it may also be law.
AI has been around for a long time – the “Logic Theorist” was presented at the Dartmouth Summer Research Project on Artificial Intelligence back in 1956. But until December 2022 – when OpenAI released a free preview of ChatGPT that received more than a million sign-ups in just five days – AI was largely something that folks in technology were interested in. With the power of AI in the hands of everyday people, the space has exploded. According to a report by Statista, the generative AI market is projected to reach more than $66B in 2024. With that growth, more regulation is expected from governments around the world.
In fact, in 2023, President Biden signed an executive order on AI that calls for more transparency and new standards. Additionally, the European Union agreed to the AI Act, which, among many other things, will require transparency in models and will hold organizations accountable for any harm that results from the use of their tools.
The next few years will prove interesting, with governments around the world becoming involved and ultimately shaping much of the direction of the industry. By working to mitigate unconscious bias in your datasets and tools through team structure, data curation, testing and more, you will no doubt find yourself on the right side of the ethical conversation, and also be able to navigate impending regulation more easily.