Although it’s been known in technical circles for a while, mainstream media has recently been picking up on the fact that AI systems often carry the biases of their training sets over into their results. Of course, it is never worded like this. Much more popular is the sensationalised “Is your AI SEXIST???” style of headline. The reasons for the bias seem to be poorly understood, even in technical discussions; I spoke to someone at Black Hat who was of the opinion that the issue has nothing to do with the data, and instead stems from white male developers subconsciously programming their computers to be sexist. Really.
AI has huge implications for the world of hiring. Say you have 5 positions open and you get 200 applications. Half of those can easily be discounted because the applicants obviously aren’t qualified, but that still leaves you with 100 to sift through manually. So, you get one of those AIs that you’ve heard so much about. You train it on the CVs of your current employees (who do a pretty great job already), and get it to pick out CVs that look like theirs. Suddenly, your AI is picking people based on demographics rather than skills or competency, oh no! Why did this happen?
It’s more to do with how neural networks work in general. They learn from the data, in the purest sense. Say you wanted to hire more executives for a company, so you give the algorithm the CVs of all 10 existing executives. 9 of these executives are male, and 1 is female. There is no reason to want more men than women, but that is the data set you have available: 90% are men. So, the system learns that ideal candidate = male, because 90% of the training data says male.
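To make that concrete, here is a minimal sketch of the 10-executive example. It is not a neural network: it is a deliberately crude frequency counter standing in for one, and the `mba` field, the function names and the scoring rule are all made up for illustration. The point is only that a learner which scores candidates by resemblance to the training data will pick up the 9-to-1 split.

```python
from collections import Counter

# Hypothetical training set: the 10 existing executives from the example.
# The "mba" field is invented so that both candidates below share a skill.
training_cvs = [{"gender": "male", "mba": True}] * 9 + [{"gender": "female", "mba": True}]


def learn_profile(cvs):
    """Record the fraction of training CVs containing each (field, value) pair."""
    counts = Counter()
    for cv in cvs:
        for field, value in cv.items():
            counts[(field, value)] += 1
    return {pair: n / len(cvs) for pair, n in counts.items()}


def score(profile, cv):
    """Average how common each of the candidate's field values was in training."""
    return sum(profile.get(pair, 0.0) for pair in cv.items()) / len(cv)


profile = learn_profile(training_cvs)

# Two candidates with identical qualifications, differing only in gender.
male_score = score(profile, {"gender": "male", "mba": True})
female_score = score(profile, {"gender": "female", "mba": True})

print(f"male candidate:   {male_score:.2f}")
print(f"female candidate: {female_score:.2f}")
```

Nothing in the code says to prefer men. The gap between the two scores comes entirely from the counts in the training set.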
So how do we stop that? Remove any mentions of gender, of course! Except it doesn’t work quite like that. We can remove direct mentions of demographics, but people can still be grouped by other features of a CV, such as the language used. Women tend to use different words in CVs than men do, so 90% of the training CVs are written in one style and 10% in the other. Even though the AI has no concept of these groups, it has learned from the training data that the ideal candidate uses a particular style of language. The system is then biased towards selecting people from this group, choosing men’s CVs over women’s.
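The proxy effect can be sketched the same way. In the toy below the gender field is gone entirely and each CV is just a list of words. The two word lists are invented purely for illustration (they are not real linguistic data), and the word-frequency scorer is again a crude stand-in for a neural network, not how a real screening system would be built.

```python
from collections import Counter

# Invented for illustration only: two writing styles describing the same skills.
STYLE_A = ["led", "drove", "executed"]
STYLE_B = ["collaborated", "supported", "contributed"]
SKILLS = ["budgeting", "strategy"]

# No gender field anywhere. 9 training CVs use style A, 1 uses style B.
training_cvs = [STYLE_A + SKILLS] * 9 + [STYLE_B + SKILLS]


def learn_word_weights(cvs):
    """Weight each word by the fraction of training CVs it appears in."""
    doc_counts = Counter(word for cv in cvs for word in set(cv))
    return {word: n / len(cvs) for word, n in doc_counts.items()}


def score(weights, cv):
    """Average the learned weights of the words in a candidate's CV."""
    return sum(weights.get(word, 0.0) for word in cv) / len(cv)


weights = learn_word_weights(training_cvs)

# Same skills, different wording.
score_a = score(weights, STYLE_A + SKILLS)
score_b = score(weights, STYLE_B + SKILLS)

print(f"style A candidate: {score_a:.2f}")
print(f"style B candidate: {score_b:.2f}")
```

The scorer never sees a demographic label, yet it still ranks the style A candidate higher, because style A is what most of the training data looks like.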
How do we address this? There’s no clear-cut answer. Because of how we form our training data sets, there are bound to be some biases. The bias doesn’t lie in the AI per se; it is carried in the training data. But all neural networks will react the same way, generalising from the training data, because that’s what they do. This isn’t something you can write off as “one AI had an error”: it’s an inherent issue when using neural networks this way.
You’re creating a tool designed to form generalisations of data, and then complaining when it applies generalisations to new data.