Most artificial intelligence models are built and trained by humans, and therefore have the potential to learn, perpetuate and massively scale the biases of their human trainers. This is the word of warning put forth in two illuminating articles published earlier this year by Jack Clark at Bloomberg and Kate Crawford at The New York Times.
Tl;dr: The AI field lacks diversity — even more spectacularly than most of our software industry. When an AI practitioner builds a data set on which to train his or her algorithm, it is likely that the data set will represent only one worldview: the practitioner's. The resulting AI model demonstrates a non-diverse "intelligence" at best, and a biased or even offensive one at worst.
The articles focus on two related areas in which diversity and demographics matter when it comes to building AI: the data scientist, and the data scientist's choices for training data. Again, the theory is that, even if subconsciously, the practitioner's selection of training data — say, images of people's eyes or tweets in English — reflects the types of objects, experiences, etc. with which the practitioner is most familiar (perhaps images of a particular demographic's eyes, or tweets written in British English).
There’s a third area in which demographics and diversity matter, though. It’s just as important, and it’s often overlooked — it’s the annotators.
Many people = many (varying) viewpoints
Data used for training AI and machine learning models must be labeled — or annotated — before it can be fed into the algorithm. For instance, computer vision models need annotations describing the categories to which images belong, the objects within them, the context in which the objects appear and so on.
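For concreteness, here is a minimal sketch of what a single image annotation record might look like before it reaches a training pipeline. The field names and values are hypothetical, not taken from any particular labeling tool:

```python
# A hypothetical image-annotation record: the labels a human annotator
# supplies so the image can be used to train a computer vision model.
image_annotation = {
    "image_id": "img_00042",
    "category": "street scene",          # the category the image belongs to
    "objects": [                         # objects the annotator identified
        {"label": "bicycle", "bbox": [34, 120, 210, 340]},
        {"label": "person",  "bbox": [180, 60, 260, 400]},
    ],
    "context": "urban, daytime",         # the setting in which the objects appear
    "annotator_id": "annotator_17",      # who supplied these labels
}
```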
Natural language models need annotations that teach the models the sentiment of a tweet, for example, or that a string of words is a question about the status of an online purchase. Before a computer can know or “see” these things itself, it must be shown many confident positive and negative examples (aka ground truth or gold standard data). And you can only get that certainty from the right human annotators.
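As an illustration only (not a method described in the articles), one common way to turn several annotators' judgments into a single "gold standard" label is simple majority voting, and the same tally immediately shows where annotators disagree. The labels below are made up:

```python
from collections import Counter

# Hypothetical sentiment labels for one tweet, from five different annotators.
labels = ["positive", "positive", "negative", "positive", "neutral"]

def majority_label(labels):
    """Return the most common label and the share of annotators who chose it."""
    (label, count), = Counter(labels).most_common(1)
    return label, count / len(labels)

gold, agreement = majority_label(labels)
print(gold, agreement)  # "positive", 0.6 -- low agreement hints the item is ambiguous
```

A low agreement rate is often the first signal that an item is subjective, or that the annotator pool reads it differently than the data scientist expected.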
So what happens when you don’t consider carefully who is annotating the data? What happens when you don’t account for the differing preferences, tendencies and biases among varying humans? We ran a fun experiment to find out.
We need to remain keenly aware of what makes us all, well… human.