The Humans in the Machine
Season 7: Episode 2
They’re the essential workers of AI — yet mostly invisible and exploited. Does it have to be this way? Bridget Todd talks to data workers and entrepreneurs pushing for change.
Millions of people work on data used to train AI behind the scenes. Often, they are underpaid and even traumatized by what they see. In this episode: a company charting a different path; a litigator holding big tech accountable; and data workers organizing for better conditions.
Thank you to Foxglove and Superrr for sharing recordings from the Content Moderators Summit in Nairobi, Kenya, in May 2023.
Richard Mathenge helped establish a union for content moderators after surviving a traumatic experience as a contractor in Kenya training OpenAI’s ChatGPT.
Mercy Mutemi is a litigator for digital rights in Kenya who has issued challenges to some of the biggest global tech companies on behalf of hundreds of data workers.
IRL: Online Life is Real Life is an original podcast from Mozilla, the non-profit behind Firefox. In Season 7, host Bridget Todd talks to AI builders who put people ahead of profit.
Krista Pawloski: A lot of people have the idea that AI is this fancy software that just does all this stuff. They don’t realize that it’s thousands and thousands and thousands of people inputting data into the system. Yeah. There’s the little man behind the curtain, you know, it’s not actually a wizard.
Bridget Todd: This is IRL, an original podcast from Mozilla, the non-profit behind Firefox. This season we meet people who are building more trustworthy artificial intelligence by putting people over profit. I’m your host Bridget Todd. And I want to show you a side of AI that’s usually hidden. It involves millions of people handling data that goes into systems like ChatGPT.
Bridget Todd: That person you heard a moment ago was Krista Pawloski. She’s a data worker. Many of us have done a fair bit of online shopping. If you’ve ever talked to a chatbot on a shopping website, it’s possible Krista helped train it to respond. She also labels satellite photos, so the computer vision systems that steer planes and cars can learn how to “see.”
Bridget Todd: The fact that Krista herself is invisible – not just to you and me – but even to the people building AI, is a real problem. You see, AI is credited with adding trillions of dollars to the global economy while millions of people behind the scenes are underpaid and exploited. Does it have to be this way? In this episode, I’ll dig into data workers’ demands that so-called AI be more ethical and fair.
Bridget Todd: First, let’s head to Kenya. It’s May 1st and we’re at a groundbreaking meeting at a hotel in Nairobi. In just a moment, 150 content moderators will vote to create a union. It’s the first union like this in the world.
Nairobi Content Moderator: Do you, here and now, agree to register the content moderators union? If so, raise your hand. [Applause]
Bridget Todd: Content moderators screen and remove gruesome content on social media so we don’t have to see it. Their decisions are then used to train AI systems. Some people at this meeting have also helped finetune OpenAI’s ChatGPT so it doesn’t generate sexually graphic and violent texts.
Bridget Todd: Here’s what former moderator Nathan Nkunzimana had to say that day.

Nathan Nkunzimana: We are soldiers who work in dark space, whose employer do not want them to be known or seen, withholding nothing to offer our lives. To protect what has been called community, and the community has forgotten about us, and I want to say we are part of that community.
Bridget Todd: We’re going to hear from someone in Nairobi who did sensitive data work for ChatGPT and now is an advocate for the union.
Richard Mathenge: Thank you very much. My name is Richard Mwaura Mathenge. My professional background ranges from administrative work as well as customer service.
Bridget Todd: Richard was contracted late last year by the outsourcing company Sama to work as a team lead on a project he later realized was for ChatGPT. He was optimistic about the job at first. But after a week of training, reality hit.
Richard Mathenge: The content became explicit. It became grotesque. It became obscene, uh, which warranted or provoked myself as the team lead to engage, to approach, the wellness department and my supervisors to inform them that we need psychiatric assistance as soon as possible without any delay.
Bridget Todd: Day in and day out, the team was reading and labeling texts from the bowels of the datasets used to train OpenAI’s large language models. Sexually horrific content that Richard says was traumatizing to read because of how graphic and violent it was, and because they did nothing else for hours on end. But as countless moderators – working for Sama and other outsourcing companies – have reported for years, there was barely any psychological support.
Richard Mathenge: They were very hardworking and very devoted. I saw there was a lack of commitment in trying to ensure that counseling sessions were being rendered to, to the content moderators. I felt there was a need for me to rise to the occasion and speak out against such activities.
Bridget Todd: Richard says Sama didn’t do enough, and that after the project ended, his team members were jobless and too traumatized to find work. He says lives were shattered, marriages dissolved. Now moderators are calling for change.
Richard Mathenge: The companies that keep employing these individuals, that there needs to be a serious change of mechanisms. So these organizations, such as OpenAI, need to disqualify any form of middlemen so that they can have a direct engagement, so that OpenAI can come to an understanding that this is the challenges that the content moderators are going through.
Bridget Todd: Richard says that the best way to connect workers to AI builders is by not using the outsourcing companies. That means workers could talk to AI builders directly about their safety. Human-to-human. It could also mean better pay and working conditions. Sama charged OpenAI six to nine times more than what it paid junior workers, who earned less than $2 an hour.
Richard Mathenge: We were actually the firefighters in trying to enable future users to address any issue or to interact safely with the chatbot. It’s my obligation to get to sensitize existing content moderators, not just in Kenya, but also all around the world of the need of speaking together as one entity.
Bridget Todd: Data work is happening all over the world. But most often it involves companies in the Global North outsourcing labor to nations in the Global South — to millions of people in countries like the Philippines, India, and Kenya, where wages are low and tech graduates are plentiful.
In Kenya, moderators took Big Tech companies to court. They want justice for the many people suffering from post traumatic stress disorder. And they’re urging the Kenyan government to regulate this tech work.
Bridget Todd: So how do you go about challenging companies like Meta and TikTok from Kenya?
Mercy Mutemi: You sue them. That is how you go about it. You sue them.
Bridget Todd: This is Mercy Mutemi. She’s the litigator behind a number of prominent digital rights cases in Kenya. One was challenging Facebook on unfair treatment of content moderators. All in all, Mercy has represented close to 200 data workers in separate cases.
Mercy Mutemi: The picture they’ve tried to portray is that they’re too big to be sued outside of the U.S. or outside of the E.U. And what we’re seeing is not a foreign company that purely operates from the U.S. They have operations here, they did things here that contributed to the violation of human rights. So you don’t get to cause harm to a society here and then say, you can’t sue us. There’s no harm that doesn’t have a remedy. So you just sue them. They can definitely be sued anywhere in the world where they cause harm.
Bridget Todd: Mercy has also called for a parliamentary inquiry into working conditions of moderators, like Richard.
Mercy Mutemi: In this digital world we’re in, someone’s going to have to do the dirty work of content moderation, but there has to be safeguards. There has to be dignity in how you do the work. So how come no safeguards had been put in place to protect the workers? Everyone was lied to. They were told you’re just coming to do an administrative job, at best a customer care job. That’s not a customer care job. That’s a job that ruins your life. And there was a duty to disclose but then also to protect them while they were in the process of doing the job.
As human beings, as Africans, as young Africans, there is so much more that they deserve that they never got out of it. Like to have 20 year olds’ lives ruined because they did work for Meta and then to read Meta’s financial statements and see they’re making more in profit than most countries’ GDP, you start to realize that who is important here is not who Meta is in global politics, but who your client is.
Bridget Todd: What do you wish that more people knew about this hidden labor that powers AI systems?
Mercy Mutemi: That there is no magic to it. When you see a tech product being so effective and being so popular and being celebrated for all the magical things it could do, there’s no magic to it. It’s people. And the more dazzling the product is, the more labor that went into it, and the more likely there was harm done to the labor force that went into doing the work.
Let me give you an example. Like we keep hearing about Meta’s plan for the Metaverse and the next frontier and the virtual reality world. And behind it is a workforce, again, in Kenya, doing all the data labeling and annotation. And I kid you not, some of their work involves just training the algorithm what a penis is, right?
And day in day out, there’s someone who is just served with different shapes, sizes, colors of penises to just mark them. Every single day! And that’s how the algorithm will be able to sieve out. Like if you, if you go to the, you know, the platform and you post a picture of a penis, then that’s how the algorithm will know how to take it down, because somebody has trained it. So that’s just the simplest example I can think of to tell you that it’s not magical. It just doesn’t wake up and learn these things. So imagine what’s happened to that person after three years of doing this work constantly for the algorithm to get that good at picking up even impressions, not just actual pictures.
Bridget Todd: Mercy says she’s been criticized for calling for justice from foreign tech companies. Some fear that it’ll hurt Kenya’s international reputation in tech.
Mercy Mutemi: There’s been quite a bit of talk about, “This is driving away development.” But then I am very quick to remind people that before we were known for the emphasis on the Silicon Valley, we were a country that was known for advocating for human rights, and we have a very good constitution. We are not admiring and, you know, so love struck with Big Tech that we’re willing to turn a blind eye to what harms they could be doing in the country. No, both things are not mutually exclusive. You can come and invest in our country and you can also respect our laws and respect our people. Those things can happen at the same time.
Bridget Todd: Do you think it’s possible that there could be a tech ecosystem in Kenya that truly puts people ahead of profit and what would it look like?
Mercy Mutemi: It would look like people first. And for that, regulation comes first. It’s going to be a constant that tech jobs are labor intensive, they’re going to need people and they’re going to employ large numbers of people in very questionable circumstances. So you need regulation first for what tech labor ought to look like. So I think that’s a certain point and there’s nothing else that can really be done at this point, and let me tell you why: companies are always going to care about their profit. They’re always going to care about making more money at less cost, so we can’t rely on self regulation to get them to care about the people before the profit. It has to be the regulators’ role to do that.
Bridget Todd: It makes sense that most of us wouldn’t know about data workers when so much AI is literally designed to make us think that machines are “intelligent”. And yes, I’m putting scare quotes around “intelligent.”
This plays into a corporate narrative of software with superhuman abilities. A narrative that’s perpetuated by tech companies and startups. It can be exasperating for data workers to be pushed to the side by the very companies that profit from their work.
Krista Pawloski: I am Krista Pawloski. I am a worker on Amazon Mechanical Turk, which we normally reference as MTurk. MTurk is a platform where small jobs can be posted by a person, a university, or a company. We call those people requesters. The work is typically surveys or AI training or data labeling, but it can be any task that can be done on a computer.
I originally started on MTurk I believe it was back in 2008. I was on maternity leave and it was just a way to make a little bit of extra money.
Bridget Todd: Fast forward a couple of years. Krista lost her job because she had to spend more time caring for her son. “Turking” became her full time income. Today, Krista is an organizer for a worker-led group called Turkopticon. The group petitions Amazon to make improvements to MTurk. It also has a review website where “Turkers” can rate and rank requesters on how fair they are.
Krista Pawloski: More and more I became dependent on my MTurk income. I realized that Amazon could close my account for any reason, and I would just be out my source of income and being an advocate for the workers suddenly became very, very personal to me. I needed to help make this a better and safer environment for all the workers out there.
Bridget Todd: It’s nerve-wracking that “requesters” have the one-sided power to give workers negative reviews or reject their work for no reason. Turkopticon wants Amazon to limit how much worker ratings can drop if a big batch of work is suddenly rejected.
Krista Pawloski: Turkopticon sent out a petition to get workers to sign, saying that they also wanted to stop the harm of mass rejections. After we had amassed a few thousand signatures we brought that to Amazon. When we had our last meeting with Amazon, we brought a Turker with us who had experienced a mass rejection, and experienced it to a point that her approval rating dropped to the point where she could no longer work on the platform. And we just wanted to have a real live person in front of them in the meeting to show them that these are real humans that you are affecting by ignoring this issue. These are real people that are no longer able to work, and you took away their source of income. And Amazon even admitted that the rejections were unfair, but still would not, and actually claimed that they could not, take the rejections off the workers’ records.
Bridget Todd: It’s possible for tech companies to work hand-in-hand with workers to improve platforms. But in data work, even the people requesting the labor rarely think of the humans who do the jobs.
Safiya Husain: Data is an extremely, extremely valuable asset. If I’m able to sell a technology for millions of dollars, I shouldn’t be able to get away with paying people literally pennies for the work that they’re doing.
Bridget Todd: Safiya Husain is the co-founder of an organization named Karya. It’s a data company that trains people in India to capture and label data, including voice data in several Indian languages.
[Indian languages speech data]
Bridget Todd: Safiya says Karya is a business that puts people first. This starts with describing gig work honestly to their more than 30,000 workers. They also offer training and special opportunities to people in rural areas who especially need the income.
Safiya Husain: What we do do at Karya is we guarantee a minimum wage. So we work in India and our guaranteed minimum is 400 rupees an hour, which comes out to around $5. That’s around 20 times the local minimum wage. And even across the global industry standards, it is, as far as I know, one of the highest wage rates to be paid for data workers. And even with those rates, we are, as a company, still able to cover all of our costs. We are not in any crunch and, you know, feeling the burn of paying these wages. And in fact, our active, I think, mission every year and every month is to figure out how we can bring our costs down even more so that we can give people even more money.
The market has failed to be able to provide data laborers with ethical working conditions. Even with our high wages and stuff, we are able to have around a 30 to 40% profit, right? That is something that even a for-profit company can do and still make money and still have happy investors. Right? So it really begs the question of why and what we’re asking people to do is kind of think of rebalancing the scales. And just figuring out how we can give more back so that it’s not such an unequal distribution of ownership and labor.
Bridget Todd: Safiya says fairness means revenue sharing too.
Safiya Husain: These are extremely, extremely valuable assets. So for me, it just only makes sense that a significant portion of this should specifically go to workers, right? I think what we tend to do when we think of that value chain is focus a lot on the people who are, like, cleaning the data sets or generating the models, packaging the models, selling it. And we’re forgetting the most fundamental bit, because if we didn’t have people willing to do this work, we wouldn’t have these technologies.
How do we create stronger, basically, legally binding ways of ensuring that no matter what, certain percentages of data sales and resales go back to workers? One data set can be resold up to 10 times. This is especially true for the speech data space, right? Now, think about that. The first time the company is paying the worker, fine, they have their losses, whatever. The next nine times, all of that profit is going back to the data company, and most people who produce the data don’t even know that that’s happening. They don’t even know that their data got sold multiple times over and over. They don’t know how much it got sold for, right?
So, it’s really, I think, important to find ways of kind of solidifying that structure and really making sure that there are ways to trace how people contribute to data sets, and make sure that they’re given adequate compensation for that.
Safiya Husain: We’ve been able to sell or resell around 4,000 hours of our data which has allowed us to give, like, recurring income to a couple hundred of our workers.
Bridget Todd: That’s incredible. So just to make sure I have this right – if a data set is sold and then resold, are workers then paid again? Like if I buy a data set from Karya of annotated Urdu or Bengali speech, are the workers paid for their contributions again?
Safiya Husain: Yes. When we do resell these data sets, we have an active list of everyone who contributed. And we already have their bank information, and we just kind of say, “Hey, by the way, your data set got resold, here’s a payment for that.” Most workers are shocked when it happens because they’re like, “Oh, I didn’t know that this was a thing,” or didn’t really like, I guess, internalize that it could happen. And that’s something that I hope people won’t be as shocked at anymore, and hopefully many more thousands of hours will be able to be resold this way.
Bridget Todd: You may not be able to see AI’s invisible workers, but today you’ve heard their voices. For AI to be genuinely ethical we need to consider more than who is harmed after AI is deployed – we need to remember the people building it too.
They’re our virtual firefighters, peacekeepers, and crafters of information systems. So let’s trust them when they say tech has got to change.
I’m Bridget Todd. Thanks for listening to IRL: Online Life is Real Life, an original podcast from Mozilla, the non-profit behind Firefox. For more about our guests, check out our show notes, or visit IRL podcast dot org. This season we’re talking about “People over Profit” in AI.
Mozilla. Reclaim the internet.