Crash Test Dummies
Season 7: Episode 3
Why does it so often feel like we’re part of a mass AI experiment? What is the responsible way to test new technologies? Bridget Todd explores what it means to live with unproven AI systems that impact millions of people as they roll out across public life.
In this episode: a visit to San Francisco, a major hub for automated vehicle testing; an exposé of a flawed welfare fraud prediction algorithm in a Dutch city; a look at how companies comply with regulations in practice; and how to inspire alternative values for tomorrow’s AI.
Julia Friedlander is the senior manager for automated driving policy at the San Francisco Municipal Transportation Agency and wants to see AVs regulated based on safety performance data.
Navrina Singh is founder and CEO of Credo AI, a platform that guides enterprises on how to ‘govern’ their AI responsibly in practice.
Suresh Venkatasubramanian is director of the Center for Technological Responsibility, Reimagination, and Redesign at Brown University and brings joy to computer science.
IRL is an original podcast from Mozilla, the non-profit behind Firefox. In Season 7, host Bridget Todd shares stories about prioritizing people over profit in the context of AI.
Bridget Todd: Do you ever get the feeling that we’re all just crash test dummies in one big experiment with artificial intelligence? Well, fasten your seatbelt, because a lot of tech developers still want to ‘move fast and break things.’
Suresh Venkatasubramanian: So the idea that it’s okay to move fast and break things because we’re in this experimental phase, where we’re just trying out stuff, and you just want to learn new things, is no longer true when the things you’re deploying are almost immediately affecting millions and millions of people.
Bridget Todd: That’s Suresh Venkatasubramanian. He’s the director of the Center for Tech Responsibility at Brown University. We’ll hear more from him in a bit. I’m Bridget Todd and this is IRL, an original podcast from Mozilla, the nonprofit behind Firefox. In this episode, we ask how AI can be safer, in real life.
Bridget Todd: We’re on the streets of San Francisco. This is a city where you can hail a taxi without a human in the driver’s seat. There are hundreds of automated vehicles, or AVs for short, with steering wheels that turn by themselves. I’ve heard AVs can be safer than human drivers, but I’ve also heard that it’s hard to get AVs to recognize pedestrians equally well in every situation. It’s a lot to literally trust a tech company with your life. Do they cut corners to get their products to market faster? In San Francisco, they’ve been grappling with this question.
Julia Friedlander: So it’s really an everyday experience for people who live or work in San Francisco to see these kinds of amazing pieces of technology on our roads.
Bridget Todd: That’s Julia Friedlander. She’s the senior manager of automated driving policy at the San Francisco Municipal Transportation Agency.
Julia Friedlander: I think that probably the thing that most people who are working in the industry would immediately identify as the top priority is to improve street safety. And that is certainly a goal for the city. We have, really, a nationwide crisis of injury and death on our streets.
Bridget Todd: In many ways, AVs really are safe. Julia likes how they stick to speed limits, unlike many humans I know. Cameras and sensors point in all directions to guide them in traffic. But they aren’t prepared for how to react in every situation. That can mean collisions. Injuries. Gridlocked streets. And blocked fire trucks.
Julia Friedlander: I think a lot of regulators expected that the vehicles would simply be completely compliant with the rules of the road. That’s not what we’ve seen.
Bridget Todd: At least 40 companies have licenses issued by the state of California to test AVs in San Francisco, but mostly with humans behind the wheel. After a lengthy public hearing in August 2023, the two companies with the most cars – Alphabet’s Waymo and GM’s Cruise – got the green light to operate as many taxis as they wanted — with no drivers at all. Like a lot of local residents, Julia was critical of how quickly these companies would expand their fleets without more evidence of how they perform.
Julia Friedlander: We have been honestly surprised to see the driverless AVs have impacts on our fire department operations. And that is not something that we really expected. San Francisco firefighters have submitted written reports on more than 75 events in 2023 when AVs interfered with their operation in some way. That’s like one every three days, and you know, those are just the written reports. And these events create hazards for our first responders and they create hazards for the people that those first responders are serving.
Bridget Todd: So, there were problems. Julia’s agency called for a rehearing of the Cruise and Waymo decision. But before that happened, Cruise had its permission to go driverless suspended, after just 3 months. They were accused of withholding footage of an incident where an injured pedestrian was pinned and dragged under the wheel of one of its AVs. For now, Waymo can keep going. So we talked to Julia before all this happened. But it brings into focus exactly what she and other local officials were saying: that decisions about AVs should be based on more detailed data about the good and the bad. This is information they don’t currently have to share. Companies have to disclose information about mileage traveled and collisions, but not on near misses or traffic disruptions. So Julia’s agency pieces together what information they can from other sources, and shares it with AV companies and regulators to work on solutions. But they don’t have any power to regulate AVs themselves. That happens at the state level.
Julia Friedlander: We would like to see regulators regulate based on performance. We’d like to see some minimum standards for performance, and we’d like to see incremental approvals of expansions based on demonstrated performance. That is the through line for all of our advocacy in AV regulation.
Bridget Todd: San Francisco is just one of several cities where AVs have arrived, where streets have literally become testing labs for tech companies. They say that their high-tech solutions are the best answer to complex social problems. But even if AVs work how they’re supposed to, are they the best transport strategy? What about bicycles and public transportation? Streets designed for pedestrians instead of cars? After all, plenty of cities reduce traffic deaths and injuries, without AVs.
Julia Friedlander: This is technology that is using our public, our most important public space, and we think it is important for the industry to be given that opportunity to grow as soon as it is demonstrated that it is capable of operating in a way that doesn’t create new problems, that is safe in terms of avoiding crashes, and also is not creating new and other safety problems.
Bridget Todd: This is just one of the countless examples of how corporate AI is expanding into the public realm in countries everywhere, without nearly enough transparency and oversight.
Justin-Casimir Braun: So there was this Eureka moment where I found all this data contained in this document that was just meant to show us some pictures.
Bridget Todd: Justin Braun is a data reporter for Lighthouse Reports, a nonprofit investigative newsroom in Europe. They saw that journalists in the US had uncovered racial bias in algorithms for predictive policing and prison sentencing across the country, and they wondered whether similar things were happening in Europe.
Justin-Casimir Braun: And so we started out with a bunch of different fields that we were looking at initially: predictive policing, sentencing, algorithms, but fairly quickly zeroed in on welfare. Welfare, I think, from an American perspective in Europe seems really generous, but I think, in fact, there has been a very harsh turn in the last two decades, and it’s become a lot more punitive. A lot of it is about sanctioning people that are trying to apply for welfare. And then the other thing is that this is an area where the deployment targets really, really vulnerable people who have very few resources to fight any potential rights infringements. And so we thought that this reporting and kind of learning from our US colleagues could really do a lot of good in this space.
Bridget Todd: Justin and his colleagues set out to investigate how an AI system predicts who is likely to commit welfare fraud, and whether these systems were biased. They requested data from several countries through freedom of information laws.
Justin-Casimir Braun: We wanted to achieve what we call the Holy Trinity of algorithmic accountability reporting. And that means we wanted full access to the training data, the source code, and the trained model file. Rotterdam was the first place where we got it.
Bridget Todd: Officials in this Dutch city sent Lighthouse only part of the data they had asked for. Or so they thought. Deep in the source code, Justin discovered a secret: records of 12,707 real people that the city had used as training data for the system.
Justin-Casimir Braun: Very few people have gotten access to this level of data before. And, when I saw this, I was, I was just thrilled. Like, I spent 10 minutes, I think, dancing in the kitchen and calling everybody else on the team. It was a good night.
Bridget Todd: So you knew it was gonna be big when you got this information.
Justin-Casimir Braun: Yeah, absolutely. Yeah.
Bridget Todd: Thanks to the secret data, Justin could see how the system calculated who from tens of thousands of people should be investigated for fraud. And he could see that people would be flagged in ways that were highly discriminatory. The predictions were based on scores that its developers assigned to 315 attributes, or “variables” as they’re called, that caseworkers used to describe people. There were variables on age, gender, marital status, and language, as well as several describing interactions with the welfare agency. Like, how many meetings did they attend? But some variables seemed odd for predicting fraud, like how a person dressed, or even what their hobbies were.
Justin-Casimir Braun: They had used a bunch of variables that seemed to be essentially subjective caseworker assessments of the fundamental character of the welfare recipients. So that included stuff like, do they have ethics and integrity, right? Like some of them were really sexist. There was a variable in there that judged physical appearance, and we got access to some of the code books, kind of the instructions for caseworkers on how to fill in these variables, and it said that, you know, if women weren’t wearing makeup, they should, like, get a note on that. So it’s just kind of gross – to begin with, that this data exists, that the government thinks it should be collected, and then that it thinks it’s fine to use it to predict fraud. It’s just very odd and deeply troubling.
Bridget Todd: Scores were calculated from combinations of variables. So if you were a woman, you’d get one score. And if you had more than one kid, didn’t show up to meetings, or spoke a certain language, it could affect your score differently from person to person.
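The scoring mechanism described above can be sketched roughly like this. Every variable name, weight, and threshold below is invented for illustration; the real Rotterdam model used 315 variables and was far more complex.

```python
# Hypothetical sketch of an additive risk-scoring system like the one
# described: each variable a caseworker records contributes a weight to a
# person's fraud-risk score, and high scorers are flagged for investigation.
# All names and numbers here are made up for illustration.

WEIGHTS = {
    "is_woman": 0.12,               # demographic variables like these are
    "has_multiple_children": 0.08,  # what made the system discriminatory
    "missed_meetings": 0.25,        # applied per missed meeting
    "language_barrier": 0.30,
}

def risk_score(person: dict) -> float:
    """Sum the weighted variables present for this person."""
    return sum(WEIGHTS[var] * value
               for var, value in person.items()
               if var in WEIGHTS)

def flag_for_investigation(people: list, threshold: float = 0.5) -> list:
    """Return the IDs of everyone whose score crosses the threshold."""
    return [p["id"] for p in people if risk_score(p) > threshold]

people = [
    {"id": "A", "is_woman": 1, "missed_meetings": 2},  # score 0.62 -> flagged
    {"id": "B", "missed_meetings": 1},                 # score 0.25 -> not
]
```

Even in this toy version, the design choice is visible: the weights encode who gets investigated, so biased weights or biased training data translate directly into biased flagging.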
Bridget Todd: And after somebody is flagged by this AI, like what happens? What’s the impact?
Justin-Casimir Braun: So I think this is the big motivation for doing this in the first place, right? Investigators who follow up on these fraud investigations in the Netherlands have extremely expansive rights. There’s very little due process. They can search your house at times without announcement. They can call you to be interviewed at very short notice. They are at times just very rude and, you know, all of this is happening to people in extremely vulnerable situations. These people are getting benefits that are meant to guarantee that they don’t fall below the poverty line. As soon as a benefit is cut, you are below the poverty line. That means you don’t have enough money to eat. It means you don’t have enough money to pay rent, to clothe your children. And so just the specter of an investigation can be very taxing on people’s mental health, and we found that again and again in interviews that my colleagues did.
Bridget Todd: And again, it goes back to what you were saying about this system sort of being set up to be punitive. Like it doesn’t have to be a system that is, ‘On top of it, the person who comes to your home on short notice is gonna be rude and rifle through your things’, but it seems like that punitive nature is almost baked into the experience once you’re flagged.
Justin-Casimir Braun: I mean, I think there’s a question whether we should have systems this punitive to begin with against populations this vulnerable. But I think there is, and this is a really important one, a tendency for AI to make these punitive systems appear more rational than they really are. It gives added justification, and it can embed them deeper bureaucratically. And so I think there are ways in which the use of tools like this can help maintain and strengthen these systems, and make them appear more authoritative, in ways that I find quite concerning.
Bridget Todd: The Lighthouse team uncovered biases, like how women with migration backgrounds were flagged at higher rates than others. But above all, they say, systems like these simply do not work.
Justin-Casimir Braun: I think it’s trained on data not fit for purpose. It’s using variables which have obvious discriminatory potential. It doesn’t work very well. It barely works at all. And it shows clear patterns of discrimination against people within a vulnerable group, who are even more vulnerable than the other people in this group.
Bridget Todd: Rotterdam hasn’t used its system since 2021, and recently decided not to replace it either.
Justin-Casimir Braun: I think what we found is that the system is inadequate. It’s inadequate because it discriminates. It’s inadequate because it is embedded in organizations that don’t know well how to deal with these systems. And it’s inadequate because it reinforces discriminatory patterns that have existed before, but it kind of strengthens those patterns and makes it harder to tackle them.
Bridget Todd: And yet, even after journalistic investigations like this are repeated over and over, these systems are still unleashed on people without adequate precautions. Algorithms aren’t to blame for the human bias of the welfare system, but they can really amplify it.
Bridget Todd: Behind the scenes, AI systems are becoming more deeply interwoven with government and corporate decision making. Private tech companies are playing a bigger role in interacting with people “as data subjects” on behalf of countless institutions. Let’s meet someone who helps companies figure out how to earn trust.
Navrina Singh: Is it possible to put people before profits? Let me actually reframe the question. The enterprises that put people before profits are the enterprises that are going to win, and they’re going to win because they build that trust.
Bridget Todd: That’s Navrina Singh in California. She’s the founder and CEO of Credo AI and a former Mozilla Foundation board member.
Navrina Singh: Credo AI helps operationalize policy and regulations, as well as making sure that there’s testing of both your datasets and AI applications, especially in context of use.
Bridget Todd: When Navrina says ‘operationalize,’ it’s about helping companies create mechanisms to monitor what they’re doing with AI continuously. Credo AI’s platform supports companies in building risk dashboards and impact assessment reports. They help companies pinpoint their values and set benchmarks for them. And if they need to be compliant with regulation, it guides them through the steps they need to take. Navrina says a lot of companies have never thought about how to measure whether their systems live up to their own stated values. She says that’s because AI presents more “socio-technical” challenges than tech companies have ever had to consider before.
Navrina Singh: It is not just about, you know, a programmatic way to solve for those. It is really about, how do you bring in all these diverse voices, during the design development, as well as during production, to make sure that the impacted groups have a pathway in providing that feedback to improve this technology.
Bridget Todd: Navrina refers to all this as “AI governance”.
Navrina Singh: So as you can imagine, absolutely it needs a massive mindset shift in terms of the role of governance. And especially the narrative that governance can stifle innovation.
And then just to give you an example: in the financial services sector, we’ve been working with a Fortune 50, AI-first, ethics-forward company. What has been fascinating is that not only did they want to bring governance to manage oversight of the thousands of AI systems they have, from risk scoring to anti-money laundering to underwriting, but in that process they also wanted to: one, hold on to and continue to bolster their mission of financial inclusion, where, as you can imagine, the fairness of the systems they’re building for the end consumer is really important; and two, build trust with their customers, who are the banking organizations.
Bridget Todd: In three and a half years, Credo AI’s business has grown. And Navrina wants to see policies that can push companies to do more. She’s on the National AI Advisory Committee in the US, and is optimistic about how regulation in the US, EU, and elsewhere, can help bring more fairness and equity to AI.
Navrina Singh: We are hoping that the EU AI Act gets passed at the end of this year, you know yet to be seen. But I think that is going to put in place a nice structure for organizations to start thinking about their responsibility and accountability, especially from a risk perspective.
I would say that we are just scratching the surface of how we’re going to see regulations emerge, and the impact that they’re going to have.
Suresh Venkatasubramanian: When we talk about self-driving cars, when you talk about risk assessments to decide whether people go to jail or not, or use hiring algorithms…These systems are being deployed; the data on them is being collected, and that data is being used to inform the next generation of tools.
Bridget Todd: That’s Suresh Venkatasubramanian. He’s a computer scientist and the director of the Center for Tech Responsibility at Brown University in the US.
Suresh Venkatasubramanian: So by any definition of the word testing, we are being tested and we are being evaluated and we are being experimented upon. That doesn’t mean it’s a bad thing, but it’s helpful to keep that in mind – that this is not finished technology.
Bridget Todd: Suresh also recently finished a stint in the White House as Assistant Director for Science and Justice. He says more voices need to be heard in the design of AI and AI policy.
Suresh Venkatasubramanian: If you ask who should be in the room, it’s very simple. Anyone who’s being impacted by this. And, you know, that includes the people who are usually always in the room, the folks developing the systems, but it also includes those who are affected by it. Time and time again it’s been made clear that if we had people in the room who are affected by decision systems, people who are subject to surveillance, their experiences and insights would inform the design of a better system, or would make sure the systems are tested so that they do work for everyone and not just a few people. We’ve learnt this lesson over and over again, and we still make that mistake. So that’s one component to it.
Another dimension of what we need is to have people who have knowledge about different perspectives, right? When we deploy technology in a public space that affects people, it’s all about how the technology interacts with the people and the systems around it. So you need people there who study how humans interact. We need the sociologists. We need the anthropologists. We need the philosophers.
Bridget Todd: While he was at the White House, Suresh helped co-author the “Blueprint for an AI Bill of Rights.” It’s designed to inspire future regulation.
Suresh Venkatasubramanian: The Blueprint for an AI Bill of Rights articulates protections that people should have in an algorithmically-driven or managed society. And it’s actually a fairly simple set of things. You should make sure systems are safe and effective before you deploy them and incorporate impacted people into the design process before you build a system.
Systems should not be discriminatory. There are many, many systems, decision systems, that draw on historical data and make decisions about individuals based on historical data. In a country with a long history of racism and other forms of biases that historical data is biased as well, and that’s what the systems produce.
And so we want systems that are tested and evaluated. Our data should not be used and monetized, except in ways that are specific to the task at hand. If you are denied, say, a loan, I think it’s appropriate to say, “Why? What was the reason for it? Could it be that the system made a mistake, that it collected incorrect data?” These are the kinds of protections you would expect to have to make sure that you can get technology that does work, that works effectively, and isn’t just being deployed on the fly.
Bridget Todd: At Brown University, Suresh’s goal is to inspire students to rethink and reclaim how technology is developed with people in mind.
Suresh Venkatasubramanian: We can design with, you know, with joy, with fun. Computer science is fun. It’s just fundamentally – at least for me it is – it’s a fun discipline. And we can come up with cool new problems to solve, but that also actually benefit us, that actually pay attention to what we would like to have and what we need, and design systems that help us, rather than trying to push us into little boxes that are legible by machines.
Bridget Todd: A lot of people say they fear what the AI behind systems like ChatGPT could do. But AI is already everywhere. Big AI companies are making hasty decisions about our data and our safety today, while the guardrails are still just being developed by regulators. Suresh envisions artificial intelligence that we, people, not companies, are more in control of.
Suresh Venkatasubramanian: If we could build, you know, smaller machine learning models that only use personal data, but still could be effective at learning things about me that could help me, I would be fine with that. So sometimes it’s just a matter of rethinking who is the real customer. Who’s really benefiting? And making sure that we can also imagine solutions. Some of the earliest technologies that we have – things like email, things like the entire web, and all the different protocols on the web – they were just built as infrastructure. What is a new kind of infrastructure we can build that’s more personalized, more focused, more under our control? I think these are things that we should be asking.
Bridget Todd: So it’s not just a feeling. We may still be the crash test dummies for a lot of AI systems. But our seatbelts will be the rules and regulations that guide how they are developed. Just like actual seatbelts in cars, it will eventually seem unthinkable that we ever went without them. So, instead of ‘move fast and break things,’ let’s move slowly and repair things. Or, think twice about what AI systems we need in the first place.
I’m Bridget Todd. Thanks for listening to IRL: Online Life is Real Life, an original podcast from Mozilla, the non-profit behind Firefox. For more about our guests, check out our show notes, or visit IRL podcast dot org. Mozilla. Reclaim the internet.