Can Computer Drivers Act Like Humans in an Emergency?
“Driving is a profoundly social experience,” says Phil Koopman. How will your AI-driven robotaxis interact with the physical world during power outages, floods, snowstorms, earthquakes?
(Image: iStock)
“Humans are terrible drivers.”
This was the headline for a notorious full-page ad campaign launched in 2023 by the now defunct robotaxi company Cruise.
Although Cruise ended operations by the end of 2024, the ethos of the ad — blaming humans for traffic accidents — lives on among autonomous vehicle companies and their advocates.
Their utopia is a world where computer drivers have made erratic human drivers obsolete. Roads ruled by robotaxis, safer than human-driven cars, will save countless lives.
Autonomous vehicle (AV) companies, however, are not claiming, much less demonstrating, that a robotaxi can think fast, behave flexibly and deal with unusual situations more appropriately than a human driver.
Typical “unusual situations” are power outages, floods, snowstorms, tornadoes, earthquakes—events that are, although inevitable, beyond the “experience” of autonomous vehicles.
As long as only a few dozen robotaxis are wheeling around downtown San Francisco, this level of ignorance might not matter much. But when a widespread power outage hit the downtown in December, it proved to be a clinic in the “unexpected,” with a profound impact on Waymo’s scaled operations. Hundreds of Waymo robotaxis stalled, snarling traffic citywide.
Experts call such extreme scenarios “edge cases.” They challenge the operating limits of a system, software, or design. Despite Waymo’s claim of 100 million miles of fully autonomous, real-world driving (as of last July), a citywide power outage wasn’t an event for which its cars were trained.
Simulations
When we discuss simulation during development, we often talk about digital twins.
Sameer Kher, senior director of R&D Engineering, Systems and Digital Twins at Synopsys, explained, “When we talk about digital twins, we really mean virtual representations or models of things that exist in the physical world. There is a connection between that physical model, or physical piece of equipment, and the virtual model, which behaves like a twin.”
Kher noted, “What’s important is to have a virtual model based on principles of physics that you can explain and that are deterministic.”
The digital twin market segment in which Kher’s group is engaged seeks to help customers develop manufacturing or stamping equipment, electric drives, machines and motors. “We help them build digital twins of their equipment so that they can get deeper insights.”
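As a rough illustration of what Kher means by a “deterministic” model “based on principles of physics,” consider a toy virtual model of a DC motor, the kind of equipment his group twins. Every parameter here is invented for illustration and does not come from any real product:

```python
# Toy sketch of a deterministic, physics-based virtual model of a DC motor.
# All parameters are illustrative, not taken from any real equipment.

def simulate_motor(voltage, steps=1000, dt=0.001):
    """Integrate a first-order DC motor model with forward Euler."""
    R, K, J, b = 1.0, 0.05, 0.01, 0.001  # resistance, torque const, inertia, friction
    omega = 0.0  # angular velocity (rad/s)
    for _ in range(steps):
        current = (voltage - K * omega) / R   # back-EMF limits the current
        torque = K * current - b * omega      # net torque on the rotor
        omega += dt * torque / J              # Euler integration step
    return omega

# Determinism: identical inputs always yield identical outputs, so the
# model's behavior can be explained, reproduced and audited.
assert simulate_motor(12.0) == simulate_motor(12.0)
```

Because the model is governed by explicit physical equations rather than learned statistics, an engineer can trace any output back to its causes — the property Kher singles out as essential.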
Kher acknowledged that his group is “not at the point where we are dealing with autonomous systems yet.”
Similarly, Nvidia pitches its library of tools and models to validate AV performance in simulation before commercial deployments. But its primary focus is on meeting “the need for vast amounts of training data that mirrors the real-world diversity they will face on the road,” the company explained.
To design autonomous systems that can operate in an adverse environment, companies need to go beyond what today’s digital twins can offer.
Mock emergency drills
Last month, when I interviewed Phil Koopman, safety expert and professor emeritus at Carnegie Mellon University, about Waymo’s robotaxi fiasco in San Francisco, he noted that the Dec. 20 blackout was a warning shot for when the next major earthquake strikes the city.
Koopman proposed “a full-dress rehearsal in San Francisco.”
At first blush, the idea of a mock drill struck me as an analog throwback in this era of AI and digital twins.
But a little further explanation reveals its value.
Koopman’s own experience includes mock drills conducted when he was a kid living across the street from a major airport.
He said, “They usually round up a bunch of kids from local high schools or junior high schools. We would all smear ourselves with ashes and red paint. The lucky kid got the intestines coming out of his guts … And we’re told, ‘Okay, we’re not going to actually break your leg, but your leg is broken if they touch it in this place, say it hurts, right?’ And we strew ourselves around the runway, and the local ambulance would show up, the fire department would show up, and we get rides to the emergency room …”
It turns out that the Federal Aviation Administration (FAA) requires airports to certify safety with a full-scale airport emergency plan exercise once every three years.
Koopman noted, “That’s the only way you can know if you’ve missed any assumption.”
What if, for example, the airport has the wrong phone numbers for emergency rooms and can’t summon ambulances? Koopman suggested. What if the list is missing a hospital? What if some ambulances aren’t working? What if the emergency rooms don’t have a surge plan in place to cope with twenty casualties showing up at the same time?
“There’s one way to find out,” said Koopman. “Do a trial run and see what breaks. You’d rather find that out when it’s a bunch of high school students, compared to a real plane crash.”
Similarly, when I talked to Rob Rhoads, a retired fireman and former associate director at the Department of Homeland Security, I understood the importance of mock drills for assessing a disaster’s impact on all the stakeholders involved — including fire departments, law enforcement, civilians, hospitals and technology suppliers.
The drill is a chance to assess how systems, software, protocols and procedures hold up. More importantly, it enables a review of assumptions about how people react in real-world disasters.
A firefighter with over 32 years of experience, Rhoads talked about the emergency drills in which he participated. Just before his retirement from DHS, he helped plan a big drill at Tysons Corner Mall with an active shooter scenario.
Rhoads said, “We had lots of injured people throughout the mall. The active shooter was still alive, and he was in this store. The EMS folks had to get the people out to a safe place while law enforcement was accompanying the fire and EMS folks to get them back to a safe place. And then, law enforcement was protecting that safe place.”
Obviously, this drill involved a lot of people. It turned out well, eliciting lots of positive comments, said Rhoads. But he noted a number of vital precautions.
First, “You have to make it realistic.” Second, “You have to make it so that the resources that you are bringing to a drill are able to accomplish the tasks.” Finally, Rhoads said, “Is the drill to just have a show-and-tell and say, wow, that was cool? Are you trying to teach skills? Are you trying to identify weaknesses, perhaps things that can be improved? What is the goal?”
Large-scale disruption events
Drills are effective when an unusual adverse event “really stresses the social fabric,” noted Koopman. “We don’t want any of these disasters to happen, but they’re going to happen.”
So, how does that translate into robotaxis in the big city?
Koopman said, “If you’re going to deploy 1000 robotaxis, or [if] … some significant fraction of capacity for trucking is going to be robo-trucks, you must start paying attention to large-scale disruption events.”
Mock drills underscore the lesson that “you must be able to respond to the real world tactically, but also with a wide operational area.”
If a robotaxi knows how to stop at a dead traffic light in a power outage, Koopman said, “That’s table stakes.”
Next question: What happens if all the traffic lights go out?
He noted, “If you say, ‘Well, 98 percent of our vehicles can get it right,’ well, times 1000 cars isn’t so great. That’s a lot of cars that are going to have trouble. If so, what’s your plan when there’s an earthquake?”
The catch is that when people talk about dependability and safety on the road, they recognize the randomness and variability of human reactions. But with computer drivers, “You lose the independence of every driver reacting differently. Computer drivers all react the same way.”
The result is “a common cause failure” when the world does something you’re not ready for. Koopman said, “We will see the same failure in all of your deployed vehicles.”
Emotional/physical impact
Asked what mock drills offer that exercises such as video games cannot, Rhoads said, “There is the emotional aspect. How do you control those emotions when you see the blood, gore and violence?”
He noted, “This is very different from sitting in a chair and using a joystick.”
There is also the real-world physical experience that other simulation methodologies such as tabletop board games or video games can’t offer.
Koopman posed the example of putting out a fire, a challenge complicated by smoke and heat blowing in your face. You are breathing through a mask that makes it hard to see.
These are analog problems that are hard to simulate digitally. And they circle back to the balance between artificial intelligence and the human mind.
Social interaction
Technologists enamored with machine learning innovations that gave birth to autonomous vehicles often forget to consider how well—or badly—computer drivers can interact with human drivers, cyclists and other vulnerable road users.
Koopman said, “Driving is a profoundly social experience.” That leaves a tall order yet to be filled. “You need computers,” he concluded, “that know how to act like people.”