Has Waymo Gone End-to-End AI?
Is E2E AI so revolutionary that even Waymo is shifting strategy? In theory, for any robotaxi company struggling with modular AI, E2E can "give them another bite at the apple," says Phil Koopman.
(Image: Waymo)
“End-to-End (E2E)” AI learning is the new black in automotive fashion. Tesla, reportedly the first to read the writing on the wall about limitations in scaling with “modular” AI, switched to so-called E2E AI several years ago.
Everyone’s big question now is whether E2E AI learning is so revolutionary that even Waymo is shifting strategy.
Why does it matter?
Because E2E AI is still new, it offers “another bite at the apple” for companies developing self-driving cars, observed Phil Koopman, professor emeritus at Carnegie Mellon University.
In theory, E2E enables autonomous systems to learn directly from data instead of from objects that have been painstakingly labeled. That means “you can reduce staff size and increase velocity,” Koopman noted, because E2E lets companies eliminate the people hired to label objects, and it requires fewer programmers because no one is writing all the glue code between modules.
On stage at The Autonomous event in Vienna last week, several German automotive executives described E2E AI as a “big topic” that will alter vehicle architecture and pave the way to L4 and L5 autonomy. These predictions, predictably, came without actual plans.
Off stage, said Koopman, “I’ve heard speculation that Waymo might be going end to end. But I have no idea” if it is the case.
Koopman suspects significant pressure, both technically and organizationally, on Waymo to switch to E2E. He said, “The question is more likely to be when, rather than if.”
Indeed, Waymo researchers have been working on E2E. Last year, Waymo published a technical paper introducing “EMMA,” an End-to-End Multimodal Model for Autonomous Driving.
But when asked earlier this week if the company has begun deploying EMMA, Waymo hedged. A Waymo spokesperson said the company’s “extensive experience and research have shown that to guarantee safety and performance at scale, pure E2E models aren’t enough.”
Further, rather than choosing one AI learning approach, Waymo cited the company’s “holistic approach, by leveraging the efficiency of end-to-end learning,” combined with “Waymo’s rich semantic understanding and robust evaluation.”
Beyond the obvious PR-speak, the question of how Waymo can weld together two very different AI learning principles remains a mystery.
E2E AI vs. Modular AI
First, let’s define the differences between E2E AI learning and a Modular AI approach.
“End to end” in self-driving means taking in all the pixels (raw sensor inputs such as camera images) and throwing them into one big machine-learning chunk. “Then, steering (driving actions or decisions) comes out at the end,” Koopman explained.
In contrast, the modular machine learning approach breaks down the driving task into well-defined stages, does the pieces of machine learning in modules, and proceeds with steps in a sequential manner. Presumably, this is what Waymo’s been doing. So did Cruise.
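The contrast can be sketched in a few lines of toy code. Everything below is invented for illustration (the function names, the brightness threshold, the fake camera frame) and bears no resemblance to any company’s actual stack; the point is only the shape of the two pipelines.

```python
# Toy contrast: the same driving decision computed two ways.
# All names and numbers here are invented stand-ins.

def perceive(pixels):
    # Modular stage 1: turn raw pixels into labeled objects.
    return [{"kind": "pedestrian"}] if max(pixels) > 200 else []

def plan(objects):
    # Modular stage 2: decide an action from the labeled objects.
    return "brake" if any(o["kind"] == "pedestrian" for o in objects) else "cruise"

def modular_drive(pixels):
    # Hand-written "glue code" connects inspectable stages in sequence.
    return plan(perceive(pixels))

def e2e_drive(pixels):
    # E2E: one learned function, pixels in, action out. Here a stand-in
    # rule; in practice a single neural network whose internals cannot
    # be inspected stage by stage.
    return "brake" if max(pixels) > 200 else "cruise"

frame = [10, 250, 30]  # fake camera frame containing a bright blob
print(modular_drive(frame), e2e_drive(frame))
```

Both pipelines reach the same answer here, but only the modular one exposes an intermediate object list that an engineer can check when something goes wrong.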
Why go end to end?
Missy Cummings, an AI/robotics professor at George Mason University, also at the Vienna conference, acknowledged E2E as “the latest craze.”
Asked about its advantages, she noted, “In the old days, we would take pictures, we would label the pictures, and then the cars would have to reason on a series of static images to determine what’s the next right best action. [But] if you do all the end-to-end learning in videos in theory, you’re streamlining the process and you’re speeding up the process. In theory.”
Simplified engineering
Speed is an obvious advantage. E2E can also simplify the engineering process.
Mauricio Muñoz, senior research engineer at AI Sweden, explained: “You can simplify the engineering process itself, because you offload the difficulty of coordinating subsystems by turning it into a data problem.”
For example, “you trust that the learning process can find the optimal way to solve the task, given enough data.” Muñoz added a couple of caveats here. “Note, this might not be the way a human may design it.”
Also, the operative phrase here is “given enough data.” E2E needs a whole lot of data to pull that off.
Nevertheless, for many automakers, E2E poses “the exciting promise of ML in general,” Muñoz said, “because we do not have to do the programming manually.”
What about interpretability?
All design decisions are tradeoffs, and E2E design is no exception, Muñoz reminded. “You do have to pay a price for E2E, in terms of interpretability and verification of the process itself.”
In automotive, this is a problem that affects safety. “Finding where something went wrong … becomes just short of impossible with E2E. Knowing which parts of the total data distribution are making the model malfunction becomes a difficult question to answer,” Muñoz explained.
He opined, “E2E systems are simply prone to more malfunctions, because they invite the model to learn from more high-dimensional ‘shortcuts’ than you would otherwise have in a modular system, where each component is responsible for a smaller function that is easier to inspect.”
The loss of richness
Cummings, who researches computer vision, worries about the loss of richness in visual imagery when the world is presented to computers as zeros and ones. “The loss of richness is the same as the increase in uncertainty,” she stressed.
“We have a lot of uncertainty in the world. That uncertainty is growing particularly in events where we’re doing a lot of compression,” she said.
“E2E learning is great as long as everything is normal or even just outside one standard deviation. But as soon as things start to not match the patterns the car has seen in the past,” vehicles based on E2E could fall apart, she explained.
Degradation of driving skills
Cummings cited a “rash of videos” now available on the Internet showing Waymo cars turning into oncoming traffic.
“That has never happened before.” Hypothesizing that Waymo might already be deploying E2E-based robotaxis in small volume, Cummings said, “I feel as though something is going on with the E2E learning causing it to do that.”
E2E: ‘Seductive and dangerous’
Koopman sees many reasons why OEMs and Tier Ones find E2E machine learning “seductive,” although it is “dangerous for safety.”
E2E eases the shakedown process for robotaxi operations. “We saw that with Tesla, when they switched to end-to-end with FSD, people said, ‘Oh, works a lot better.’”
In Koopman’s view, “If you have ample training data, you don’t have programmers who can make mistakes, and there will be no such things as labeling errors.” No labels means that the cost of the human labor needed to label objects goes away. It’s a huge win for corporations.
In sum, “E2E requires a lot less work to get the first 90% working [in robotaxi operations]. For that, E2E is extremely attractive and useful.”
Can’t be trusted
Peter Schaefer, executive vice president and chief sales officer of automotive at Infineon Technologies, said at The Autonomous, “I’m convinced that end-to-end AI will bring us to Level 4 and 5. And it will help us to have faster learning cycles.”
But Schaefer cautioned: “We also need to face the fact that we cannot trust the outcome of the E2E AI blindly.”
Schaefer isn’t alone. Many automotive engineers are deeply concerned about E2E AI learning’s interpretability deficit.
Companies are working on different E2E flavors, for which they are using different terminology. Some call it “monolithic” E2E — training the end-to-end AI in one go without breaking the process into chunks. Others advocate “modular E2E” — in which the end-to-end system breaks into smaller neural networks, but they are still trained “end to end.”
Or, you can build a transformer-based system with multiple attention heads. You can assign object perception to one head and planning to another. This is one flavor of modular end-to-end, said Koopman, all part of the same giant transformer system.
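One way to picture “modular E2E” is a single network with a shared trunk feeding two separately inspectable heads. The skeleton below is a hypothetical toy, not any vendor’s design: the layer sizes, random weights, and head names are all arbitrary, and no actual training loop is shown; the point is only that both heads hang off one jointly trainable model while the perception head’s output remains available for logging.

```python
import math
import random

random.seed(0)

def rand_matrix(n_in, n_out):
    # Arbitrary toy weights; a real system would learn these end to end.
    return [[random.gauss(0, 1) for _ in range(n_out)] for _ in range(n_in)]

def apply(vec, W):
    # vec (n_in) times W (n_in x n_out) -> (n_out)
    return [sum(vec[i] * W[i][j] for i in range(len(vec)))
            for j in range(len(W[0]))]

# One network: a shared trunk feeding two heads.
W_trunk = rand_matrix(8, 16)
W_perception = rand_matrix(16, 4)   # head 1: e.g. object scores
W_planning = rand_matrix(16, 2)     # head 2: e.g. steer / speed

def forward(sensor_vec):
    h = [math.tanh(x) for x in apply(sensor_vec, W_trunk)]  # shared features
    objects = apply(h, W_perception)  # intermediate output you can log
    action = apply(h, W_planning)     # final driving decision
    return objects, action

objects, action = forward([random.gauss(0, 1) for _ in range(8)])
print(len(objects), len(action))
```

Because gradients from both heads flow back through the shared trunk, the whole thing can still be trained “end to end,” while the perception head gives engineers something to inspect.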
Bosch Modular E2E system
Whether a carmaker pursues “pure” monolithic E2E or modular E2E, it must still struggle for interpretability, to validate that E2E is doing the right thing inside the black box.
Mathias Pillin, CTO of Bosch Mobility, presented his version of E2E AI learning ideas at The Autonomous.
Noting that E2E training is a big discussion in both industry and academia, Pillin acknowledged assertions that E2E must be really monolithic to provide good results.
Bosch disagrees.
Pillin said, “I am deeply convinced that for automotive, we need to have a structured architecture in the middle.”
Showing a conceptual slide, Pillin said, “You might be wondering why we still have those three blocks on the right side (shown below), listing ‘perception, fusion and planning.’”
This provides options, he said. “You can train them in loops. You can train them end to end,” said Pillin. “The good thing is that the interfaces between the modules are being adapted along the way.” According to Pillin, this approach is already hitting the streets. “So, we are gathering experience.”
How Bosch’s “structured” modular E2E system can provide interpretability remains unclear. During a Q&A session, an audience member asked how interpretable the interface in Bosch’s modular end-to-end system is. Pillin either didn’t have a good answer or didn’t seem to have understood the question.
This is an important issue, noted Koopman, because machine learning is uninterpretable. “You have no idea what you’re looking at.” This dilemma calls for a middle layer with some kind of mechanism or data structure that allows humans to see, or “wiretap,” what’s going on in the middle of the process, he added. How to work with the middle of an E2E AI learning system will be a critical issue for anyone looking for transparency. But no company has outlined the details of its plans yet.
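The “wiretap” idea can be sketched as forcing the system’s middle to pass through a human-readable record that can be logged and audited. Everything in this sketch is a made-up stand-in (the field names, the cyclist scene, the `slow_down` action); a real learned component would emit something far richer, but the principle, a structured intermediate instead of an opaque tensor, is the same.

```python
import json

def perception_stage(frame_id):
    # Stand-in for a learned perception module. What matters is its
    # OUTPUT format: a structured scene description a human can read,
    # rather than an opaque internal tensor.
    return {"frame": frame_id,
            "objects": [{"kind": "cyclist", "lane": "ego", "dist_m": 12.5}]}

def planning_stage(scene):
    # Stand-in for a learned planner that consumes the readable record.
    if any(o["lane"] == "ego" for o in scene["objects"]):
        return "slow_down"
    return "proceed"

scene = perception_stage(42)
print(json.dumps(scene))       # the "wiretap": this record can be logged
print(planning_stage(scene))   # and the decision traced back to it
```

If the car later does something wrong, engineers can replay the logged middle records and see which stage, perception or planning, produced the bad output, which is exactly what a monolithic pixels-to-steering network denies them.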
Back to Waymo
Waymo often touts its “world-class technological success built on lessons learned from +100M miles driven.” But by switching from modular AI learning to E2E, Waymo forsakes a lot of its talking points.
Waymo’s robotaxi built on E2E AI learning is “a completely different vehicle,” said Koopman. Although this isn’t a reason for Waymo to scrap E2E, he added, “I’m just observing that anyone who claims that safety data carries over is making a really suspect claim.”
None of the experts is disparaging E2E AI learning, though.
Koopman noted that any new AI model looks better at first, “because we don’t know what the problems are until we have experience with it.”
As Cummings said, “E2E learning makes for great demos.” But, one will still have to see how the E2E-based vehicles work out.
She added, “I’m a big fan of the pilot demos. I think that this technology, any AI, needs to be matured through its operations … careful operations.”
In Cummings’ opinion, companies are not testing enough and they’re cutting corners on testing. She made clear, “If you want to say that my car can do x, y and z, you should be able to point to test results that show that at least under reasonable circumstances you’ve got some assurance that the car will perform correctly.”
E2E is no exception. Companies’ claims need to be matched by solid results.
Related stories:
What does end-to-end entail?
On October 9th, Phil Koopman will give a talk on end-to-end AI learning vs. modular AI in our weekly “AI from Toy to Tools” video podcast series. It’s an episode you won’t want to miss if you have been wondering what E2E learning entails.
Each episode will drop on our YouTube channel, Junko’s Talk to Us: https://www.youtube.com/channel/UCVWz3xsmgaouVtNfu3SL9Qg
Below is the teaser.
AI in Automotive: What’s domain experts’ responsibility?
Mauricio Muñoz, senior research engineer at AI Sweden, has shared his views on domain-specific AI in the automotive field. Below, you can watch the full episode of Mauricio’s talk, just dropped last week:
Don’t forget to subscribe to our YouTube channel so that you don’t miss episodes of our “AI from Toy to Tools” video podcast series.
The promoters of AV seem to have adopted a strategy of taking two steps forward and then correcting errors by taking a step back. However, the reality seems to be more of a one-step-forward, one-step-back approach. The result resembles a car stalled in the middle of an intersection.