DeepSeek Impact Puts Edge AI on Edge
China’s DeepSeek, which demonstrated that capable AI models can be built with far fewer resources, sent shockwaves through the AI training community. But what about AI inference? Is any rethinking under way in edge AI?
NXP Semiconductors this week announced a definitive agreement to acquire Kinara, a startup developing Neural Processing Units (NPUs) and software development tools. Headquartered in California with a development team in India, Kinara is focused on generative AI.
While the training side of AI felt the biggest DeepSeek impact, a broad range of changes is inevitable for AI inference at the edge, driven by model proliferation, multi-modal applications, open source and lower cost.
These economics and open training models will foster the development of “better models that are potentially cheaper, even if they aren’t open source,” predicted Rutger Vrijen, senior vice president, Strategy, Secure Connected Edge at NXP. Further, he expects to see “more capability at lower compute powers to edge devices, if we can compress those models to the edge.”
In short, he said, “If anything, this would accelerate our need for having a portfolio that can use that capability at the edge.”
NXP had started evaluating Kinara long before DeepSeek’s news hit the street, but NXP isn’t taking credit for foresight. DeepSeek “has just revalidated our thinking” of the Kinara acquisition, said Ajith Mekkoth, senior vice president, Engineering, Secure Connected Edge at NXP.
The Chinese company has shown that “these [AI] models are going to evolve very fast, and the ability to have a programmable core [so we can] adapt fast is the key to our success” in AI inference at the edge, noted Mekkoth.
NXP executives cited two elements of DeepSeek’s impact on AI inference. One is the need for “programmability that can bring new models fast, enabling customers to take advantage of the quick evolutions that will happen in the market,” said Mekkoth. The second is “open-source models,” which are prompting edge AI processor companies, including NXP, to “explore what they can do with them in a system.”
Other edge AI chip companies haven’t yet spelled out their analysis of DeepSeek’s effect. The only certainty is that plenty of people are debating this very topic inside their companies.
Sense-Think-Act
The acquisition of Kinara enables NXP to add much more programmability and scalability to its product portfolio.
Known for its “sense, think, and act” tagline, NXP has created a plan to substantially grow its brain, or “think” function, for processing generative AI models at the edge.
Although traditionally strong in sensing and actuation, NXP has thus far had to depend on other big-brain chip suppliers such as Nvidia, Qualcomm and Mobileye. This was particularly true when NXP’s automotive customers needed to design highly automated vehicle architectures. Auto industry analysts have long wondered when NXP might develop its own brain chip.
Asked if the Kinara acquisition is a shift in NXP’s ‘brain chip’ strategy, Vrijen noted, “I don’t think this competes directly with the Qualcomm brains, because we are very much at the edge with these [AI] solutions.” However, given how AI is about to increase “the usefulness of edge systems” and is transforming “productivity, energy efficiency, security, reliability, operational efficiency,” he said NXP “cannot ignore the brain. That brain will be a core part of the system solutions.”
Vrijen acknowledged it has become strategically essential for NXP to “have control over how that brain will develop in the future.”
Edge AI landscape
NXP’s move also illustrates how the edge AI inference landscape has gotten more fragmented, diverse and complex.
Among edge AI processor vendors, the key questions include how much AI processing will soon be affordable at the edge, how fast a growing range of broader AI models will be developed, and how those models will be applied to AI inference systems at the edge.
Are we ready?
Today, at the high end, hardware optimized for edge inference and on-premises use is percolating at Apple (Neural Engine), Qualcomm (Snapdragon NPUs), Intel (Gaudi chips) and AMD (Ryzen AI processors).
In contrast, playing at the edge closest to the sensor, STMicroelectronics recently unveiled what its MCU chief Remi El-Ouazzane calls “the industry’s first MCU combined with Neural Processing Unit (NPU).” ST’s new NPU-accelerated MCU delivers up to 600x the machine-learning performance of previous STM32s, making it the most powerful STM32 today, the company claimed. It comes with 4.2MB of RAM, more than ever before embedded on an STM32. It is also ST’s first MCU integrated with an advanced image signal processor.
Companies such as Renesas and Silicon Labs are developing their own AI-accelerated microcontrollers. Meanwhile, Synaptics last month unveiled a collaboration with Google on edge AI with a mission to define “multimodal processing for context-aware computing” for IoT.
By acquiring Kinara, NXP adds a pure-play AI accelerator NPU (not an SoC) focused on transformer-based models. NXP can combine it with its existing MCUs and MPUs, including i.MX RT crossover MCUs and i.MX family application processors.
Multi-modal edge AI systems
Kinara’s AI accelerator NPU brings generative AI processing capabilities to edge AI systems.
With Kinara, Vrijen said, “You can get a lot more context, interpret an image and classify what’s going on in the image and send that information to a large language model all in one chip in a multi-modal solution.”
Imagine having a set of security cameras running at the edge, not connected to the cloud.
Vrijen explained, “They collect images during the day. When you come back to your factory, you can literally ask them, what happened today?” The technology will highlight events that deviate from the normal static scene, interpret them and report back via large language model (LLM) text output.
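As a rough illustration, here is a minimal sketch of such a camera-to-LLM pipeline. The vision and language stages are trivial stand-ins, and none of the names below are NXP or Kinara APIs; on real hardware each stage would be a model dispatched to the NPU.

```python
# Hypothetical multi-modal edge pipeline: a vision stage flags anomalous
# frames, and a language stage summarizes the day's events on demand.
from dataclasses import dataclass

@dataclass
class Event:
    frame_idx: int
    description: str

def looks_anomalous(prev: list[int], cur: list[int]) -> bool:
    """Stand-in for an NPU vision model: flag frames that deviate
    from the previous, mostly static scene."""
    return sum(abs(a - b) for a, b in zip(prev, cur)) > 50  # crude motion check

def summarize(events: list[Event]) -> str:
    """Stand-in for an on-device LLM answering 'what happened today?'"""
    if not events:
        return "Nothing unusual happened today."
    return "; ".join(f"frame {e.frame_idx}: {e.description}" for e in events)

def daily_report(frames: list[list[int]]) -> str:
    events = [Event(i, "scene change detected")
              for i in range(1, len(frames))
              if looks_anomalous(frames[i - 1], frames[i])]
    return summarize(events)

# Three tiny 4-pixel "frames"; the second one changes the scene.
print(daily_report([[0, 0, 0, 0], [90, 0, 0, 0], [90, 0, 0, 0]]))
```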
Before GenAI came along, Vrijen said, “there was this feeling that a lot of the AI exists in the cloud, and maybe you need good connectivity, and you send the intelligence over to the cloud. But now we’re more and more convinced that we can do a lot more intelligence at the edge and in a more power-efficient way. We can do things much closer to what the customers want and protect their data with our security function capabilities.”
Under the hood
So, what’s under the hood of Kinara’s AI accelerator NPU? NXP won’t discuss Kinara’s specs, preferring that Kinara provide the latest information.
Last summer, Kinara revealed that its Ara-2 processor “hits 12 tokens per second running 7 billion parameter LLMs.”
But things are moving fast. Mekkoth said, “When we typically look at LLaVA or Llama, at 8 billion parameters, Kinara’s solutions are able to get up to 15, 16 tokens per second today. This is an optimization that’s continuing, which means there is [a] lot of room to even further optimize with newer techniques.”
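A quick way to sanity-check token rates like these is the common memory-bandwidth approximation for LLM decoding, in which each generated token reads roughly the full weight set once. The quantization level below is an assumption, since neither NXP nor Kinara has published one.

```python
# Back-of-envelope check (not Kinara data): in the memory-bound decode
# phase, each generated token reads roughly the full weight set once,
# so tokens/sec implies an effective weight-bandwidth requirement.
params = 8e9            # 8B-parameter model, per Mekkoth's example
bytes_per_param = 0.5   # assuming INT4 quantization (an assumption)
tokens_per_sec = 16     # the figure Mekkoth cites

weight_bytes = params * bytes_per_param          # ~4 GB of weights
bandwidth = weight_bytes * tokens_per_sec / 1e9  # effective GB/s

print(f"~{weight_bytes / 1e9:.0f} GB of weights, "
      f"~{bandwidth:.0f} GB/s effective bandwidth needed")
# -> ~4 GB of weights, ~64 GB/s effective bandwidth needed
```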
Without revealing how many cores are inside Kinara’s processor, Mekkoth said Kinara is using proprietary core designs. “It’s based on a set of multiple programmable cores.”
Along with programmability, Vrijen explained, Kinara provides “scalability that allows us to choose whether to use one, two or four cores.”
Handling multiple AI models in parallel on chip
NXP is especially pleased that Kinara’s processor is set up to handle multi-modal AI processing in parallel, with almost no delay.
This represents a radical departure from a traditional AI processor, in which inference across multiple AI models runs sequentially. One model’s output is often the next model’s input, so models in an application frequently depend on one another and end up executing one after another.
In contrast, said Mekkoth, Kinara has figured out “a very creative way to lay out these programmable cores in its processor architecture, which allows for parallel scheduling of compute within the models.” Hence, Kinara’s processor can provide “high-performance multi-model support with zero-overhead context switching,” according to Kinara’s own technology analysis.
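To see why parallel scheduling matters, consider this host-side Python sketch contrasting sequential and concurrent dispatch of independent models. It illustrates only the scheduling idea; Kinara’s actual core-level scheduler is proprietary.

```python
# Conceptual contrast between sequential and parallel multi-model
# scheduling. time.sleep() stands in for NPU inference latency.
import time
from concurrent.futures import ThreadPoolExecutor

def run_model(name: str, latency_s: float) -> str:
    time.sleep(latency_s)        # stand-in for NPU inference time
    return f"{name} done"

MODELS = [("detector", 0.2), ("classifier", 0.2), ("llm", 0.2)]

# Traditional approach: independent models still run one after another.
t0 = time.perf_counter()
for name, lat in MODELS:
    run_model(name, lat)
print(f"sequential: {time.perf_counter() - t0:.2f}s")  # ~0.6s

# Parallel scheduling: independent models dispatched concurrently,
# e.g. to different groups of programmable cores.
t0 = time.perf_counter()
with ThreadPoolExecutor() as pool:
    list(pool.map(lambda m: run_model(*m), MODELS))
print(f"parallel:   {time.perf_counter() - t0:.2f}s")  # ~0.2s
```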
What about memory?
Kinara’s documentation also says that “some competing accelerators require the entire set of model weights to be stored on-chip, because they lack external memory support.” The problem, as Kinara explained, is that supporting multiple models then requires each model to be very small, or the accelerator must pull model weights from host memory and swap them in every time a new model is needed for inference, which results in significant model-switch overhead.
Kinara’s NPU, on the other hand, uses a standard LPDDR interface to access external memory.
If all needed memory is not embedded on the same chip, is this a problem?
Mekkoth said, “No, they have a very efficient caching mechanism.” There is caching for each core, he explained, along with a way of managing latency between the external memory and the processor.
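A back-of-envelope comparison shows why the external-memory interface matters; the SRAM figure below is an illustrative assumption, not a Kinara spec.

```python
# Why external memory matters (illustrative numbers, not Kinara specs):
# edge-NPU on-chip SRAM is typically tens of MB, while even a small
# quantized LLM needs gigabytes of weights.
on_chip_sram_mb = 32          # assumed SRAM budget for an edge NPU
model_weights_gb = 4          # e.g. 8B params at INT4 (see above)

fits = model_weights_gb * 1024 <= on_chip_sram_mb
print(f"model fits on-chip: {fits}")   # False -> weights must stream
# With an LPDDR interface, weights stream from external DRAM through
# per-core caches instead of being swapped in from host memory on
# every model switch.
```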
What about software?
As for training, the AI industry is beholden to Nvidia not because of its GPUs, but because of CUDA, Nvidia’s parallel computing platform and programming model.
At NXP, Mekkoth explained, “we certainly don't use CUDA, and Kinara has their own tooling framework.”
Mekkoth praised Kinara’s tooling for its ability to run any model on the hardware without compelling the customer to deal with the model’s complexities.
Today, Kinara’s tools and NXP’s eIQ are separate tool sets, but the plan is to integrate them. “We will provide a unified portal of eIQ which allows the application developer to have a window into the [Kinara] AI accelerator without having to deal with the complexities of the accelerator,” explained Mekkoth.
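As a purely hypothetical illustration of that “unified portal” idea, the sketch below shows one interface fronting interchangeable backends, so application code never touches accelerator details. None of these names are real eIQ or Kinara APIs.

```python
# Hypothetical abstraction layer: the application calls one interface,
# and a backend hides CPU-vs-NPU details behind it.
from abc import ABC, abstractmethod

class Backend(ABC):
    @abstractmethod
    def run(self, model: str, inputs: list[float]) -> list[float]: ...

class CpuBackend(Backend):
    def run(self, model, inputs):
        return [x * 2 for x in inputs]   # trivial stand-in compute

class NpuBackend(Backend):
    def run(self, model, inputs):
        # A real backend would hand the graph to the accelerator's
        # runtime; the application code would not change.
        return [x * 2 for x in inputs]

def infer(model: str, inputs: list[float], backend: Backend) -> list[float]:
    return backend.run(model, inputs)

print(infer("detector", [1.0, 2.0], NpuBackend()))  # same call either way
```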
A systems view
In summary, NXP’s AI strategy boils down to a system strategy, said Vrijen.
The plan is not about burdening the system with additional CPU or SoC capabilities, he stressed. “We want to help our customers by scaling AI in Lego blocks of NPU power.”
At bottom, he noted, “We have this universe of models, whether it’s open models or models created by customers based on their proprietary data. Now customers are looking for a system where they can run that model at the edge. We will offer integrated solutions running our eIQ platform, talking to our host processors, transporting that AI model into [Kinara’s] NPUs and running it there.”
NXP is buying Kinara for $307 million in cash, expecting the acquisition to be completed in the first half of this year.
Since the company has already been partnering with Kinara commercially, Vrijen said, “we have a good running start, and we are in discussions with our customers already for specific edge solutions where we can help them.”