AI chip

Most of the work done with generative AI models like ChatGPT happens in the cloud, but chip maker Qualcomm sees a future where more of it is done locally, on PCs and mobile devices.

Running generative AI locally will enable users to bring productivity, entertainment and other use cases with them on a range of compute devices, including extended reality and automobile systems, according to Ziad Asghar, senior vice president of product management, Snapdragon technologies and roadmaps at Qualcomm.

“That affords amazing benefits,” Asghar said in a recent interview with analyst firm Tirias Research. “You’re offloading all that work from the cloud, you have much more of a private experience, you can keep those queries on the device, and many other advantages as well. But the key point is that you get those experiences anywhere in the world. That’s how you get those generative AI experiences in the palm of people’s hands. And I think that will be very, very powerful.”

At its annual Snapdragon Summit this week, Qualcomm showed how it plans to make this happen. The company introduced its Arm-based Snapdragon X Elite chip for Microsoft Windows-based PCs and laptops and the Snapdragon 8 Gen 3 for high-end Android smartphones, both of which will enable users to run their AI workloads without having to connect to the internet.

A New Era of AI

“We are entering the era of AI, and on-device generative AI will play a critical role in delivering powerful, fast, personal, efficient, secure and highly optimized experiences,” Qualcomm CEO Cristiano Amon said. “Snapdragon is uniquely positioned to help shape and capitalize on the on-device AI opportunity and you will see generative AI going virtually everywhere that Snapdragon goes.”

Generative AI workloads require a huge amount of compute power, which can be found in cloud environments from the likes of Amazon Web Services, Microsoft Azure, Google Cloud Platform and Oracle Cloud Infrastructure.

However, there are worries about running all workloads in the cloud. Developers using ChatGPT to create code in the cloud risk exposing proprietary corporate data, while moving data back and forth from the cloud can be costly and adds to the amount of time needed to complete jobs.

Qualcomm executives said the X Elite includes a neural processing unit (NPU) and a custom integrated CPU – Oryon – that is twice as fast as Arm-based CPUs from rivals like Google and Samsung. The chip can run generative AI large language models (LLMs) that have more than 13 billion parameters, and its peak performance is on par with Apple’s M2 and Intel’s Core i9-13980HX, but it uses less power.

Like the X Elite, the Snapdragon 8 Gen 3 mobile chip is designed for on-device AI, with the ability to generate a Stable Diffusion AI image in less than a second, something that took its predecessor 15 seconds to do. It is also designed to run LLMs, including Meta’s Llama 2.

Qualcomm expects PCs with the new chip to roll out next year and said that more than a dozen smartphone makers, including Asus, Sony, Xiaomi and ZTE, will make devices powered by the Snapdragon 8 Gen 3 chip.

Growing Interest in AI on Devices

On-device AI is not a reach. Microsoft CEO Satya Nadella joined Qualcomm’s conference by video to say that the IT giant is working with Qualcomm to make Arm-based PCs with AI capabilities a reality.

Bob O’Donnell, chief analyst at TECHnalysis Research, wrote in a research note that the promised performance capabilities in Qualcomm’s chips will be welcome at a time when interest in running AI, particularly generative AI, on devices is ramping up. The security and privacy advantages that on-device AI offers are important for such use cases as digital assistants.

“Plus, for certain applications that are trained specifically for PCs, many AI experts have said you’ll even be able to get better performance locally than from a cloud-based version,” O’Donnell wrote. “Of course, in some situations, there’s little doubt that the full power of cloud computing will be a better choice, but it’s definitely not always going to be the case – and that’s a big step forward in credibility for AI PCs.”

Qualcomm’s Asghar said during the interview with Tirias that when designing the new chips and looking down the road, the company considered the growth of LLMs, both in number and size. LLMs in the cloud are trained using 16- or 32-bit floating-point numbers, which means a lot of calculations every time inference is run on the model.

“Alternatively, we have been focused on compressing these models, quantizing and pruning them, and are able to make them much smaller and at the same time, for example, using only four bits” rather than 16 or 32, he said. “That translates into amazing capabilities in terms of power saving and how much concurrent AI processing can be done on the device. I feel the growth of large language models (LLMs) over the past few months has validated our strategy of essentially making the models smaller through quantizing them.”
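To make the idea of quantizing to four bits concrete, here is a minimal sketch in Python. It is not Qualcomm's method, which the interview does not describe; it assumes the simplest textbook scheme, symmetric quantization with a single shared scale, and the function names and sample weights are invented for illustration.

```python
def quantize_int4(weights):
    """Map floats to signed 4-bit integers in [-8, 7] using one shared scale.

    Assumption: simple symmetric quantization, chosen for illustration only.
    """
    max_abs = max(abs(w) for w in weights)
    # The largest magnitude maps to 7; guard against an all-zero input.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int4(quantized, scale):
    """Recover approximate float values from the 4-bit integers."""
    return [q * scale for q in quantized]


# Hypothetical weights standing in for a tiny slice of a model.
weights = [0.12, -0.53, 0.98, -1.40, 0.07, 0.66]
q, scale = quantize_int4(weights)
restored = dequantize_int4(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("4-bit values:", q)
print("largest rounding error:", max_error)
```

Each weight now needs four bits instead of 16 or 32, at the cost of a rounding error of at most half the scale per weight. Production toolchains typically use finer-grained scales and calibration data to keep that error from hurting model quality.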

Asghar also noted that as models gain capabilities, they get larger, pointing to GPT-3, which had 175 billion parameters. Meta’s Llama 2 has 7 billion- and 13 billion-parameter models, “so, the way I think about it is, initially, when a new capability is developed, the model is fairly large.”
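A rough back-of-the-envelope calculation shows why bit width matters so much at these model sizes. The sketch below counts only the memory needed to store the weights, assuming every parameter uses the same number of bits; it ignores activations, caches and the small overhead of quantization metadata, and the helper name is made up for this example.

```python
def weight_memory_gb(num_parameters, bits_per_weight):
    """Approximate storage for model weights alone, in gigabytes (10^9 bytes)."""
    return num_parameters * bits_per_weight / 8 / 1e9


# Parameter counts for the two Llama 2 sizes mentioned in the article.
for params in (7_000_000_000, 13_000_000_000):
    for bits in (32, 16, 4):
        size = weight_memory_gb(params, bits)
        print(f"{params / 1e9:.0f}B parameters at {bits}-bit: {size:.1f} GB")
```

Under these assumptions, a 13 billion-parameter model drops from 52 GB at 32-bit precision to 6.5 GB at four bits, which is the difference between needing a server and fitting in the memory of a laptop or high-end phone.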

A Crowded Field

Qualcomm is growing its capabilities in an expanding field of AI chip makers. NVIDIA almost a decade ago pointed to AI as the growth engine for the company going into the future and, since then, has built a product portfolio of GPUs, systems and the complex software, including the CUDA-X deep learning software stack and NVIDIA AI Enterprise platform, needed to run the workloads.

The global AI chip market is expected to grow from $20 billion last year to $165 billion by 2030, and, increasingly, chip makers and other tech companies – including Intel, AMD, OpenAI, Meta, Amazon, Google and Microsoft – are designing their own AI chips or have plans to do so, both to take advantage of the fast-growing market and to reduce their reliance on NVIDIA’s AI chips.