Apple and Microsoft this week added more fuel to the AI trend of making models small enough to run locally on devices, rather than connecting everything at the edge to large servers in cloud environments.

Microsoft unveiled its Phi-3 family of open small language models (SLMs), which are designed to run in environments with limited compute capacity – including on the devices themselves – and where latency is an issue. For its part, Apple rolled out its own open SLMs, which are small enough to run on devices like smartphones.

Both feed into the industry push to bring more AI training, processing and inference capabilities to the devices themselves – what’s called edge AI. Data is increasingly being created and stored in edge devices, which can be defined as everything from tiny sensors to PCs. Gartner analysts have predicted that by next year, as much as 75% of enterprise-created data will be generated at the edge.

Everything from PCs to smartphones is being made more AI-enabled. Typically, AI data created at the edge is sent to enterprise servers in the cloud, which can be expensive and can increase the latency in analyzing and acting on the data.

Chip makers like Intel, AMD, Nvidia, Arm and Qualcomm are building processors that can run AI workloads at the edge, bringing more compute capabilities to the device. Companies like Microsoft and Apple are working on the software side.

Microsoft and Phi-3

“Thanks to their smaller size, Phi-3 models can be used in compute-limited inference environments,” Misha Bilenko, corporate vice president for Microsoft GenAI, wrote in a blog post. “Phi-3-mini, in particular, can be used on-device, especially when further optimized with ONNX Runtime for cross-platform availability.”
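
For a sense of what on-device use looks like in practice, here is a minimal sketch that loads Phi-3-mini with the Hugging Face transformers library rather than the ONNX Runtime path Bilenko describes. The checkpoint ID matches Microsoft’s public Hugging Face release; whether trust_remote_code is needed depends on the transformers version, so treat the details as assumptions.

```python
# A minimal sketch of local inference with Phi-3-mini via Hugging Face
# transformers. The checkpoint ID is Microsoft's public release; depending on
# your transformers version, trust_remote_code=True may or may not be required.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# At 3.8B parameters, the weights fit on laptop-class hardware (CPU or small GPU).
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Explain in two sentences why small language models suit edge devices."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=120)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```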

The smaller size of the models also makes them easier to fine-tune and customize, more affordable to run and lower in latency.

“The longer context window enables taking in and reasoning over large text content – documents, web pages, code and more,” Bilenko wrote. “Phi-3-mini demonstrates strong reasoning and logic capabilities, making it a good candidate for analytical tasks.”

Phi-3-mini Appears First

The first of the Phi-3 models to be available is Phi-3-mini, which can be found in Microsoft’s Azure AI Model Catalog and on Hugging Face’s platform of machine learning models. It also can be found on Ollama, a framework developers can use to run models locally on their laptops, and will be available as an Nvidia NIM microservice, with a standard API interface, according to Microsoft.
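
For the Ollama route, a hedged sketch: assuming the model has been pulled under Ollama’s “phi3” tag and the server is running on its default port 11434, its REST API can be called from any language – here, Python with the requests package.

```python
# A minimal sketch of querying Phi-3 through a locally running Ollama server.
# Assumes `ollama pull phi3` has already been run and the default port 11434.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "phi3",                 # Ollama's tag for Phi-3-mini (assumed)
        "prompt": "Name three benefits of running AI models on-device.",
        "stream": False,                 # return a single JSON object, not a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])           # the generated text
```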

Phi-3-mini has 3.8 billion parameters, which Bilenko wrote enables it to perform better than many models twice its size. It’s also available in 4K- and 128K-token context lengths, making it the first SLM in its class to support a context window of up to 128K tokens, he wrote.

“In the coming weeks, additional models will be added to the Phi-3 family to offer customers even more flexibility across the quality-cost curve,” Bilenko wrote. “Phi-3-small (7B [parameters]) and Phi-3-medium (14B) will be available in the Azure AI model catalog and other model gardens shortly.”

Apple and OpenELM

Apple’s eight SLMs, which come under the OpenELM – or Open-source Efficient Language Models – umbrella, are smaller than Microsoft’s Phi-3-mini, with parameter counts ranging from 270 million to 3 billion. They also come in two variants: four are pretrained and the other four are instruction-tuned, which means they can better follow instructions in general and need less in-context information, making prompts more effective, according to IBM.

The models were trained with the CoreNet library – for which Apple also released the code – on datasets including RefinedWeb, a deduplicated version of the PILE, and subsets of RedPajama and Dolma v1.6, totaling about 1.8 trillion tokens. The models are available on Hugging Face.
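
As a rough sketch of how one of those checkpoints might be pulled down and queried: the model ID below is Apple’s published 270-million-parameter instruction-tuned variant, while the use of trust_remote_code and a Llama-style tokenizer follows the model card and should be treated as assumptions.

```python
# A rough sketch of loading Apple's smallest instruction-tuned OpenELM model.
# Assumptions: the repo ships custom model code (hence trust_remote_code=True)
# and, per its model card, borrows a Llama-style tokenizer instead of its own;
# the Llama-2 tokenizer repo is gated and requires Hugging Face access approval.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M-Instruct", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("List two benefits of on-device AI.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```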

“OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy,” Apple wrote.
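
Apple hasn’t published this as a snippet, but the idea behind layer-wise scaling can be sketched: rather than giving every transformer layer an identical configuration, per-layer head counts and feed-forward widths grow with depth, spending the parameter budget where it adds the most accuracy. The interpolation bounds below are invented for illustration.

```python
# An illustrative sketch of layer-wise scaling (not Apple's implementation):
# per-layer attention-head counts and FFN width multipliers are linearly
# interpolated across depth instead of repeating one uniform layer config.
def layerwise_config(num_layers, min_heads=4, max_heads=16,
                     min_ffn_mult=2.0, max_ffn_mult=4.0):
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)   # 0.0 at the first layer, 1.0 at the last
        heads = round(min_heads + t * (max_heads - min_heads))
        ffn_mult = round(min_ffn_mult + t * (max_ffn_mult - min_ffn_mult), 2)
        configs.append({"layer": i, "heads": heads, "ffn_mult": ffn_mult})
    return configs

for cfg in layerwise_config(8):
    print(cfg)   # shallow layers get fewer heads and narrower FFNs than deep ones
```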

The company said it was releasing the OpenELM models to give researchers access to its latest language models. Apple also cautioned that the SLMs were trained on publicly available datasets.

“These models are made available without any safety guarantees,” the vendor wrote. “Consequently, there exists the possibility of these models producing outputs that are inaccurate, harmful, biased or objectionable in response to user prompts.”

Users and developers need to be thorough in their testing and use filtering mechanisms that are tailored to their specific requirements to ensure that such issues don’t make it into their applications.
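
What such a filter looks like depends on the application, but as a minimal sketch, assuming a simple pattern-based blocklist (production systems typically add trained safety classifiers on top):

```python
# A minimal sketch of post-generation output filtering. The patterns and the
# withheld-response policy are placeholders to be tailored per application;
# production systems usually pair checks like this with safety classifiers.
import re

BLOCKED_PATTERNS = [
    re.compile(r"\b(credit card number|social security number)\b", re.IGNORECASE),
    # ...additional patterns tailored to the application's requirements
]

def filter_output(text: str) -> str:
    """Return the model's text, or a refusal if a blocked pattern appears."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(text):
            return "[response withheld by safety filter]"
    return text

print(filter_output("Sure, here is a social security number: ..."))
```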

Apple also took another step into edge AI, reportedly buying Datakalab, a Paris-based company focused on making efficient AI algorithms for embedded systems and devices like smartphones. Apple bought the company in December, but the deal wasn’t uncovered until recently by French publication Challenges.

Apple has been quietly building up its AI capabilities in part via acquisitions, some of which the company hasn’t announced publicly. In March it was revealed that the vendor bought Canadian startup Darwin AI the month before. A company spokesperson has said that Apple, at times, will buy smaller companies but not discuss its plans for them.
