Large language models (LLMs) have garnered much of the attention over the past year for their foundational role in such high-profile generative AI tools as OpenAI’s ChatGPT and Google’s Bard chatbots.

LLMs have rapidly grown in size, with some containing hundreds of billions of parameters, the variables that play a crucial role in determining a model’s performance, and training on massive datasets. They include such models as OpenAI’s GPT-4, Google’s BERT family and LaMDA, Anthropic’s Claude, Meta’s Llama-2 and Microsoft’s Orca, which has 13 billion parameters.

However, despite LLMs gaining the lion’s share of attention, a lot of work is being done around small language models (SLMs), which train on smaller datasets, have fewer parameters, can be used for more domain-specific cases and offer the promise of being able to run on smaller systems.

Google earlier this month rolled out its Gemini family of models that includes Gemini Nano, which has either 1.8 billion parameters (for Nano-1) or 3.25 billion (Nano-2). The models are designed to run on smaller devices like Google’s Pixel 8 smartphones.

Microsoft and Phi

For its part, Microsoft over the past several months has unveiled several SLMs under the “Phi” brand, including Phi-1 (1.3 billion parameters) and Phi-1.5, which also has 1.3 billion parameters but focuses on “common sense reasoning and language understanding” and can perform as well as models that are five times larger, according to Microsoft Research.

The IT giant this week released Phi-2, a 2.7 billion-parameter SLM that the company says can match or outperform models that are up to 25 times larger, due to innovations around model scaling and training data curation. Those models include Mistral, with 7 billion parameters, and Llama-2, with 13 billion. On such “multi-step reasoning tasks” as coding and math, they also include the 70 billion-parameter Llama-2.

It also matches or outperforms Google’s Gemini Nano-2.

“Phi-2 outperforms other existing small language models, yet it’s small enough to run on a laptop or mobile device,” Microsoft Research wrote on X (formerly Twitter).
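
For a rough sense of why a 2.7 billion-parameter model can fit on a laptop, the size of the weights alone can be worked out with simple arithmetic. The sketch below is a back-of-the-envelope estimate, not a figure from Microsoft. It assumes weights only (no activations, cache or runtime overhead), decimal gigabytes, and illustrative precisions of 32, 16 and 4 bits per parameter, none of which are specified in the article.

```python
# Back-of-the-envelope weight footprint for models named in the article.
# Assumptions (not from the article): weights only, decimal gigabytes,
# and illustrative precisions of 32, 16 and 4 bits per parameter.

PARAMS = {
    "Phi-2": 2.7e9,
    "Mistral": 7e9,
    "Llama-2 13B": 13e9,
    "Llama-2 70B": 70e9,
}


def weight_gb(num_params: float, bits_per_param: int) -> float:
    """Return the size of the model weights in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


def size_ratio(larger: str, smaller: str = "Phi-2") -> float:
    """Return how many times more parameters `larger` has than `smaller`."""
    return PARAMS[larger] / PARAMS[smaller]


if __name__ == "__main__":
    for name, num_params in PARAMS.items():
        sizes = ", ".join(
            f"{bits}-bit: {weight_gb(num_params, bits):.1f} GB"
            for bits in (32, 16, 4)
        )
        print(f"{name:12s} {sizes}")
    ratio = size_ratio("Llama-2 70B")
    print(f"Llama-2 70B has about {ratio:.1f}x the parameters of Phi-2")
```

Under these assumptions, Phi-2’s weights come to about 5.4 GB at 16-bit precision, against roughly 140 GB for the 70 billion-parameter Llama-2.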

The smaller model fits in with an industry-wide push to make the vast capabilities of generative AI, which now reside mostly in larger systems or in the cloud, available locally on PCs and mobile devices.

Lots of Capabilities in a Small Package

The rapid growth “in the size of language models to hundreds of billions of parameters has unlocked a host of emerging capabilities that have redefined the landscape of natural language processing,” Senior Researcher Mojan Javaheripi and Sébastien Bubeck, partner research manager, with Microsoft Research wrote in a blog post. “A question remains whether such emergent abilities can be achieved at a smaller scale using strategic choices for training, e.g., data selection.”

Microsoft looked at two areas when working to scale the capabilities of SLMs, one of which was the training data, Javaheripi and Bubeck wrote. The company used a mix that included synthetic datasets designed to teach the model such concepts as common-sense reasoning and general knowledge in such areas as science, daily activities, and theory of mind. That was augmented by “carefully selected web data that is filtered based on educational value and content quality,” they wrote.

The researchers also embedded the knowledge of the smaller, 1.3 billion-parameter Phi-1.5 model in Phi-2 to help scale up the newer model’s capabilities.

“This scaled knowledge transfer not only accelerates training convergence but shows clear boost in Phi-2 benchmark scores,” they wrote.

A Lot of GPUs and Tokens for Training

Training for Phi-2 was run on 96 Nvidia A100 GPUs and took 14 days, using 1.4 trillion tokens – units of text or code – from multiple passes on a mixture of synthetic and web datasets for both natural language processing (NLP) and coding.
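
The training figures above translate into a rough compute budget. The following sketch assumes that all 96 GPUs ran continuously for the full 14 days, which the article does not state, so the results are estimates rather than reported numbers.

```python
# Rough training-budget arithmetic from the figures quoted in the article.
# Assumption (not from the article): every GPU was busy for the full 14 days.

NUM_GPUS = 96          # Nvidia A100 GPUs
TRAINING_DAYS = 14     # wall-clock training time
TOTAL_TOKENS = 1.4e12  # tokens processed across all passes


def gpu_hours(num_gpus: int, days: int) -> int:
    """Total GPU-hours consumed if all GPUs run for the whole period."""
    return num_gpus * days * 24


def tokens_per_second(total_tokens: float, days: int) -> float:
    """Aggregate token throughput across the whole cluster."""
    return total_tokens / (days * 24 * 3600)


if __name__ == "__main__":
    hours = gpu_hours(NUM_GPUS, TRAINING_DAYS)
    cluster_rate = tokens_per_second(TOTAL_TOKENS, TRAINING_DAYS)
    per_gpu_rate = cluster_rate / NUM_GPUS
    print(f"GPU-hours:                 {hours:,}")
    print(f"Tokens/second (cluster):   {cluster_rate:,.0f}")
    print(f"Tokens/second (per GPU):   {per_gpu_rate:,.0f}")
```

Under that assumption, the run amounts to 32,256 GPU-hours, or roughly 12,000 tokens per second per GPU.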

What the model didn’t need were such techniques as reinforcement learning from human feedback (RLHF) or instruction fine-tuning, which are common in model training. Still, Phi-2 exhibited better behavior when it came to toxicity and bias than open-source models that went through alignment.

“With its compact size, Phi-2 is an ideal playground for researchers, including for exploration around mechanistic interpretability, safety improvements, or fine-tuning experimentation on a variety of tasks,” the researchers wrote, adding that the SLM will be available in the Azure AI Studio model catalog for R&D purposes.

Phi-2, which initially was introduced last month at Microsoft’s Ignite event, and the larger trend toward smaller language models are getting some positive feedback. One person on Reddit noted that, “After so many mistakes and opportunities/markets lost for Microsoft, they seem to be on the right track for AI/LLMs. Small LLMs that can run with consumer hardware are going to be massively important.”

Another on the same thread noted that they could run Phi-2 on their laptop, making it “suitable for a locally running copilot, line / block completion, and other simple tasks you want done fast and often. It would also work offline. I see great future in small models, especially if they get trained for specific purposes.”