Microsoft Aligns with NVIDIA to Build LLMs

Microsoft and NVIDIA today revealed they are working together to make it possible for data scientists to build large language models (LLMs) locally on a Windows 11 PC or workstation, without being connected to the Internet.

Announced at the Microsoft Build 2023 conference, the effort sees NVIDIA leveraging the Windows Subsystem for Linux (WSL) that Microsoft created to make tools originally built for training artificial intelligence (AI) models on Linux systems available on Windows 11.

Manuvir Das, vice president of enterprise computing at NVIDIA, said the approach enables data scientists to take advantage of local graphics processing units (GPUs) to build generative AI models. That capability eliminates the need to dual-boot between Windows and Linux to build those models. “They can run generative AI models on their laptops,” he said.

Those models can then be seamlessly integrated with AI models developed on NVIDIA GPUs running in the cloud or in a local data center, he added.

NVIDIA also plans to add support for the Microsoft Olive toolchain, which optimizes PyTorch models and converts them to ONNX, the Open Neural Network Exchange format originally created by Microsoft and Meta, along with its associated open source libraries and runtimes. The Release 532.03 drivers will make it possible to run a Stable Diffusion text-to-image generator that has been optimized using Olive.

Building generative AI models based on LLMs for specific domains requires much less data than training a general-purpose model such as the one behind ChatGPT. Data science teams are also now leveraging large LLMs to build medium-sized LLMs, which in turn are used to create small and even tiny LLMs for use cases with limited access to compute resources. NVIDIA, in collaboration with Microsoft, is looking to reduce the cost of building those models by making it possible to use a Windows system configured with a GPU.
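The large-to-small cascade described above is commonly implemented through knowledge distillation, where a smaller student model is trained to match a larger teacher's output distribution. Below is a minimal standard-library-only sketch of the softened cross-entropy objective at the heart of that technique; the logits and temperature are illustrative values, not from any particular model.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with a temperature; higher temperatures soften the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the teacher's softened distribution and the
    student's: the core objective when a large LLM is used to train a smaller one."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return -sum(ti * math.log(si) for ti, si in zip(t, s))

# A student that tracks the teacher incurs a lower loss than one that diverges.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]
far_student = [0.2, 3.5, 1.0]
assert distillation_loss(teacher, close_student) < distillation_loss(teacher, far_student)
```

Because the student only needs to reproduce the teacher's behavior on a narrower domain, it can be far smaller, which is what makes the small and tiny LLMs mentioned above viable on compute-constrained hardware.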

NVIDIA contends generative AI has become the killer application for IT infrastructure that enables parallel processing, also known as accelerated computing. In effect, generative AI is reinventing what a computer is by making it possible to invoke applications using natural language rather than requiring a developer to write code to build an application programmatically.

It’s not clear how many LLMs the average enterprise might build, but NVIDIA, in addition to needing GPUs to train them, is also making a case for using its GPUs to run the inference engines those AI models rely on. Most inference engines today run on x86 systems, but NVIDIA argues parallel processing should be used both to build AI models and to run them.

One way or another, AI models are going to be pervasively deployed across enterprise applications. The challenge now is making it simpler to build those models and then integrate them within a wide range of applications. The issue, of course, is getting the data scientists who build those AI models aligned with the development teams that build and deploy the applications.