At its recent Cloud Next 2023 conference, Google Cloud unveiled significant updates across its AI and infrastructure portfolio to help customers build and run large language models (LLMs) at scale. I had the chance to attend the event and learn more about these new capabilities.
Generative AI models like PaLM and Imagen, built on LLMs, are extremely computationally intensive. To support customers running these models in production, Google Cloud is enhancing its infrastructure, tools, and platform.
On the infrastructure side, Google announced the general availability of its Cloud TPU v5 chips, which are purpose-built to run AI workloads efficiently. Compared with the prior-generation TPU v4, v5 delivers two times better performance per dollar for training and 2.5 times better for inference. In addition, A3 virtual machines powered by Nvidia's new H100 GPUs will launch soon, offering up to three times faster training than the previous generation.
To scale ML workloads, Google launched GKE Enterprise, which adds multi-cluster management and autoscaling of Cloud TPU pods. Google reported that some early customers have seen productivity gains of 45% and accelerated software deployment by more than 70% with GKE Enterprise.
Beyond new hardware, Google is advancing its Vertex AI platform for building, deploying, and managing models. Vertex AI now offers more than 100 pre-built foundation models optimized for different applications across language, vision, and more. First-party models such as PaLM 2, Imagen, and Codey were also updated with 32k-token context windows to allow processing of long-form documents, which Google estimates covers up to 80 pages.
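To get a feel for what a 32k-token window means in practice, here is a rough back-of-the-envelope sizing sketch. The tokens-per-word and words-per-page figures are my own assumptions (common rules of thumb for English text), not numbers from Google's announcement:

```python
# Rough sizing sketch: how many document pages fit in a 32k-token context window.
# TOKENS_PER_WORD and WORDS_PER_PAGE are assumed rules of thumb, not figures
# from the Cloud Next announcement.

CONTEXT_WINDOW_TOKENS = 32_000   # context window size from the announcement
TOKENS_PER_WORD = 1.3            # assumed average tokenizer ratio for English text
WORDS_PER_PAGE = 300             # assumed for a typical document page

def pages_that_fit(context_tokens: int,
                   tokens_per_word: float = TOKENS_PER_WORD,
                   words_per_page: int = WORDS_PER_PAGE) -> int:
    """Estimate how many document pages fit in a model's context window."""
    tokens_per_page = tokens_per_word * words_per_page
    return int(context_tokens // tokens_per_page)

print(pages_that_fit(CONTEXT_WINDOW_TOKENS))  # ~82 pages under these assumptions
```

Under these assumed ratios the window holds roughly 80 pages, which lines up with Google's estimate; the real figure depends on the tokenizer and document density.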
Google also highlighted how Vertex AI helps enterprises customize models like PaLM and Imagen using lightweight techniques such as prompt tuning and style tuning, making it possible to create models tailored to specific domains and use cases without large training datasets.
Switching gears to the operational side, Google announced that Vertex AI Search and Conversation are now generally available, letting developers build LLM-powered search and chatbot applications with just a few lines of code. The platform also includes capabilities like digital watermarking, along with tools to control access and protect data privacy.
In addition, Google discussed updates to Duet AI, its AI assistant designed to make AI more accessible across Workspace productivity tools and Google Cloud developer services. New Duet AI capabilities include meeting summarization in Google Meet, enhanced chat in Google Chat, and data analytics assistance in BigQuery. Google also announced the general availability of Duet AI for Google Workspace and an expanded preview of Duet AI for Google Cloud, with features such as context-aware code generation customized with enterprise knowledge, a database migration service, and code refactoring.
Looking ahead, Google's goal appears to be a comprehensive AI platform that enables the development and responsible adoption of generative AI across products and industries. That means continued investment in Vertex AI for building and managing powerful, context-aware models, integrating helpful AI capabilities into Workspace and Cloud, and working with developers and companies on industry-specific AI solutions. With this strategy, Google hopes to lead the next phase of digital transformation, underpinned by advances in models, data, and infrastructure, while upholding its principles to keep AI safe, inclusive, and beneficial.