Google this week extended its portfolio of artificial intelligence (AI) tools and platforms to include a prototype of a universal agent enabled with speech and hearing, along with a public preview of Gemini 1.5 Flash, a lighter-weight instance of its Gemini large language models (LLMs) that enterprise IT organizations will be able to deploy and extend more easily.
At the Google I/O conference, Google also said a private preview of a two-million-token context window, which enables end users to launch more complex prompts, is now available. The current context window provided by Google is one million tokens, which the company claims is already the largest available.
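For IT teams looking to kick the tires, invoking the new model requires only a few lines of code. The following is a minimal sketch using Google's google-generativeai Python SDK and the gemini-1.5-flash model name from the preview; the placeholder API key and the prompt are illustrative assumptions, not a definitive recipe.

```python
# Minimal sketch: calling the Gemini 1.5 Flash preview via Google's
# google-generativeai Python SDK. Model names and token limits reflect
# the announcement and may change as the preview evolves.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # hypothetical placeholder

model = genai.GenerativeModel("gemini-1.5-flash")

prompt = "Summarize the key AI announcements from Google I/O."

# The standard context window is one million tokens (two million in
# private preview), so check prompt size before sending large inputs.
token_count = model.count_tokens(prompt).total_tokens
print(f"Prompt uses {token_count} tokens")

response = model.generate_content(prompt)
print(response.text)
```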
Google is also previewing Gemma 2, a series of LLMs that will scale up to 27 billion parameters to process more complex prompts, and has added PaliGemma, a vision-language model based on PaLI-3, to the Gemma family. In addition, Google is adding an AI-assisted Red Team tool that surfaces threats to AI models and has revamped its Responsible Generative AI Toolkit, which now includes an LLM Comparator for evaluating the quality of model responses, to make it simpler to use.
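For organizations that prefer to host models themselves, Gemma weights have historically been published through open-model hubs. The sketch below assumes a Hugging Face Transformers workflow and a hypothetical google/gemma-2-27b-it checkpoint id; actual repository names, licensing terms and hardware requirements will depend on what Google ultimately publishes.

```python
# Sketch of running a Gemma 2 class model locally with Hugging Face
# Transformers. The checkpoint id is an assumption; a 27B-parameter
# model needs multiple GPUs or heavy quantization to run.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-27b-it"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halve memory relative to fp32
    device_map="auto",           # shard across available GPUs
)

inputs = tokenizer("Explain what a context window is.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```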
Other AI advancements revealed at the conference include a revamped AI Overviews capability for search results and an Ask Photos tool that, for example, lets an end user launch prompts to surface more context about a photo. Google is also moving to embed Gemini LLMs into the Android operating system.
Google is also now making it possible, via the Gemini 1.5 Pro edition of its LLMs, to use source materials to generate a personalized and interactive audio conversation.
Finally, Google announced that late this year it will deliver Trillium, the sixth generation of its tensor processing units (TPUs), which will provide a 4.7x improvement in compute performance per chip over the previous generation. In addition, Google said it will make Blackwell graphics processing units (GPUs) from NVIDIA available in early 2025.
Dubbed Project Astra, the universal agent prototype is an ambitious effort to process requests ranging from helping someone find their glasses by searching recent videos to debugging code. Similar in concept to a copilot, the goal is to provide a single AI agent capable of handling a wide range of tasks, reducing the need for an end user to orchestrate many agents that have each been optimized for a specific task.
It’s not clear at any given point which vendors will provide the most robust portfolio of LLMs, but each organization will need to determine, based on cost, which LLM to apply to any given use case. Gemini 1.5 Flash, for example, might provide a significantly less costly option than the larger Gemma 2 LLMs, which will require more processing power to train and run.
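As a back-of-the-envelope illustration of that calculus, an IT team might model monthly spend per workload. The per-million-token rates below are hypothetical placeholders, not published prices.

```python
# Hypothetical cost comparison between two models for a given workload.
# The per-million-token rates here are illustrative placeholders only.
def monthly_cost(requests, in_tokens, out_tokens, in_rate, out_rate):
    """Estimate monthly spend given per-million-token rates in dollars."""
    return requests * (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000

# 100k requests/month, ~2k tokens in, ~500 tokens out per request.
light = monthly_cost(100_000, 2_000, 500, in_rate=0.35, out_rate=1.05)   # assumed rates
large = monthly_cost(100_000, 2_000, 500, in_rate=3.50, out_rate=10.50)  # assumed rates

print(f"Lightweight model: ${light:,.2f}/month")
print(f"Larger model:      ${large:,.2f}/month")
```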
Similarly, IT teams should assume that Google rivals are also racing to build universal agents enabled with speech and hearing.
Regardless of which path an organization chooses to move forward on, the one thing that is certain is that the AI capabilities that have astounded everyone over the past two years will soon seem trivial compared to what becomes possible next.