
Gemini 2.0 released: a new AI model built for the age of agents

A letter from Google and Alphabet CEO Sundar Pichai:

Information is at the heart of human progress. That's why we've been working for 26 years to organize the world's information and make it accessible and useful. It's also why we continue to push the frontiers of AI to organize that information across every input and make it accessible via any output, so that it can be truly useful to you.


That vision is what we introduced last December when we launched Gemini 1.0, the first natively multimodal model, capable of understanding information across text, video, images, audio, and code. With multimodality and long context, it can take in and make sense of far more information.

Today, millions of developers are building with Gemini. It's helping us reimagine all of our products - including all seven of our core products with 2 billion users - and create new ones. NotebookLM is a great example of those multimodal and long-context capabilities at work, and of why it's so popular.

Over the past year, we've been working on developing models that are more agentic - models that can understand the world around you more deeply, think multiple steps ahead, and take action under your supervision.

Today, we're excited to unveil the next generation of models built for this new agentic era: Gemini 2.0, our most capable model yet. With new advances in multimodality, such as native image and audio output, and native tool use, it will allow us to build new AI agents that bring us closer to our vision of a universal assistant.

Today we're making 2.0 available to developers and trusted testers, and we're accelerating its integration into our products, starting with Gemini and Search. Starting today, our Gemini 2.0 Flash experimental model is available to all Gemini users. We're also launching Deep Research, a new feature that uses advanced reasoning and long-context capabilities to act as a research assistant, exploring complex topics and compiling reports on your behalf. It's available in Gemini Advanced today.

No product has been transformed more by AI than Search. Our AI Overviews now reach 1 billion people, enabling them to ask entirely new types of questions - quickly becoming one of our most popular Search features. Next, we're bringing the advanced reasoning capabilities of Gemini 2.0 to AI Overviews to tackle more complex topics and multi-step questions, including advanced math equations, multimodal queries and coding. We started limited testing this week, with a broader rollout early next year, and we'll continue to bring AI Overviews to more countries and languages over the next year.

2.0's advances have been made possible by more than a decade of investment in our differentiated, full-stack approach to AI innovation. It's built on custom hardware like Trillium, our sixth-generation TPUs: TPUs powered 100% of Gemini 2.0's training and inference, and today Trillium is generally available for customers to build with as well.

If Gemini 1.0 was about organizing and understanding information, Gemini 2.0 is about making information more useful. I can't wait to see what this new era will bring.

 

Announcing Gemini 2.0: a new AI model built for the age of agents

By Demis Hassabis, CEO of Google DeepMind and Koray Kavukcuoglu, CTO of Google DeepMind, on behalf of the Gemini Team

Over the past year, we've continued to make incredible progress in artificial intelligence. Today, we're launching the first model in the Gemini 2.0 family: an experimental version of Gemini 2.0 Flash, an efficient model at the frontier of our technology, with low latency and enhanced performance.

We're also sharing prototypes from the frontier of our agent research, made possible by Gemini 2.0's native multimodal capabilities.

Gemini 2.0 Flash

Gemini 2.0 Flash builds on the success of 1.5 Flash, by far our most popular model among developers, delivering enhanced performance at similarly fast response times. Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks while running twice as fast. 2.0 Flash also brings new capabilities: in addition to supporting multimodal inputs such as images, video, and audio, it now supports multimodal outputs such as natively generated images mixed with text and steerable multilingual text-to-speech (TTS) audio. It can also natively call tools such as Google Search, code execution, and third-party user-defined functions.
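To make native tool use concrete, here is a minimal sketch of calling the model through the Gemini API with Google Search enabled as a tool. It assumes the google-genai Python SDK and the experimental model identifier `gemini-2.0-flash-exp` used at launch; the API key is a placeholder.

```python
# Minimal sketch: Gemini 2.0 Flash with native Google Search as a tool.
# Assumes the google-genai SDK (pip install google-genai); model names and
# availability may change over time.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Summarize this week's major AI announcements.",
    config=types.GenerateContentConfig(
        # Let the model decide when to ground its answer in Search results.
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```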


Our goal is to enable users to work with our models safely and quickly. Over the past month, we've shared an early experimental version of Gemini 2.0 and received valuable feedback from developers.

Gemini 2.0 Flash is now available to developers as an experimental model through the Gemini API in Google AI Studio and Vertex AI. Multimodal input and text output are available to all developers, while text-to-speech and native image generation are available to early-access partners. General availability will follow in January, along with additional model sizes.

To help developers build dynamic and interactive applications, we're also releasing a new Multimodal Live API with real-time audio and video streaming input and the ability to use multiple, combined tools. More information about 2.0 Flash and the Multimodal Live API is on our developer blog.
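As a rough sketch of how that streaming interaction works, the example below opens a live session, sends a text turn, and streams the response back. It assumes the google-genai SDK's async client; exact method names have evolved across SDK versions, so treat this as illustrative.

```python
# Illustrative sketch of a real-time session over the Multimodal Live API,
# assuming the google-genai SDK's async client (names may differ by version).
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

async def main():
    # Request text responses; audio output is also supported.
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        await session.send(input="Hello, Gemini!", end_of_turn=True)
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```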

Gemini 2.0 is available in the Gemini app, our AI assistant.

Starting today, Gemini users around the world can access a chat-optimized version of 2.0 Flash experimental via the model drop-down menu on desktop and mobile web; it's coming to the Gemini mobile app soon. With this new model, users can experience an even more helpful Gemini assistant.

Early next year, we'll extend Gemini 2.0 to more Google products.

Unlocking the Agent Experience with Gemini 2.0

Gemini 2.0 Flash's native user-interface action capabilities, along with other improvements - multimodal reasoning, long-context understanding, complex instruction following and planning, compositional function calling, native tool use, and improved latency - all work together to enable a new class of agentic experiences.
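Compositional function calling is easiest to see with developer-defined functions the model can chain together. The sketch below assumes the google-genai SDK's automatic function calling and uses two hypothetical stubs (`get_flight_price`, `convert_currency`) standing in for real services.

```python
# Hedged sketch of compositional function calling with the google-genai SDK:
# plain Python callables are passed as tools, and the SDK handles the
# call/response loop. Both functions are hypothetical stubs.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

def get_flight_price(origin: str, destination: str) -> dict:
    """Stub: return the cheapest fare in USD between two airports."""
    return {"origin": origin, "destination": destination, "price_usd": 312.0}

def convert_currency(amount_usd: float, currency: str) -> dict:
    """Stub: convert a USD amount into another currency."""
    rates = {"EUR": 0.95, "JPY": 151.0}  # fixed demo rates
    return {"amount": round(amount_usd * rates[currency], 2), "currency": currency}

# The model can first call get_flight_price, then feed that result into
# convert_currency before composing its final answer.
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="How much is the cheapest SFO-to-NRT flight in JPY?",
    config=types.GenerateContentConfig(tools=[get_flight_price, convert_currency]),
)
print(response.text)
```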

The practical application of AI agents is an area of research full of exciting possibilities. We are exploring this new area with a series of prototypes that help people accomplish tasks and solve problems. These prototypes include an updated version of Project Astra, a research prototype exploring the future capabilities of general-purpose AI assistants; the newly launched Project Mariner, which explores the future of human-agent interactions, starting with the browser; and Jules, an AI-powered code agent that helps developers.

We're still in the early stages of development, but we're excited to see how trusted beta testers use these new features and what we can learn from them to make them available to more products in the future.

Project Astra: Multimodal Understanding Agents in the Real World

Since we introduced Project Astra at our I/O conference, we've been learning from trusted testers using it on Android phones. Their invaluable feedback has helped us better understand how a general-purpose AI assistant could work in practice, including its implications for safety and ethics. Improvements in the latest version, built with Gemini 2.0, include:

  • Better dialogue: Project Astra can now converse in multiple languages and in mixed languages, and has a better understanding of accents and uncommon words.
  • New tool use: with Gemini 2.0, Project Astra can use Google Search, Lens and Maps, making it more useful in everyday life.
  • Better memory: we've improved Project Astra's ability to remember things while keeping you in control. It now has up to 10 minutes of in-session memory and can remember more of your past conversations, for better personalization.
  • Improved latency: with new streaming capabilities and native audio understanding, the agent can understand language at about the latency of human conversation.

We're working to bring these features to Google products, such as Gemini apps (our AI assistants), and in other forms such as glasses. At the same time, we're expanding our Trusted Tester program to more people, including a group that will soon begin testing Project Astra on prototype glasses.

Project Mariner: Intelligent Agents to Help with Complex Tasks

Project Mariner is an early research prototype built with Gemini 2.0 that explores the future of human-agent interaction, starting with your browser. As a research prototype, it can understand and reason about information on your browser screen, including pixels and web elements such as text, code, images and forms, and it uses that information via an experimental Chrome extension to complete tasks for you.

When evaluated against the WebVoyager benchmark, which tests agent performance on end-to-end real-world web tasks, Project Mariner achieved a state-of-the-art result of 83.5% working as a single-agent setup.

While still early, Project Mariner shows that navigating within a browser is becoming technically possible, even though today it is not always accurate and can be slow to complete tasks; both will improve rapidly over time.

To build this safely and responsibly, we're actively researching new types of risks and their mitigations while keeping humans in the loop. For example, Project Mariner can only type, scroll or click in the active tab of the browser, and it asks for final confirmation from the user before taking certain sensitive actions, such as making a purchase.
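The post doesn't describe Mariner's internals, but the human-in-the-loop pattern it describes can be sketched generically. The following is a hypothetical illustration, not Google's implementation; all names (`BrowserAction`, `perform_in_active_tab`, the sensitive-action list) are invented for the example.

```python
# Hypothetical sketch of a human-in-the-loop gate for browser-agent actions:
# routine actions run directly, sensitive ones require explicit confirmation.
# Illustrative only; this is not Project Mariner's implementation.
from dataclasses import dataclass

SENSITIVE_ACTIONS = {"submit_payment", "place_order", "send_message"}

@dataclass
class BrowserAction:
    kind: str    # e.g. "click", "type", "scroll", "submit_payment"
    target: str  # description of the element the action applies to

def perform_in_active_tab(action: BrowserAction) -> None:
    """Stub standing in for the browser-extension side of the agent."""
    print(f"executing {action.kind} on {action.target}")

def execute_with_oversight(action: BrowserAction) -> bool:
    """Run an agent-proposed action, pausing for user approval if sensitive."""
    if action.kind in SENSITIVE_ACTIONS:
        answer = input(f"Agent wants to {action.kind} ({action.target}). Allow? [y/N] ")
        if answer.strip().lower() != "y":
            return False  # the user vetoed the action
    perform_in_active_tab(action)
    return True

# Example: scrolling runs immediately; a purchase waits for confirmation.
execute_with_oversight(BrowserAction("scroll", "results page"))
execute_with_oversight(BrowserAction("submit_payment", "checkout form"))
```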

Trusted testers have begun trying Project Mariner using an experimental Chrome extension, and we're beginning conversations with the web ecosystem in parallel.

Jules: Intelligent Agents for Developers

Next, we're exploring how AI agents can assist developers with Jules, an experimental AI-powered code agent that integrates directly into a GitHub workflow. It can tackle an issue, develop a plan and execute it, all under a developer's direction and supervision. This effort is part of our long-term goal of building AI agents that are helpful in all domains, including coding.

For more information on this ongoing experiment, see our developer blog post.

Intelligent agents for games and other domains

Google DeepMind has a long history of using games to help AI models get better at following rules, planning and logic. Just last week, for example, we introduced Genie 2, an AI model that can create an endless variety of playable 3D worlds, all from a single image. Building on this tradition, we've used Gemini 2.0 to build agents that can help you navigate the virtual world of a video game. They can reason about the game based solely on the action on screen and offer suggestions for what to do next in real-time conversation.

We're working with leading game developers like Supercell to test the ability of these agents to interpret rules and challenges across a diverse range of games, from strategy games like Clash of Clans to farm simulations like Hay Day.

In addition to acting as virtual gaming companions, these agents can even tap into Google Search to connect you with the wealth of gaming knowledge on the web.

In addition to exploring the capabilities of intelligent agents in virtual worlds, we are also experimenting with ways to apply the spatial reasoning capabilities of Gemini 2.0 to the field of robotics. While still in the early stages, we are excited about the potential of intelligent agents in physical environments.

You can learn more about these research prototypes and experiments at labs.google.

Building Responsibly in the Age of Intelligent Agents

Gemini 2.0 Flash and our research prototypes allow us to test and iterate on new features in cutting-edge AI research that will ultimately make Google products more useful.

As we develop these new technologies, we recognize the responsibility they carry and the many questions AI agents raise for safety and security. That's why we're taking an exploratory and gradual approach to development: building on multiple prototypes, iteratively implementing safety training, working with trusted testers and external experts, and performing extensive risk assessments and safety evaluations.

For example:

  • As part of our safety process, we work with our Responsibility and Safety Committee (RSC), a permanent internal review group, to identify and understand potential risks.
  • Gemini 2.0's reasoning capabilities enable major advances in our AI-assisted red-teaming approach, including moving beyond simply detecting risks to now automatically generating evaluations and training data to mitigate them. This means we can optimize the model for safety more efficiently and at scale.
  • As the multimodal nature of Gemini 2.0 increases the complexity of potential outputs, we will continue to evaluate and train models to process image and audio inputs and outputs to help improve security.
  • In Project Astra, we're exploring potential mitigations against users inadvertently sharing sensitive information with agents, and we've built in privacy controls so users can easily delete sessions. We're also continuing to look at ways to ensure that AI agents act as reliable sources of information and don't take unintended actions on behalf of users.
  • In Project Mariner, we're working to ensure the model learns to prioritize user instructions over third-party prompt-injection attempts, so it can identify potentially malicious instructions from external sources and prevent misuse. This protects users from fraud and phishing attempts via malicious instructions hidden in emails, documents or websites.

We strongly believe that the only way to build AI is to be responsible from the start, and we will continue to prioritize security and responsibility as key elements of the model development process as we move forward with models and intelligent agents.

Gemini 2.0, Intelligent Agents and the Future

Today's releases mark a new chapter in the Gemini era. With the launch of Gemini 2.0 Flash and the series of research prototypes exploring agentic possibilities, we've reached an exciting milestone. We look forward to continuing to safely explore all the new possibilities ahead as we build toward artificial general intelligence (AGI).
