Take a tour of Gemini 2.0 Flash's native image generation and editing capabilities.

AI hands-on tutorials5mos agoupdate AI Sharing Circle

In December of last year, Gemini 2.0 Flash showed its native image output capabilities for the first time to a select group of beta testers. Currently, developers can add a new version of Gemini 2.0 Flash to the Google AI Studio Experience this new feature in all supported regions. Developers can access this new feature via Google AI Studio (an experimental version of gemini-2.0-flash-exp) and the Gemini API to test this new feature.

Gemini 2.0 Flash utilizes multimodal input, enhanced reasoning capabilities and natural language understanding to generate images. This technology combines a number of advanced capabilities that make Gemini 2.0 Flash uniquely suited for image generation.

Experience: https://aistudio.google.com/prompts/new_chat (Select: Gemini 2.0 Flash Experimental)

Here are some examples of highlights of Gemini 2.0 Flash multimodal output:

1. Combining text and graphics: unifying storytelling and visual presentation

Gemini 2.0 Flash generates images based on the textual story and maintains character and scene consistency throughout the storytelling process. Further, the user can provide feedback, and the model can adjust the story content or image style based on the feedback to synchronize the evolution of the story and illustrations.

Cue word: Generate a story about tadpoles looking for their mothers, the story is divided into 3 images to be told, first generate the pictures of the three images individually, then generate the story text corresponding to all the images.

Even if you don't specify the screen style, it will remain uniform.

2. Conversational image editing: natural language-driven iterative optimization

Gemini 2.0 Flash supports image editing through multiple rounds of natural language dialog. This facilitates users to iteratively optimize an image or explore different creative directions together. The model is able to maintain contextual understanding during the dialog, gradually adjusting the image according to the user's instructions until the desired result is achieved.

The text-only prompts to edit the image, with no change in detail other than color, really did what it said on the tin this time!

3. Integration of world knowledge: creating more accurate images

Unlike other image generation models, Gemini 2.0 Flash utilizes its powerful world knowledge and reasoning capabilities to generate more accurate images. This makes it excellent for creating images that require a high degree of realism, such as images used to illustrate a recipe. Although Gemini 2.0 Flash strives for accuracy, as with all language models, its knowledge is broad and generalized, not absolutely complete. This means that the model may have limitations in terms of domain-specific expertise.

Prompt word: Help me generate a Mexican restaurant recipe in text + image format

4. Text rendering capability: accurate rendering of long texts

Most image generation models struggle with accurately rendering long text sequences, often with problems such as miss-formatting, illegible characters, or misspellings. Internal reviews show that Gemini 2.0 Flash outperforms other leading models in text rendering. This makes it ideal for creating image content such as advertisements, social media posts, and even invitations that need to contain a lot of text.

Clue: An old newspaper with the headline "Today's Hot News" written on top and the specifics of the news underneath.

Chinese is slightly worse, output long English text is better.

Full English effect?

More surprising examples of image editing

Portrait Picture Face Swap

Just kidding...

Facial expression layout fine-tuning

Multi-photo element compositing

Upload two photos of the characters, the first one was chosen to be a bust of Musk and the second chapter was chosen to be a full-body portrait of a beautiful woman to be composited. There is a lot of room for imagination with this play.

Restoration of old photographs

If you can't fix it well once, you can try several times while the photo details are enlarged.

Picture coloring

And of course it supports the coloring of old photos

From Logo Style Conversion to Finished Print Showcase

Experience Gemini Image Generation Now

Developers can use the Gemini API Getting Started with Gemini 2.0 Flash For more information on image generation, please refer to the(computer) fileThe

from google import genai
from google.genai import types
client = genai.Client(api_key="GEMINI_API_KEY")
response = client.models.generate_content(
model="gemini-2.0-flash-exp",
contents=(
"Generate a story about a cute baby turtle in a 3d digital art style. "
"For each scene, generate an image."
),
config=types.GenerateContentConfig(
response_modalities=["Text", "Image"]
),
)

Whether it's building AI agents, developing applications with beautiful visuals like interactive storytelling, or ideating visual ideas in conversations, Gemini 2.0 Flash enables developers to generate both text and images from a single model. Google looks forward to seeing developers create more apps utilizing the native image output capabilities, and would like feedback from developers to help the Gemini team complete the production-ready version as soon as possible.