
Gemini API Introduces New Text Embedding Model: Soaring Performance, 8K Input Support


Image generated by Google Gemini 2.0 Flash

Recently, Google introduced a new experimental text embedding model in the Gemini API: gemini-embedding-exp-03-07[1]. Trained on the Gemini model itself, it inherits Gemini's deep understanding of language and nuanced context and is applicable to a wide range of scenarios. Notably, the new model surpasses Google's previously released text-embedding-004, tops the Multilingual Text Embedding Benchmark (MTEB) leaderboard, and adds a longer input token length among other new features.

Commentary

There are already capable open-source embedding models on the market, such as multilingual-e5-large-instruct. Even if they trail Gemini's new model in benchmark performance, they may remain competitive in certain scenarios, such as small text-chunk processing and cost-sensitive applications. Beyond raw performance, the new model's market acceptance will therefore ultimately depend on whether its pricing strategy and usage limits meet developers' needs.

 

A text embedding model that leads across the board

Google says the new model has been trained for strong generalization, performing well in finance, science, law, search, and many other domains, and it can be used out of the box without extensive fine-tuning for specific tasks.

On the multilingual MTEB leaderboard, gemini-embedding-exp-03-07 achieves a mean task score of 68.32, 5.81 points ahead of the second-ranked model. The MTEB leaderboard is an important reference for model comparison, as it comprehensively evaluates text embedding models across tasks such as retrieval and classification.


 

Why choose text embedding?

From building intelligent search, retrieval-augmented generation (RAG), and recommender systems to text classification, the ability of large language models (LLMs) to understand the meaning behind text is critical. Compared with keyword-matching systems, embeddings often enable more efficient systems that reduce cost and latency while delivering better results.

Embedding techniques capture semantics and context as numerical representations of data: texts with similar meaning have embedding vectors that lie closer together (a short code sketch after the list below illustrates this). Embeddings support a variety of applications, including:

  • Efficient retrieval: Find relevant documents in large databases, such as legal document retrieval or enterprise search, by comparing the embedding vectors of queries and documents.
  • Retrieval-Augmented Generation (RAG): Improve the quality and relevance of generated text by retrieving relevant information and integrating it into the model's context.
  • Clustering and categorization: Group similar texts to identify trends and themes in the data.
  • Classification: Automatically categorize text by its content, e.g. sentiment analysis or spam detection.
  • Text similarity: Identify duplicate content, enabling tasks such as web page deduplication or plagiarism detection.
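
To make the "closer vectors" idea concrete, here is a minimal sketch that embeds three sentences and compares them with cosine similarity. It reuses the embed_content call shown later in this article; the helper functions, example sentences, and the response-shape access (result.embeddings[0].values) are illustrative assumptions about the google-genai SDK, not official sample code.

from google import genai
import numpy as np

client = genai.Client(api_key="GEMINI_API_KEY")

def embed(text: str) -> np.ndarray:
    # Request one embedding vector for a piece of text (model name from this article).
    result = client.models.embed_content(
        model="gemini-embedding-exp-03-07",
        contents=text,
    )
    return np.array(result.embeddings[0].values)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: values closer to 1.0 mean more semantically similar.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = embed("How do I reset my password?")
similar = embed("Steps to recover a forgotten account password")
unrelated = embed("Quarterly revenue grew by 12%")

# The semantically related pair should score noticeably higher.
print(cosine(query, similar), cosine(query, unrelated))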

To learn more about embeddings and common AI use cases, check out the Gemini API documentation.

 

Experience Gemini Text Embedding Now

Developers can now use the new experimental text embedding model through the Gemini API; it is compatible with the existing embed_content interface.

from google import genai

client = genai.Client(api_key="GEMINI_API_KEY")

result = client.models.embed_content(
    model="gemini-embedding-exp-03-07",
    contents="How does alphafold work?",
)

print(result.embeddings)

In addition to improved quality across the board, gemini-embedding-exp-03-07 also has the following characteristics:

  • 8K token input limit: Google has raised the context length over previous models, allowing you to embed larger chunks of text, code, or other data.
  • 3072-dimensional output: High-dimensional embedding vectors with nearly 4x more dimensions than previous embedding models.
  • Matryoshka Representation Learning (MRL): MRL allows developers to truncate the original 3072-dimensional vector to reduce storage costs. In short, it trades a little precision for storage savings (see the truncation sketch after this list).
  • Extended language support: The number of supported languages has doubled to over 100.
  • Unified model: A single model that surpasses the quality of Google's previously released task-specific multilingual, English-only, and code models.
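
As a minimal sketch of the MRL truncation mentioned above: keep the leading k components of the 3072-dimensional vector and re-normalize before computing similarities. The choice of k = 768 and the re-normalization step are illustrative assumptions, not an official recipe.

import numpy as np

def truncate_mrl(vec: np.ndarray, k: int = 768) -> np.ndarray:
    # MRL-trained embeddings concentrate the most useful information in the
    # leading dimensions, so keeping a prefix preserves most of the signal.
    shortened = vec[:k]
    # Re-normalize so cosine similarities stay on a comparable scale
    # (an illustrative step, not from the announcement).
    return shortened / np.linalg.norm(shortened)

full = np.array(result.embeddings[0].values)  # 3072-dim vector from the snippet above
small = truncate_mrl(full, k=768)             # 4x less storage per vector
print(full.shape, small.shape)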

Although the model is currently experimental and capacity is limited, this release gives developers an early opportunity to explore the capabilities of gemini-embedding-exp-03-07. As with all experimental models, it is subject to change; Google says it is working toward a stable, generally available version in the coming months.

Google encourages developers to provide feedback through the embeddings feedback form.

Some users have noted that the model is free during the preview but comes with strict rate limits: 5 requests per minute and 100 requests per day. Developers can easily hit these limits while testing, and some hope Google will raise them soon; a simple client-side throttle (sketched below) can help in the meantime.
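
Given those limits, a sketch like the one below can keep test runs under 5 requests per minute. It assumes the client object from the earlier snippet; the fixed-sleep pacing is a simple illustrative workaround, not an official recommendation.

import time

def embed_batch(texts, requests_per_minute=5):
    # Space requests out to respect the preview limit of 5 requests/minute.
    # The 100 requests/day cap still applies, so keep test batches small.
    delay = 60.0 / requests_per_minute
    vectors = []
    for text in texts:
        result = client.models.embed_content(
            model="gemini-embedding-exp-03-07",
            contents=text,
        )
        vectors.append(result.embeddings[0].values)
        time.sleep(delay)
    return vectors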

In the Reddit discussion, many users expressed excitement about the release of the new model, describing it as "a bigger deal than people realize". One user commented: "3k dimensional fp32 embedding vectors are huge. I bet you could build a very reasonable decoder with that much data... If this model was cheap, I'd probably use it more often than a full-blown large-scale language model. Usually, semantic feature extraction is what you really want."

Another user noted that the model "doesn't have much competition" in the multilingual domain, adding that, given the rate limits and the dimensionality of the embedding, it is probably best suited to larger blocks of text.

[1]: On Vertex AI, the same model is offered under the text-embedding-large-exp-03-07 endpoint. Naming will be made consistent at general availability.
