Baidu Releases Wenxin Big Model 4.5 and X1: Dual Evolution of Multimodal Capabilities and Deep Thinking

On March 16, Baidu officially released two new large models: Wenxin Big Model 4.5 and Wenxin Big Model X1. Both are already live on the Wenxin Yiyan official website, where users can try them for free. Wenxin Big Model 4.5 is also available on the Baidu Intelligent Cloud Qianfan large model platform, where enterprise users and developers can call it through APIs. Wenxin Big Model X1 will arrive on the Qianfan platform soon. In addition, Baidu Search, the Wenxin Yiyan app, and other products will gradually integrate the two new models.

Wenxin Big Model 4.5: Native Multimodal, More Comprehensive Capabilities

Wenxin Big Model 4.5 is Baidu's new-generation native multimodal foundation model. It achieves collaborative optimization across modalities through joint multimodal modeling and shows strong multimodal comprehension. Compared with the previous version, its comprehension, generation, logic, and memory have all improved, with notable gains in reducing erroneous output (hallucinations), logical reasoning, and coding.

Wenxin Big Model 4.5 can jointly understand text, images, audio, video, and other forms of content. For example, when handling a complex problem that includes a chart, it can accurately extract the key information from the chart, lay out the solution steps and analysis in detail, and arrive at the correct answer.

Beyond "high IQ", Wenxin Big Model 4.5 also shows "high EQ" when interpreting internet memes, satirical cartoons, and similar content: it can accurately capture their hidden meanings and humor and explain them in detail. For example, given a meme built on the mathematical fact that continuity does not guarantee differentiability while differentiability does guarantee continuity, it can clearly explain the concept and the logic behind the image.

Baidu attributes the enhanced capabilities of Wenxin Big Model 4.5 to the following key technologies:

FlashMask dynamic attention masking: This technique accelerates attention mask computation for large models, improving long-sequence modeling and training efficiency and, in turn, the model's handling of long text and multi-turn conversations.
Multimodal heterogeneous expert expansion: Heterogeneous experts are built for the characteristics of each modality and combined with an adaptive modality-aware loss function, which addresses gradient imbalance across modalities and improves multimodal fusion.
Spatio-temporal representation compression: The semantic representations of images and videos are compressed efficiently along the spatial and temporal dimensions, which greatly improves multimodal training efficiency and strengthens the model's ability to learn from long videos.
Knowledge-point-based large-scale data construction: Hierarchical knowledge sampling, data compression and fusion, and targeted synthesis of scarce knowledge points are used to build pre-training data with high knowledge density, improving learning efficiency and reducing the probability that the model produces wrong information.
Self-feedback-based post-training: An iterative self-feedback post-training technique that incorporates multiple evaluation methods improves the stability and robustness of reinforcement learning, so that the model aligns better with human intent.
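
To make the attention-mask item more concrete, here is a minimal sketch, assuming the column-wise sparse mask representation described in the publicly available FlashMask work. It is not Baidu's implementation, and every function name is hypothetical. The idea illustrated is that a causal mask over several packed documents can be stored as one integer per key column instead of a full N x N matrix.

```python
from typing import List


def masked_from_rows(doc_lengths: List[int]) -> List[int]:
    """For each key column j, return the first query row that may NOT attend to it.

    With causal attention over packed documents, that row is the end of the
    document containing j, so the whole mask needs only one integer per column.
    """
    bounds: List[int] = []
    start = 0
    for length in doc_lengths:
        bounds.extend([start + length] * length)
        start += length
    return bounds


def dense_mask(bounds: List[int]) -> List[List[bool]]:
    """Expand the per-column bounds into a full N x N mask (True = may attend)."""
    n = len(bounds)
    return [[j <= i < bounds[j] for j in range(n)] for i in range(n)]


def reference_mask(doc_lengths: List[int]) -> List[List[bool]]:
    """Build the same mask the direct way, by comparing document ids."""
    doc_ids = [d for d, length in enumerate(doc_lengths) for _ in range(length)]
    n = len(doc_ids)
    return [
        [j <= i and doc_ids[i] == doc_ids[j] for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    lengths = [2, 3]  # two packed documents, 5 tokens in total
    bounds = masked_from_rows(lengths)
    # The compact form and the dense reference describe the same mask.
    assert dense_mask(bounds) == reference_mask(lengths)
    print(bounds)
```

The compact form grows linearly with sequence length while the dense matrix grows quadratically, which is one plausible reason such a representation helps with long text. A real kernel would consume the per-column bounds directly instead of expanding them as `dense_mask` does here.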

Wenxin Big Model X1: Deeper Thinking, More Comprehensive Capabilities

Wenxin Big Model X1 has stronger understanding, planning, reflection, and evolution capabilities and supports multimodality. Baidu describes it as the first deep thinking model that can use tools autonomously. It performs particularly well in Chinese knowledge Q&A, literary creation, manuscript writing, everyday conversation, logical reasoning, complex computation, and tool invocation.

Wenxin Big Model X1 already supports a variety of tools, including advanced search, document Q&A, image understanding, AI drawing, a code interpreter, web page reading, TreeMind mind maps, Baidu Academic Search, business information lookup, and franchise information lookup.

For example, when asked to write an imitation of the classical essay "Han Yao Fu" ("Ode to the Cold Kiln"), Wenxin Big Model X1 shows a clear chain of thought: it first looks for historical allusions similar to those in the original, then attends to style and sentence structure, then checks that the allusions are appropriate, and finally keeps the structure of the text coherent. The result closely matches the original in intent, style, and syntax.

Baidu attributes the enhanced capabilities of Wenxin Big Model X1 to several key technologies:

Progressive reinforcement learning training: This approach improves the model's combined performance across scenarios such as creative writing, search, tool invocation, and reasoning.
End-to-end training based on chains of thought and action: For deep search, tool invocation, and similar scenarios, the model is trained end to end on feedback from the final result, which significantly improves training effectiveness.
Diverse, unified reward system: A unified reward system that incorporates multiple types of reward mechanisms provides more robust feedback for model training.

Prices and Outlook

Currently, users can try Wenxin Big Model 4.5 and Wenxin Big Model X1 for free on the Wenxin Yiyan official website. On the Baidu Intelligent Cloud Qianfan large model platform, the Wenxin Big Model 4.5 API is priced as low as 0.004 yuan per thousand tokens for input and 0.016 yuan per thousand tokens for output. Wenxin Big Model X1 will launch on the Qianfan platform soon, with input priced as low as 0.002 yuan per thousand tokens and output as low as 0.008 yuan per thousand tokens.
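
For a rough sense of what these rates mean in practice, the sketch below estimates API spend from the prices quoted above. It assumes the billing unit is 1,000 tokens and that the "as low as" figures apply to the whole workload. The model keys and the function name are hypothetical labels, not Qianfan identifiers.

```python
from decimal import Decimal

# (input price, output price) in yuan per 1,000 tokens, as quoted in the article.
# The dictionary keys are illustrative labels, not official model identifiers.
PRICES_YUAN_PER_1K = {
    "wenxin-4.5": (Decimal("0.004"), Decimal("0.016")),
    "wenxin-x1": (Decimal("0.002"), Decimal("0.008")),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in yuan for the given token counts."""
    price_in, price_out = PRICES_YUAN_PER_1K[model]
    total = Decimal(input_tokens) * price_in + Decimal(output_tokens) * price_out
    return total / 1000


if __name__ == "__main__":
    for name in PRICES_YUAN_PER_1K:
        print(name, estimate_cost(name, 2_000_000, 500_000), "yuan")
```

Under these assumptions, a workload of 2 million input tokens and 500,000 output tokens would cost about 16 yuan on Wenxin Big Model 4.5 and about 8 yuan on Wenxin Big Model X1. `Decimal` is used instead of floats so that the small per-unit prices do not pick up rounding noise.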

Baidu said that 2024 is the year of full iteration of the big model technology, and Baidu will make bolder investments in artificial intelligence, data centers, and cloud infrastructure to build better and smarter next-generation models.

Article copyright belongs to AI Sharing Circle. Please do not reproduce without permission.
