Baidu Releases Wenxin Big Model 4.5 and X1: Dual Evolution of Multimodal Capabilities and Deep Thinking

This article was updated on 2025-03-16 18:36. Some content is time-sensitive; if it is out of date, please leave a message.

On March 16, Baidu officially released two new large models: Wenxin Big Model 4.5 and Wenxin Big Model X1. Both models are already live on the Wenxin Yiyan official website, where users can try them for free. Wenxin Big Model 4.5 is also available on the Baidu Intelligent Cloud Qianfan Big Model platform, where enterprise users and developers can call it through APIs. Wenxin Big Model X1 will be available on the Qianfan platform soon. In addition, Baidu Search, the Wenxin Yiyan app, and other products will be connected to the two new models, bringing users more diverse experiences.


Wenxin Big Model 4.5: Native Multimodal, More Comprehensive Capabilities

Wenxin Big Model 4.5 is a new-generation native multimodal foundation model developed by Baidu. It achieves collaborative optimization through joint modeling of multiple modalities and excels at multimodal comprehension. Compared with the previous version, Wenxin Big Model 4.5 significantly improves language ability, comprehension, generation, logic, and memory, as well as hallucination reduction, logical reasoning, and coding ability.

[Figure: multimodal capabilities]

[Figure: text capabilities]

Wenxin Big Model 4.5 can jointly understand text, images, audio, video, and other forms of content. For example, when handling a complex problem that contains diagrams, it can accurately extract the key information from the diagrams, give detailed solution steps and analysis, and arrive at the correct answer.

 

In addition to "high IQ", Wenxin Big Model 4.5 also shows "high EQ" in understanding internet memes, satirical cartoons, and similar content. It can accurately capture the hidden messages and humorous elements in such content and explain them in detail. For example, for a meme built on the mathematical fact that a continuous function is not necessarily differentiable, while a differentiable function is always continuous, it can clearly explain the concept and the logic of the picture.

 

The enhanced capabilities of Wenxin Big Model 4.5 are due to the following key technologies:

  • FlashMask dynamic attention masking: Accelerates attention-mask computation for large models, improving long-sequence modeling and training efficiency, and thereby the model's performance on long texts and multi-turn conversations.
  • Multimodal heterogeneous expert expansion: Builds heterogeneous experts suited to the characteristics of each modality and combines them with an adaptive modality-aware loss function, which addresses gradient imbalance across modalities and improves multimodal fusion.
  • Spatio-temporal representation compression: Efficiently compresses the semantic representations of images and videos along the spatial and temporal dimensions, greatly improving multimodal training efficiency and the model's ability to learn from long videos.
  • Knowledge-point-based large-scale data construction: Uses hierarchical knowledge sampling, data compression and fusion, and targeted synthesis of scarce knowledge points to build pre-training data with high knowledge density, improving learning efficiency and reducing the probability that the model produces wrong information.
  • Self-feedback-based post-training: An iterative self-feedback post-training technique that combines multiple evaluation methods comprehensively improves the stability and robustness of reinforcement learning, so that the pre-trained model aligns better with human intent.
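
To make the attention-mask item above more concrete, here is a minimal Python sketch of one general way to keep masks cheap on long, packed sequences: instead of storing a dense N×N mask, store for each key column the first query row that is no longer allowed to attend to it. This only illustrates a compact column-wise mask for a causal, per-document mask; it is not Baidu's implementation, and the function names (`column_mask_ends`, `can_attend`, `dense_mask`) are hypothetical.

```python
from typing import List


def column_mask_ends(doc_lengths: List[int]) -> List[int]:
    """For each key position j, return the first query row that may NOT attend to j.

    Several documents (or conversation turns) are packed into one sequence;
    a key is visible only to later queries inside the same document.
    """
    ends: List[int] = []
    offset = 0
    for length in doc_lengths:
        offset += length                    # exclusive end of the current document
        ends.extend([offset] * length)      # every key in this document shares that end
    return ends


def can_attend(i: int, j: int, ends: List[int]) -> bool:
    """Causal + same-document rule, answered from O(N) storage."""
    return j <= i < ends[j]


def dense_mask(doc_lengths: List[int]) -> List[List[bool]]:
    """Reference O(N^2) mask, built directly from document ids, for comparison."""
    doc_ids = [d for d, n in enumerate(doc_lengths) for _ in range(n)]
    size = len(doc_ids)
    return [
        [j <= i and doc_ids[i] == doc_ids[j] for j in range(size)]
        for i in range(size)
    ]


if __name__ == "__main__":
    lengths = [2, 3]                        # two packed documents, 5 tokens in total
    ends = column_mask_ends(lengths)
    reference = dense_mask(lengths)
    size = sum(lengths)
    same = all(
        can_attend(i, j, ends) == reference[i][j]
        for i in range(size)
        for j in range(size)
    )
    print(ends, same)
```

The compact form needs one integer per key position rather than one boolean per query-key pair, which is the kind of saving that matters as sequences grow long.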

Wenxin Big Model X1: Deeper Thinking, More Comprehensive Capabilities

Wenxin Big Model X1 has stronger understanding, planning, reflection, and evolution capabilities and supports multimodality. Baidu describes it as the first deep-thinking model that can use tools on its own. Wenxin Big Model X1 performs particularly well in Chinese knowledge Q&A, literary creation, manuscript writing, daily conversation, logical reasoning, complex computation, and tool invocation.

Wenxin Big Model X1 already supports a variety of tools, including advanced search, document Q&A, image comprehension, AI drawing, code interpreter, web page link reading, TreeMind tree map, Baidu Academic Search, business information query, and franchise information query.
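
As a rough illustration of what "using tools on its own" involves on the application side, the sketch below shows a generic tool registry and dispatch loop in Python. It is an assumption-laden toy, not Wenxin Big Model X1's actual mechanism or API: the tool functions are stubs, and all names (`TOOLS`, `run_plan`, `advanced_search`, `code_interpreter`) are hypothetical. In a real system the plan would come from the model's own reasoning rather than being hard-coded.

```python
from typing import Callable, Dict, List, Tuple

ToolFn = Callable[[str], str]


def advanced_search(query: str) -> str:
    """Stub standing in for a search tool; returns a placeholder string."""
    return f"[search results for: {query}]"


def code_interpreter(source: str) -> str:
    """Stub standing in for a code-execution tool; nothing is actually executed."""
    return f"[output of running: {source}]"


# Hypothetical registry mapping tool names to callables.
TOOLS: Dict[str, ToolFn] = {
    "advanced_search": advanced_search,
    "code_interpreter": code_interpreter,
}


def run_plan(plan: List[Tuple[str, str]]) -> List[str]:
    """Execute (tool_name, argument) steps in order and collect the observations."""
    observations: List[str] = []
    for tool_name, argument in plan:
        tool = TOOLS.get(tool_name)
        if tool is None:
            # Unknown tools are reported instead of raising, so the loop can continue.
            observations.append(f"[unknown tool: {tool_name}]")
            continue
        observations.append(tool(argument))
    return observations


if __name__ == "__main__":
    steps = [
        ("advanced_search", "Baidu Qianfan pricing"),
        ("code_interpreter", "print(2 + 2)"),
    ]
    for observation in run_plan(steps):
        print(observation)
```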

For example, when asked to write an imitation of "Cold Kiln Fugue", Wenxin Big Model X1 shows a clear chain of thought: it first looks for allusions to historical figures similar to those in the original, then attends to style and sentence patterns, then checks that the allusions are apt, and finally keeps the structure fluent, producing a text that closely matches the original in meaning, style, and sentence patterns.

The enhanced capabilities of Wenxin Big Model X1 are due to several key technologies:

  • Progressive reinforcement learning training: This training method comprehensively improves the model's combined performance in scenarios such as creative writing, search, tool invocation, and reasoning.
  • End-to-end training based on chains of thought and action: For deep search, tool invocation, and similar scenarios, the model is trained end to end on result feedback, significantly improving training effectiveness.
  • Diverse and unified reward system: A unified reward system incorporates multiple types of reward mechanisms to provide more robust feedback for model training.

Prices and Outlook

Currently, users can try Wenxin Big Model 4.5 and Wenxin Big Model X1 for free on the Wenxin Yiyan official website. On the Baidu Intelligent Cloud Qianfan Big Model platform, the Wenxin Big Model 4.5 API is priced as low as 0.004 yuan per thousand tokens for input and 0.016 yuan per thousand tokens for output. Wenxin Big Model X1 will launch on the Qianfan platform soon, with input priced as low as 0.002 yuan per thousand tokens and output as low as 0.008 yuan per thousand tokens.
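
To get a feel for what these prices mean in practice, the short Python sketch below estimates the cost of a single request from the "as low as" figures quoted above. It assumes billing is a simple linear function of input and output volume per thousand units; the model labels in the table are made up for this example and are not real API model identifiers, and actual Qianfan billing rules may differ.

```python
from decimal import Decimal
from typing import Dict, Tuple

# (input price, output price) in yuan per thousand billed units, as quoted in the article.
# The dictionary keys are hypothetical labels for this example only.
PRICES_YUAN_PER_1K: Dict[str, Tuple[Decimal, Decimal]] = {
    "wenxin-4.5": (Decimal("0.004"), Decimal("0.016")),
    "wenxin-x1": (Decimal("0.002"), Decimal("0.008")),
}


def estimate_cost(model: str, input_units: int, output_units: int) -> Decimal:
    """Return the estimated cost in yuan for one request, assuming linear pricing."""
    input_price, output_price = PRICES_YUAN_PER_1K[model]
    total = Decimal(input_units) * input_price + Decimal(output_units) * output_price
    return total / Decimal(1000)


if __name__ == "__main__":
    # A request with 2,000 input units and 500 output units.
    print(estimate_cost("wenxin-4.5", 2000, 500))  # 0.016 yuan
    print(estimate_cost("wenxin-x1", 2000, 500))   # 0.008 yuan
```

At these list prices, the same request costs half as much on X1 as on 4.5, since both the input and output rates are halved.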

Baidu said that 2024 is the year of full iteration of the big model technology, and Baidu will make bolder investments in artificial intelligence, data centers, and cloud infrastructure to build better and smarter next-generation models.

May not be reproduced without permission: Chief AI Sharing Circle » Baidu Releases Wenxin Big Model 4.5 and X1: Dual Evolution of Multimodal Capabilities and Deep Thinking
