On March 16, Baidu officially released two new large models: Wenxin Big Model 4.5 and Wenxin Big Model X1. Both are now live on the Wenxin Yiyan official website, where users can try them for free. Wenxin Big Model 4.5 is also available on the Baidu Intelligent Cloud Qianfan Big Model platform, where enterprise users and developers can call it through an API. Wenxin Big Model X1 will follow on the Qianfan platform soon. In addition, Baidu Search, the Wenxin Yiyan app, and other Baidu products will integrate the two new models.
Wenxin Big Model 4.5: Native Multimodal, More Comprehensive Capabilities
Wenxin Big Model 4.5 is Baidu's new-generation native multimodal foundation model. It jointly models multiple modalities so that they are optimized together, and it performs strongly on multimodal understanding. Compared with the previous version, Wenxin Big Model 4.5 improves across language ability, understanding, generation, logic, and memory, with notable gains in reducing incorrect output (hallucinations), logical reasoning, and coding.
Multimodal capabilities
Text capabilities
Wenxin Big Model 4.5 can jointly understand text, images, audio, video, and other forms of content. For example, when given a complex problem that includes a diagram, it can extract the key information from the diagram, lay out the solution steps with analysis, and arrive at the correct answer.
In addition to "high IQ", Wenxin Big Model 4.5 also shows "high EQ" when interpreting internet memes, satirical cartoons, and similar content. It can pick up the hidden meaning and the humor in such images and explain them in detail. For example, given a meme built on the mathematical fact that "a continuous function is not necessarily differentiable, but a differentiable function is always continuous", it can clearly explain both the concept and the logic of the image.
The enhanced capabilities of Wenxin Big Model 4.5 are due to the following key technologies:
- FlashMask dynamic attention masking: This technique accelerates attention mask computation for large models and improves long-sequence modeling and training efficiency, which in turn improves the model's handling of long text and multi-turn conversations.
- Multimodal heterogeneous expert extension: Heterogeneous experts are built for the characteristics of each modality and combined with an adaptive modality-aware loss function. This addresses the imbalance between gradients from different modalities and improves multimodal fusion.
- Spatio-temporal representation compression: This technique efficiently compresses the semantic representations of images and videos along the spatial and temporal dimensions, greatly improving the efficiency of multimodal training and the model's ability to learn from long videos.
- Knowledge-point-based large-scale data construction: Hierarchical knowledge sampling, data compression and fusion, and targeted synthesis of scarce knowledge points are used to build pre-training data with high knowledge density. This improves learning efficiency and reduces the probability that the model produces incorrect information.
- Self-feedback-based post-training: An iterative, self-feedback post-training technique that combines multiple evaluation methods improves the stability and robustness of reinforcement learning, so that the pre-trained model aligns better with human intent.
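To make the notion of an attention mask concrete, here is a minimal sketch of ordinary masked scaled dot-product attention in plain Python. It is not FlashMask and says nothing about how Baidu accelerates the computation. It only illustrates what a mask does: it decides which positions each token may attend to. The function names `masked_attention` and `causal_mask` are illustrative choices, not part of any Baidu API.

```python
import math


def causal_mask(n):
    """Return an n x n mask where position i may attend to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_attention(queries, keys, values, mask):
    """Scaled dot-product attention over plain Python lists.

    mask[i][j] is True when query i is allowed to attend to key j.
    Assumes every row of the mask allows at least one position.
    """
    dim = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # Disallowed positions get -inf, so softmax assigns them weight 0.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            if mask[i][j]
            else float("-inf")
            for j, k in enumerate(keys)
        ]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [
                sum(w * v[d] for w, v in zip(weights, values))
                for d in range(len(values[0]))
            ]
        )
    return outputs


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    result = masked_attention(tokens, tokens, tokens, causal_mask(3))
    # The first token can only see itself, so its output equals its own value.
    print(result[0])
```

In this naive form the mask is a dense n x n table, so its cost grows quadratically with sequence length. That cost is why mask computation matters for long text and multi-turn conversations.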
Wenxin Big Model X1: Deeper Thinking, More Comprehensive Capabilities
Wenxin Big Model X1 has stronger understanding, planning, reflection, and evolution capabilities, and it supports multimodality. Baidu describes it as the first deep-thinking model that can use tools autonomously. Wenxin Big Model X1 performs particularly well in Chinese knowledge Q&A, literary creation, manuscript writing, everyday conversation, logical reasoning, complex computation, and tool invocation.
Wenxin Big Model X1 already supports a variety of tools, including advanced search, document Q&A, image understanding, AI drawing, a code interpreter, web link reading, TreeMind tree diagrams, Baidu Academic Search, business information lookup, and franchise information lookup.
For example, when asked to write an imitation of the classical essay "Han Yao Fu" ("Ode of the Cold Kiln"), Wenxin Big Model X1 shows a clear chain of thought. It first looks for allusions to historical figures similar to those in the original, then attends to style and sentence structure, then checks that the allusions are appropriate, and finally makes sure the text flows well. The result is a text that closely matches the original in intent, style, and syntax.
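The article does not describe how tool calls are routed internally. As a rough illustration of what "tool invocation" means in general, the sketch below shows a common pattern: the model emits a structured request naming a tool, and the host program dispatches it to a matching function. The tool functions, the `TOOLS` registry, and the `{"tool": ..., "input": ...}` request shape are all hypothetical stand-ins, not Baidu's interface.

```python
from typing import Callable, Dict


def code_interpreter(expression: str) -> str:
    """Hypothetical stand-in: only evaluates integer sums such as '2+40'."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


def web_link_reader(url: str) -> str:
    """Hypothetical stand-in: a real tool would fetch and summarize the page."""
    return f"[placeholder summary of {url}]"


# Registry mapping tool names to the functions that implement them.
TOOLS: Dict[str, Callable[[str], str]] = {
    "code_interpreter": code_interpreter,
    "web_link_reader": web_link_reader,
}


def run_tool_call(call: dict) -> str:
    """Dispatch one tool request of the assumed form {'tool': name, 'input': text}."""
    name = call["tool"]
    if name not in TOOLS:
        return f"unknown tool: {name}"
    return TOOLS[name](call["input"])


if __name__ == "__main__":
    print(run_tool_call({"tool": "code_interpreter", "input": "2+40"}))
    print(run_tool_call({"tool": "ai_drawing", "input": "a cat"}))
```

The tool's result would normally be fed back to the model so it can continue reasoning. That loop is omitted here to keep the sketch short.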
The enhanced capabilities of Wenxin Big Model X1 are due to several key technologies:
- Progressive reinforcement learning training: This approach improves the model's combined performance across scenarios such as writing, search, tool invocation, and reasoning.
- End-to-end training based on chains of thought and chains of action: For scenarios such as deep search and tool invocation, the model is trained end to end on feedback from the final result, which significantly improves training effectiveness.
- Diverse, unified reward system: A single reward system integrates multiple types of reward mechanisms to provide more robust feedback for model training.
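The article does not say how the different reward mechanisms are combined. One simple way to picture a "unified" reward is a weighted average of several named signals, sketched below. The signal names, the weights, and the averaging rule are assumptions made purely for illustration and should not be read as Baidu's method.

```python
def unified_reward(signals, weights):
    """Combine named reward signals into one score by weighted average.

    signals: dict of signal name -> score, each expected in [0, 1].
    weights: dict of signal name -> non-negative weight.
    Both the names and the averaging rule are illustrative assumptions.
    """
    total_weight = sum(weights[name] for name in signals)
    if total_weight == 0:
        raise ValueError("at least one signal needs a positive weight")
    weighted_sum = sum(signals[name] * weights[name] for name in signals)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical signals: did the final answer check out, and was the
    # tool call well-formed?
    score = unified_reward(
        {"answer_correct": 1.0, "tool_call_valid": 0.0},
        {"answer_correct": 1.0, "tool_call_valid": 1.0},
    )
    print(score)
```

A single scalar like this is convenient for reinforcement learning because the training loop only needs one number per sample, whatever mix of checks produced it.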
Prices and Outlook
Currently, users can try Wenxin Big Model 4.5 and Wenxin Big Model X1 for free on the Wenxin Yiyan official website. On the Baidu Intelligent Cloud Qianfan Big Model platform, the Wenxin Big Model 4.5 API is priced as low as 0.004 yuan per thousand tokens for input and 0.016 yuan per thousand tokens for output. Wenxin Big Model X1 will launch on the Qianfan platform soon, priced as low as 0.002 yuan per thousand tokens for input and 0.008 yuan per thousand tokens for output.
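To see what these rates mean for a single request, the sketch below turns the quoted per-thousand prices into a small cost estimate. The prices are the "as low as" figures from this article, so actual billing may differ. The model labels are shorthand invented for this example and are not API identifiers. Treating the billing unit as tokens is an assumption.

```python
from decimal import Decimal

# "As low as" prices from the article, in yuan per thousand billing units.
# The dictionary keys are labels invented for this sketch.
PRICES = {
    "wenxin-4.5": {"input": Decimal("0.004"), "output": Decimal("0.016")},
    "wenxin-x1": {"input": Decimal("0.002"), "output": Decimal("0.008")},
}


def estimate_cost(model, input_units, output_units):
    """Estimate the cost in yuan of one request at the quoted minimum prices."""
    price = PRICES[model]
    total = price["input"] * input_units + price["output"] * output_units
    return total / 1000


if __name__ == "__main__":
    # 2,000 input units and 500 output units on each model.
    print(estimate_cost("wenxin-4.5", 2000, 500))  # 0.016
    print(estimate_cost("wenxin-x1", 2000, 500))   # 0.008
```

At the quoted rates, output costs four times as much as input for both models, and X1 is listed at half the price of 4.5.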
Baidu said that 2024 is a year of comprehensive iteration for large model technology, and that it will invest more boldly in artificial intelligence, data centers, and cloud infrastructure to build better and smarter next-generation models.