Youtu-agent - Tencent's Open Source, Efficient Agent Framework
Youtu-agent is an open source framework from Tencent Youtu Lab for building and running autonomous agents. The framework performs well on the WebWalkerQA and GAIA benchmarks, with accuracies of 71.47% and 72.8% respectively. The framework...
HunyuanVideo-Foley - Tencent's Open Source Video Sound Generation Model
HunyuanVideo-Foley is an open source video sound generation model from the Tencent Hunyuan team that adds accurately matched sound effects to silent videos. The model is trained on a large-scale dataset and uses a multimodal diffusion transformer architecture, combined with a representation alignment loss function and audio VAE optimization techniques...
PixVerse V5 - Self-Developed AI Video Model Launched by Aishi Technology
PixVerse V5 is a large AI video generation model launched by Aishi Technology. The model generates high-quality video content from user-supplied text descriptions or images and supports multiple styles, such as anime, sci-fi, and traditional Chinese style.
Wenxiaobai 5 - All-in-One AI Model from Wenxiaobai
Wenxiaobai 5 is the flagship "All in One" model from Wenxiaobai. The model performs well in many assessments, scoring 64.7 on the AA-Index composite assessment and 86 on the STEM ability assessment, which is close to the world-leading GPT-5.
Gemini 2.5 Flash Image - The Most Powerful Image Generation and Editing Model from Google
Gemini 2.5 Flash Image (codename nano banana) is a state-of-the-art image generation and editing model from Google that maintains the consistency of characters across different scenes and supports precise image editing through natural language, such as blurring backgrounds and removing stains.
Wan2.2-S2V - Alibaba Tongyi's Open Source Audio-Driven Video Generation Model
Wan2.2-S2V is an open source multimodal video generation model from Alibaba Tongyi. From only a static picture and a piece of audio, it generates high-quality digital human video, and it supports a variety of image types and frame formats.
Free Course on ChatGPT Prompt Engineering for Developers by Andrew Ng
ChatGPT Prompt Engineering for Developers is a joint DeepLearning.AI and OpenAI course designed for developers, in which Isa Fulford and Andrew Ng teach how to use Large Language Models (LLMs...
Wenxiaobai o4 - A Parallel Thinking Model from Wenxiaobai That Opens 8 Thinking Paths at the Same Time
Wenxiaobai o4 is a parallel thinking model that opens 8 thinking paths at the same time, analyzes the problem from multiple perspectives, and automatically selects the best solution. The model incorporates Long-CoT reinforcement learning and process reward learning techniques, giving it strong deep reasoning capabilities and good performance on complex tasks.
VibeVoice - Text-to-Speech Model from Microsoft
VibeVoice is a new text-to-speech (TTS) model from Microsoft. The model generates conversational audio from up to four different speakers and supports up to 90 minutes of continuous voice output, breaking the length limitations of traditional TTS systems.
SpatialGen - Open Source 3D Scene Generation Model by Manycore Tech
SpatialGen is an open source 3D scene generation model from Manycore Tech. Built on a diffusion model architecture, it generates spatio-temporally consistent multi-view images from text descriptions, reference images, and 3D spatial layouts, and can go on to produce 3D Gaussian scenes and render walkthrough videos.