AI Sharing Circle

Daily updates on the latest AI products, projects, frameworks, and paper interpretations.
gpt-realtime - OpenAI's newest AI speech model

gpt-realtime is an advanced speech model from OpenAI that processes audio directly to generate natural, fluent speech. The model supports multiple languages and styles, understands non-verbal cues such as laughter, and can switch between languages.
3mos ago
Youtu-agent - Tencent's open-source, efficient agent framework

Youtu-agent is an open-source framework from Tencent Youtu Lab for building and running autonomous agents. The framework performs well on the WebWalkerQA and GAIA benchmarks, with accuracies of 71.47% and 72.8%, respectively. The framework...
3mos ago
HunyuanVideo-Foley - Tencent's Open-Source Video Sound Effect Generation Model

HunyuanVideo-Foley is an open-source video sound effect generation model from the Tencent Hunyuan team that adds accurately matched sound effects to silent videos. The model is trained on a large-scale dataset and uses a multimodal diffusion transformer architecture, combined with a representation alignment loss function and audio VAE optimization techniques...
3mos ago
PixVerse V5 - In-house AI video model launched by Aishi Technology

PixVerse V5 is a large AI video generation model launched by Aishi Technology. The model generates high-quality video from user-supplied text descriptions or images, and supports multiple styles, such as anime, sci-fi, and traditional Chinese style.
3mos ago
Wenxiaobai 5 (问小白5) - All-in-One AI Model from Wenxiaobai

Wenxiaobai 5 is Wenxiaobai's flagship "All in One" model. It performs well on many evaluations, scoring 64.7 on the AA-Index composite evaluation and 86 on the STEM ability evaluation, which is close to the world-leading GPT-5.
3mos ago
Gemini 2.5 Flash Image - Google's Most Powerful Image Generation and Editing Model

Gemini 2.5 Flash Image (codename nano banana) is a state-of-the-art image generation and editing model from Google that maintains the consistency of characters across different scenes and supports precise image editing through natural language, such as blurring backgrounds and removing stains.
3mos ago
Wan2.2-S2V - Alibaba Tongyi's open-source audio-driven video generation model

Wan2.2-S2V is an open-source multimodal video generation model from Alibaba Tongyi. From just a static image and an audio clip, it generates high-quality digital human video, and it supports a variety of image types and framings.
3mos ago
Andrew Ng's Free Course on ChatGPT Prompt Engineering for Developers

ChatGPT Prompt Engineering for Developers is a course from DeepLearning.AI and OpenAI designed for developers, in which Isa Fulford and Andrew Ng teach how to use large language models (LLMs...
3mos ago
Wenxiaobai o4 (问小白o4) - A parallel thinking model from Wenxiaobai that opens 8 thinking paths at once

Wenxiaobai o4 is a parallel thinking model that opens 8 thinking paths at the same time, analyzes the problem from multiple perspectives, and automatically selects the best solution. The model incorporates Long-CoT reinforcement learning and process reward learning techniques, giving it strong deep reasoning capabilities, and it performs well on complex tasks.
3mos ago
VibeVoice - Text-to-Speech Model from Microsoft

VibeVoice is a new text-to-speech (TTS) model from Microsoft. The model generates conversational audio from up to four different speakers and supports up to 90 minutes of continuous voice output, breaking the length limitations of traditional TTS systems.
3mos ago