AI Sharing Circle

Daily updates on the latest AI products, projects, frameworks, and paper walkthroughs.
CombatVLA - Efficient VLA Model from Taotian Group

CombatVLA is a model built specifically for 3D action role-playing games (ARPGs) by the Future Life Lab team at Taotian Group. It is a vision-language-action (VLA) model at the 3B-parameter scale that collects human players' actions through a motion tracker...
2mos ago
DeepSeek V3.1 - The Latest Open-Source AI Model from DeepSeek

DeepSeek V3.1 is a new-generation AI model from DeepSeek that upgrades its predecessor, V3. It introduces a hybrid reasoning architecture that lets the model switch flexibly between thinking and non-thinking modes, significantly improving the thinking...
2mos ago
Qwen-Image-Edit - Open-Source Image Editing Model from Alibaba Tongyi

Qwen-Image-Edit is a general-purpose image editing model from Alibaba Tongyi, built on the Qwen-Image architecture with 20 billion parameters. The model combines semantic editing and appearance editing, and can perform low-level visual appearance edits on images (e.g., adding, deleting...
2mos ago
MoE-TTS - The Latest Speech Generation Framework from Kunlun Wanwei

MoE-TTS is a speech synthesis framework from Kunlun Wanwei, based on the Mixture of Experts (MoE) architecture, which combines a pre-trained large language model (LLM) with speech expert modules. MoE-TTS retains the model's strong text reasoning ability by freezing the text module parameters and updating only the speech module parameters...
2mos ago
Mureka V7.5 - Advanced AI Music Creation Model from Kunlun Wanwei

Mureka V7.5 is an AI music generation model from Kunlun Wanwei that focuses on Chinese songwriting. The model can accurately reproduce tones and playing techniques to generate natural, smooth, and emotionally expressive vocals. Based on optimized automatic speech recognition (ASR) technology, Mureka V...
2mos ago
Skywork Deep Research Agent v2 - Upgraded Deep Research Agent from Kunlun Wanwei

Skywork Deep Research Agent v2 is a deep research agent from Kunlun Wanwei that focuses on integrating and analyzing multimodal information. Skywork Deep Research Agent v2 can process text, graph...
2mos ago
Hunyuan-GameCraft - Tencent Hunyuan's Open-Source Framework for Next-Generation Interactive Game Video Generation

Hunyuan-GameCraft is an interactive game video generation framework open-sourced by the Tencent Hunyuan team. The framework generates highly dynamic game video from a single image and a prompt, and lets the user control the video content in real time with the keyboard and mouse.
2mos ago
Skywork UniPic 2.0 - Efficient Open-Source Multimodal Model from Kunlun Wanwei

Skywork UniPic 2.0 is an efficient multimodal model open-sourced by Kunlun Wanwei, focusing on image generation, editing, and understanding. The model is based on a 2B-parameter SD3.5-Medium architecture and is trained through pre-training, progressive dual-task reinforcement strategies, and co-training...
2mos ago
RynnRCP - First Open-Source Robot Context Protocol from Alibaba DAMO Academy

RynnRCP is an open-source Robot Context Protocol (RCP) from Alibaba DAMO Academy that lowers the barrier to developing embodied intelligence and connects the entire development process. RynnRCP consists of the RCP framework and the RobotMotion module. The RCP framework, through capability abstraction and multi-protocol support, will...
2mos ago
RynnEC - Open-Source World Understanding Model from Alibaba DAMO Academy

RynnEC is a world understanding model from Alibaba DAMO Academy that focuses on embodied intelligence tasks. The model is based on multimodal fusion, combining video data with natural language, and can parse objects in a scene along multiple dimensions, supporting object understanding, spatial perception, and video object segmentation.
2mos ago