Latest AI Resources

Total: 3,045 articles

NewBie-image-Exp0.1 - Experimental anime text-to-image model open-sourced by NewBieAI-Lab

NewBie-image-Exp0.1 is the first experimental anime text-to-image model open-sourced by the NewBieAI-Lab team. It uses the Next-DiT architecture with 3.5B parameters and is optimized for the anime (ACG) style. The model uses a dual text encoder (GEMMA3-4B...

4mos ago

029.2K

LongCat-Image - Image generation and editing model open-sourced by Meituan's LongCat team

LongCat-Image is an open source image generation and editing model released by the LongCat team at Meituan. It uses a hybrid backbone architecture (MM-DiT + Single-DiT) combined with a vision-language model (VLM) conditional encoder, and supports text-to-image generation and multi-round image editing...

4mos ago

024.7K

VibeVoice-Realtime - Lightweight real-time text-to-speech model open-sourced by Microsoft

VibeVoice-Realtime is Microsoft's open source lightweight real-time text-to-speech (TTS) model, designed for low latency and real-time interaction. It supports streaming text input and can start speaking from the first text token, with a delay of only about 300 milliseconds, making it suitable for dynamic number ...

4mos ago

025.4K

Flowra - AI workflow development tool open-sourced by ModelScope and the WULI team

Flowra is a graph execution engine and node package development tool open-sourced by ModelScope together with the WULI team, and is the core component of FlowBench. It organizes workflows as directed acyclic graphs (DAGs), with intelligent caching, parallel scheduling, distributed support ...

4mos ago

024.8K

RoboCOIN - Real-robot dual-arm dataset open-sourced by BAAI and several universities

RoboCOIN is the world's first large-scale real-robot dataset for dual-arm robots, open-sourced by the Beijing Academy of Artificial Intelligence (BAAI, Zhiyuan) together with a number of companies and universities. It covers 15 robot platforms, 180,000 real manipulation trajectories, and 421 task scenarios. Its most important feature is a hierarchical annotation system that breaks down the task ...

4mos ago

025.3K

TalkCody - Free, open source AI coding desktop assistant with support for complex tasks

TalkCody is a free, open source AI coding assistant desktop application built on Rust + Tauri 2. It supports Windows, macOS, and Linux, and offers native performance, fast startup, and low resource consumption. It supports more than 50 mainstream A...

4mos ago

029.1K

MemMachine - Open source AI memory system from MemVerge

MemMachine is an open source AI memory system developed by MemVerge for AI models and agents. It stores and recalls interaction data much as the human brain does, addressing the "stateless amnesia" problem of AI. It adopts a layered architecture (short-term memory, long-term memory, user profile...

4mos ago

029.1K

PartCrafter - Single-image 3D generation model open-sourced by Peking University and ByteDance

PartCrafter is an advanced 3D generative model jointly proposed by Peking University, ByteDance, and Carnegie Mellon University. It can generate multiple semantically meaningful and geometrically distinct 3D mesh parts from a single RGB image in one pass. The model is built on a compositional latent space and...

4mos ago

026.5K

GigaWorld-0 - World model framework open-sourced by GigaAI

GigaWorld-0 is an open source world model framework from GigaAI, a Chinese embodied intelligence startup, aimed mainly at the data bottleneck in embodied AI. It efficiently generates high-quality, diverse, and physically realistic training data, push...

4mos ago

025.4K

Mistral 3 - Latest open source multimodal large model series from Mistral AI

Mistral 3 is the latest series of multimodal large models released as open source by Mistral AI. It includes the flagship Mistral Large 3 (675B total parameters) and the lighter Ministral series (3B/8B/14B), both supporting image understanding...

4mos ago

023.5K

Vidi2 - Multimodal video understanding and generation large model open-sourced by ByteDance

Vidi2 is a second-generation multimodal large model for video understanding and generation, open-sourced by ByteDance and focused on understanding, analyzing, and creating video content. It accepts joint text, video, and audio input and can simultaneously understand visual content, audio, and natural language instructions to achieve cross-modal interaction and push...

4mos ago

027.2K

Alpamayo-R1 - Vision-language-action model with reasoning capabilities open-sourced by NVIDIA

Alpamayo-R1 is a vision-language-action (VLA) model with reasoning capabilities developed by NVIDIA, designed to improve autonomous driving decisions in complex scenarios. By introducing a causal chain reasoning mechanism, it lets the vehicle analyze the causal relationships in a scene (e.g., "cause before...

4mos ago

035.5K

Ovis-Image - Text-to-image model open-sourced by Alibaba's AIDC-AI team

Ovis-Image is a 7-billion-parameter text-to-image model open-sourced by the AIDC-AI team of Alibaba International Digital Commerce Group, focused on high-quality text rendering. Based on the Ovis-U1 architecture, it inherits the advanced visual decoder and bidirectional token refiner ...

4mos ago

023.4K

Wujie·Emu3.5 - Multimodal world model open-sourced by BAAI

Wujie·Emu3.5 is an open source multimodal world model from the Beijing Academy of Artificial Intelligence (BAAI, Zhiyuan), with 34 billion parameters and native world modeling capability. Trained on 10 trillion multimodal tokens (including 790 years of video data), it can simulate physical laws and supports image-text generation, visual guidance...

4mos ago

026.6K

GELab-Zero - On-device multimodal GUI agent model open-sourced by StepFun

GELab-Zero is an open source on-device multimodal GUI agent model from the StepFun team, built on the Qwen3-VL-4B-Instruct base model with 4B parameters. It can recognize UI elements, perform operations such as tapping and swiping, and supports cross-application tasking ...

4mos ago

034.2K

Depth Anything 3 - 3D visual reconstruction model open-sourced by ByteDance Seed

Depth Anything 3 (DA3) is a 3D visual reconstruction model developed and open-sourced by the ByteDance Seed team. It reconstructs spatial geometry from arbitrary viewpoints with a single Transformer architecture, and needs to predict only depth maps and ray maps to recover the 3D scene, compared to...

4mos ago

035.9K

DeepSeek-Math-V2 - Mathematical reasoning model open-sourced by DeepSeek

DeepSeek-Math-V2 is an open source mathematical reasoning model from DeepSeek, the AI company backed by High-Flyer. The latest version builds on DeepSeek-V3.2-Exp-Base, with performance surpassing that of Gemini DeepThink to reach the international number...

4mos ago

028.6K

Z-Image - Image generation model open-sourced by Alibaba's Tongyi Lab

Z-Image is an open source image generation model from Alibaba's Tongyi Lab, offering efficient, fast, and powerful image generation. Using a single-stream diffusion Transformer architecture (S3-DiT), it integrates text, visual semantic, and image VAE tokens into a unified input stream...

4mos ago

050.3K

ROCK - Sandbox for agent training environments open-sourced by Alibaba

ROCK (Reinforcement Open Construction Kit) is Alibaba's open source sandbox for agent training environments, addressing the problem that agents cannot be trained at scale in real environments. ROCK provides a highly stable sandbox management service...

4mos ago

027.1K

ViMax - Multi-agent video generation framework open-sourced by the University of Hong Kong

ViMax is an open source multi-agent video generation framework from the Data Science Laboratory of the University of Hong Kong that automates the whole process from creative input to video output. It integrates script generation, scene design, shot planning, and video rendering, so that users can generate coherent, film-grade video from natural language descriptions ...

4mos ago

044.6K

FLUX.2 - Image generation and editing model open-sourced by Black Forest Labs

FLUX.2 is an open source image generation and editing model released by Black Forest Labs. It supports text-to-image generation, multi-image referencing, and image editing, with richer details, clearer textures, and stable lighting. There are four versions: FLUX.2 [pro] (comparable to the top closed source...

4mos ago

026.3K

Fara-7B - Computer-use agent model open-sourced by Microsoft

Fara-7B is a 7-billion-parameter computer-use agent (CUA) model open-sourced by Microsoft, based on the Qwen 2.5-VL-7B architecture. It visually parses web page screenshots and performs clicks, typing, and other on-screen actions, without relying on additional accessibility trees or multiple large models...

4mos ago

032K

HunyuanOCR - Expert optical character recognition model open-sourced by Tencent Hunyuan

HunyuanOCR is a high-performance optical character recognition model open-sourced by the Tencent Hunyuan team, with only 1 billion parameters. Built on the Hunyuan multimodal architecture, it adopts an end-to-end design and efficiently handles text detection, recognition, and document parsing. The model scored 94.1 points in the complex document test, surpassing...

4mos ago

033.6K

Supertonic - Open source, high-performance AI text-to-speech system that runs offline at high speed

Supertonic is an open source, high-performance text-to-speech (TTS) system focused on fast speech generation on local devices. Using ONNX Runtime, it can run on phones, computers, and even a Raspberry Pi, supports 23 languages and voice cloning, and requires no network...

4mos ago

027.8K

MiMo-Embodied - Cross-domain embodied intelligence foundation model open-sourced by Xiaomi

MiMo-Embodied, open-sourced by Xiaomi Group, is the world's first cross-embodiment foundation model that successfully integrates embodied AI and autonomous driving. It addresses knowledge transfer between embodied AI and autonomous driving and achieves unified modeling of tasks in both fields.

5mos ago

033.2K

MOSS-Speech - Speech-to-speech large model open-sourced by Fudan University

MOSS-Speech is an open source speech-to-speech large model from Prof. Qiu Xipeng's team at Fudan University. It departs from traditional speech processing by understanding and generating speech directly, without text guidance, and can capture non-text elements such as intonation and emotion, making...

5mos ago

028.6K

Parallax - The world's first fully autonomous AI operating system, open-sourced by Gradient

Parallax is the world's first "fully autonomous AI operating system", open-sourced by Gradient, a distributed AI lab. It supports cross-platform deployment of large models on Mac, Windows, and other heterogeneous devices, giving users full control over the model, data, and AI memory. The system has built-in network-aware ...

5mos ago

084.2K

HunyuanVideo 1.5 - Free, open source lightweight video generation model from Tencent Hunyuan

HunyuanVideo 1.5 is a lightweight video generation model open-sourced by the Tencent Hunyuan large model team, based on the Diffusion Transformer (DiT) architecture with 8.3B parameters. It supports generating 5-10 seconds of high-definition video, sub...

5mos ago

034.4K

Awex - High-performance weight exchange framework open-sourced by Ant Group

Awex is a high-performance weight exchange framework open-sourced by Ant Group, designed for large-scale parameter synchronization in reinforcement learning. It can complete terabyte-scale parameter exchange in seconds, significantly improving training and inference efficiency. Awex synchronizes very quickly: on a thousand-card cluster, a trillion-parameter model can complete within 6 seconds the full amount of...

5mos ago

081.9K

Seekdb - AI-native hybrid search database open-sourced by Ant Group's OceanBase

Seekdb (OceanBase Seekdb) is an AI-native hybrid search database open-sourced by Ant Group's OceanBase. It supports unified hybrid search over vector, full-text, scalar, and geospatial data, and uses a multi-stage retrieval mechanism to achieve high-precision search at low latency ...

5mos ago

027.5K

LoopTool - Automated tool-calling data evolution framework open-sourced by Shanghai Jiao Tong University and Xiaohongshu

LoopTool is an automated tool-calling data evolution framework open-sourced by Shanghai Jiao Tong University and the Xiaohongshu team, designed to improve the tool-calling ability of large language models. It optimizes data generation and model training through closed-loop iteration, using open source models (e.g., Qwen3-32B) as data generation...

5mos ago

083.3K

SAM 3D - 3D reconstruction model series open-sourced by Meta

SAM 3D is a series of 3D reconstruction models from Meta based on the SAM family, with two branches: SAM 3D Objects and SAM 3D Body. SAM 3D Objects can generate interactive 3D object models from a single photo, supporting...

5mos ago

031.2K

AgentEvolver - Agent evolution system open-sourced by Alibaba's Tongyi Lab

AgentEvolver is an open source agent evolution system from Alibaba's Tongyi Lab. Through three mechanisms (self-questioning, self-navigation, and self-attribution), it enables agents to learn and evolve autonomously. AgentEvolver adopts a service-oriented architecture that combines environment sandboxes, LLMs, and sc...

5mos ago

089.3K

MemOS - Open source AI memory management and scheduling platform with shared long-term memory

MemOS is an open source memory management and scheduling framework for large language models (LLMs), released by MemTensor and collaborators. Treating memory as a resource as important as compute, it unifies the management of plaintext, activation, and parameter memory through standardized MemCube memory units.

5mos ago

085.5K

WithAnyone - AI group photo generation model open-sourced by Fudan University and StepFun

WithAnyone is an AI group photo generation model jointly developed by Fudan University and StepFun. It addresses the common "copy-paste" problem in traditional AI image generation and produces more natural and controllable multi-person images. The model is based on the large-scale dataset MultiID-2M ...

5mos ago

084.3K

ChatTutor - Open source AI teaching aid for visual, interactive learning

ChatTutor is an open source AI teaching aid focused on visual and interactive learning of STEM subjects. Through a multi-agent architecture, it provides conversational Q&A and dynamic drawing, and can draw math graphs, physics circuits, or mind maps on a whiteboard in real time to help users intuitively understand the abstract generalization ...

5mos ago

023.5K

DPAI Arena - Open source AI coding benchmark platform from JetBrains

DPAI Arena (Developer Productivity AI Arena) is an open benchmarking platform created by JetBrains to measure the effectiveness of AI-assisted development tools on real-world software engineering tasks. Through a transparent evaluation stream...

5mos ago

029.3K

EverMemOS - Open source long-term memory operating system from the Shanda team

EverMemOS is an open source long-term memory operating system launched by the Shanda team led by Chen Tianqiao. It is designed for AI agents and addresses the memory discontinuity caused by the fixed context window of large language models. Modeled on human brain memory mechanisms, the system uses a four-layer architecture (agent layer, memory layer, index layer...

5mos ago

035.4K

Astron Agent - Enterprise-grade intelligent workflow development platform open-sourced by iFLYTEK

Astron Agent is an open source enterprise-grade intelligent workflow development platform from iFLYTEK, focused on helping enterprises quickly build production-ready AI agent applications. It uses a Java + Spring Boot technology stack, supports lightweight private deployment (minimum configuration: 2 cores, 4 GB), built-in ...

5mos ago

029.6K

Bee - Full-stack multimodal large model project open-sourced by Tencent Hunyuan and Tsinghua University

Bee is a full-stack open source multimodal large model solution jointly launched by the Tencent Hunyuan team and Tsinghua University, aiming to narrow the performance gap between open source and closed source models by improving data quality. The project contains three core results: the 15-million-sample high-quality two-level CoT dataset Honey-Data...

5mos ago

026.5K

InfinityStar - Unified spatio-temporal autoregressive video generation framework open-sourced by ByteDance

InfinityStar is a unified spatio-temporal autoregressive framework open-sourced by ByteDance, designed for high-resolution image and video generation. Using a discrete autoregressive approach, it handles text-to-image, text-to-video, and image-to-video tasks in a single model. The framework is benchmarked in VBench ...

5mos ago

027.6K

Koina - Decentralized machine learning platform open-sourced by TU Munich and the University of Michigan

Koina is an open source decentralized machine learning platform focused on simplifying proteomics data analysis, developed by a team from the Technical University of Munich, Germany, and the University of Michigan, USA. The platform integrates more than 30 mainstream models (e.g., Prosit, MS²PIP) through a standardized interface and supports peptide mass...

5mos ago

028.2K

VibeThinker-1.5B - 1.5-billion-parameter language model open-sourced by Weibo AI

VibeThinker-1.5B is a 1.5-billion-parameter language model open-sourced by Weibo AI. Fine-tuned from Alibaba's Qwen2.5-Math-1.5B, it is optimized for mathematical and coding tasks, with industry-leading reasoning performance.

5mos ago

031.3K

BestBlogs - Open source AI content aggregation platform with curated, high-quality technical content

BestBlogs is a platform focused on providing high-quality content for technology practitioners, entrepreneurs, product managers, and others. Using RSS feeds and crawlers, it collects articles, podcasts, videos, and other content from more than 400 high-quality blogs. Its core strength lies in using AI large language...

5mos ago

025.1K

Egocentric-10K - First-person-view robotics dataset open-sourced by Build AI

Egocentric-10K is a large-scale first-person (egocentric) video dataset of factory operations, open-sourced by the build.ai team. The dataset contains 10,000 hours of video totaling 1.08 billion frames, involving 2...

5mos ago

030.2K

LazyCraft - Open source AI agent application development and management platform built on LazyLLM

LazyCraft is an open source AI agent application development and management platform built by SenseTime on the open source framework LazyLLM, providing one-stop AI application development for enterprises and developers. It helps developers quickly build and release large model applications with a low barrier to entry and low cost...

5mos ago

034K

Kosong - New AI agent development framework open-sourced by Moonshot AI

Kosong is a new AI agent development framework open-sourced by Moonshot AI (月之暗面). It gives developers a lightweight, flexible, and highly extensible foundation for building next-generation agent applications. With an asynchronous tool orchestration engine that efficiently schedules multiple tools...

5mos ago

028.3K

SenseNova-SI - Series of spatial intelligence large models open-sourced by SenseTime

SenseNova-SI is an open source series of spatial intelligence large models released by SenseTime, focused on improving AI's spatial understanding and reasoning. The models excel in six core dimensions: spatial measurement, reconstruction, relationship judgment, perspective transformation, deformation analysis, and spatial reasoning, significantly outperforming other...

5mos ago

024.7K

Omnilingual ASR - Multilingual speech recognition framework from Meta

Omnilingual ASR is a multilingual speech recognition framework from Meta covering 1,600+ languages, with a character error rate below 10% for 78% of them. Its 7-billion-parameter wav2vec 2.0 encoder is combined with CTC and Transformer decoder, support...

5mos ago

028.5K

Frappe Builder - Open source AI low-code website builder with drag-and-drop components

Frappe Builder is an open source low-code website builder developed by Frappe. Its core feature is a Figma-like visual editor that supports drag-and-drop components for building websites quickly. Part of the Frappe ecosystem (Frappeverse)...

5mos ago

031.3K

DeepOCR - Open source reproduction of the DeepSeek-OCR model

DeepOCR is an open source reproduction project that implements the core architecture of DeepSeek-OCR, which processes textual information efficiently through optical compression. Its core is DeepEncoder, consisting of SAM-base (processing high-resolution images), a 16× convolutional compressor...

5mos ago

027.9K

Glow - Open source command line tool for rendering Markdown files in the terminal

Glow is an open source command line tool for elegantly rendering Markdown files in the terminal. It supports highlighted code blocks, mathematical formulas, and other complex elements, and offers features such as custom styles, paged display, and mouse support.

5mos ago

032K

NocoBase - Free, open source AI no-code development platform for building apps visually

NocoBase is an AI-driven, open source no-code development platform for rapidly building business systems; applications are completed through configuration rather than programming. The project uses the Apache-2.0 license, offers private deployment and flexible extensibility, and is suitable for enterprise management, collaboration platforms, and other fields ...

5mos ago

028.4K

UniWorld V2 - New-generation image editing model from Rabbitpre Intelligence and Peking University

UniWorld V2 is a new-generation image editing model jointly launched by Rabbitpre Intelligence (兔展智能) and the UniWorld team at Peking University. It has significant advantages in image editing, especially in understanding Chinese and executing complex instructions. The model can accurately render artistic Chinese fonts and support fine...

5mos ago

030.1K

SmartResume - AI resume parsing and optimization tool open-sourced by Alibaba

SmartResume is an intelligent resume parsing and optimization tool open-sourced by Alibaba. It efficiently extracts structured information, such as basic details, education, and work experience, from PDFs, images, or Office documents. By integrating OCR technology and PDF metadata...

5mos ago

031.6K

Step-Audio-EditX - The first LLM-grade audio editing large model, open-sourced by StepFun

Step-Audio-EditX is an open source audio editing large model developed by the StepFun team, focused on fine-grained AI manipulation of audio content. The model can dynamically adjust the emotion of the audio, the speaking style (such as a coquettish or elderly voice), and paralinguistic elements (such as laughter, sigh...

5mos ago

030.9K

Open-o3 Video - Video reasoning model open-sourced by Peking University and ByteDance

Open-o3 Video is an open source video reasoning model jointly developed by Peking University and ByteDance, focused on strengthening video reasoning with temporal and spatial evidence. By explicitly marking key evidence with timestamps and bounding boxes, it helps the model better understand and interpret video content.

5mos ago

027.3K

Handy - Free, open source local AI speech-to-text tool

Handy is a free, open source local speech-to-text tool for Windows, macOS, and Linux, built with Rust and React. It processes voice data locally without uploading it to the cloud, protecting privacy and security, and is suited to quick transcription and text input.

5mos ago

059.8K

FG-CLIP 2 - Image-text cross-modal vision-language model open-sourced by 360

FG-CLIP 2 is a leading image-text cross-modal vision-language model (VLM) launched by the 360 Artificial Intelligence Research Institute. It surpasses comparable models from Google and Meta on 29 authoritative benchmarks, making it the most powerful VLM at present. It is able to accurately recognize the gross...

5mos ago

028.1K

BettaFish (微舆) - Open source multi-agent public opinion analysis system

BettaFish is an open source multi-agent system for public opinion analysis. In its multi-agent architecture, Query, Media, Insight, Report, and other agents work together to form a closed loop of retrieval, extraction, and reporting. The system supports AI-driven full ...

5mos ago

061.5K

Ouro - New looped language model open-sourced by the ByteDance Seed team

Ouro is a new type of Looped Language Model (LoopLM) developed by the ByteDance Seed team. Its core innovation is to build reasoning capability directly in the pre-training phase through a parameter-sharing recurrent computation structure. The model uses 24 layers as the base block through...

5mos ago

037.3K

ChronoEdit - AI image editing framework jointly open-sourced by NVIDIA and the University of Toronto

ChronoEdit is an open source AI image editing framework developed by NVIDIA together with the University of Toronto. It reframes image editing as a video generation task so that edits are temporally and physically consistent. By distilling a pre-trained video generation model with 14B parameters from a...

5mos ago

032.1K

LongCat-Flash-Omni - Omni-modal large language model open-sourced by Meituan

LongCat-Flash-Omni is an open source omni-modal large language model released by Meituan's LongCat team. With 560 billion parameters (27 billion activated), it achieves millisecond-level real-time audio and video interaction despite its large parameter count.

5mos ago

030.3K

Petri - Open source AI safety auditing framework from Anthropic

Petri is an open source AI safety auditing framework developed by Anthropic that systematically assesses the safety and behavioral alignment of AI models. It simulates realistic scenarios in which an automated auditor engages in multi-turn conversations with a target model, followed by a judge agent that acts on the model's...

5mos ago

026.6K

Kimi Linear - New hybrid linear attention architecture open-sourced by Moonshot AI

Kimi Linear is a new hybrid linear attention architecture open-sourced by Moonshot AI (月之暗面). Its core is Kimi Delta Attention (KDA), which refines the traditional attention model with a finer-grained gating mechanism, significantly improving hardware efficiency and memory control ...

5mos ago

038.6K

FIBO - The world's first open source text-to-image model with native JSON support

FIBO, developed by Bria AI, is the world's first open source text-to-image model with native JSON support. Based on the DiT (Diffusion Transformer) architecture with 8B parameters, it adopts the Flow Matching training method...

5mos ago

031K

SoulX-Podcast - Conversational speech synthesis model open-sourced by Soul AI Lab

SoulX-Podcast is an advanced multi-speaker conversational speech synthesis model open-sourced by Soul AI Lab, designed for generating high-quality podcast content. It can generate multi-turn conversations that simulate the natural flow of real podcasts, and supports Mandarin, English, and multiple Chinese...

Latest AI Resources

5mos ago

039.8K

GigaBrain-0 - 开源的具身基础模型，由世界模型生成数据驱动

GigaBrain-0 - Open-source embodied foundation model driven by world-model-generated data

GigaBrain-0 is the first end-to-end Vision-Language-Action (VLA) embodied foundation model in China that uses data generated by world models to achieve generalization on real robots. It is jointly released as open source by GigaVision and Hubei Humanoid Robot Innovation Center. It adopts the hybrid Transformer architecture, integrating ...

Latest AI Resources

5mos ago

027.4K

Ming-flash-omni-Preview - 蚂蚁集团开源的全模态大模型

Ming-flash-omni-Preview - Ant Group's open-source omni-modal large model

Ming-flash-omni-Preview is an open-source omni-modal large model released by Ant Group's inclusionAI, with a parameter scale in the hundred-billion range. It is built on the sparse MoE architecture of Ling 2.0, with 103B total parameters and 9B activated. In omni-modal understanding and generation...

Latest AI Resources

5mos ago

032.1K

OmniVinci - NVIDIA开源的全模态大语言模型

OmniVinci - NVIDIA's Open Source Omnimodal Large Language Model

OmniVinci is an open-source omni-modal large language model developed by NVIDIA that addresses modality fragmentation in multimodal models through architectural innovation and data optimization. OmniAlignNet strengthens the alignment of visual and audio embeddings, which utilizes temporally embedded group capture...

Latest AI Resources

5mos ago

031.7K

olmOCR 2 - AI2开源的多模态文档解析模型

olmOCR 2 - AI2 open source multimodal document parsing model

olmOCR 2 is an open-source multimodal document parsing model from the Allen Institute for Artificial Intelligence (AI2) and an upgraded version of olmOCR. It converts digitized print documents (e.g. PDFs) with high...

Latest AI Resources

5mos ago

037.8K

ValueCell - 开源的多智能体金融平台，多个Agent分工协作

ValueCell - Open-Source Multi-Agent Financial Platform Where Multiple Agents Divide the Work

ValueCell is an open-source multi-agent financial application platform that uses AI to improve the efficiency of financial analysis and investment management. Simulating a professional investment team, multiple AI agents work together, covering market analysis, sentiment analysis, fundamental research, automated trading and other functions, to provide users with a comprehensive...

Latest AI Resources

5mos ago

057K

Dexbotic - 原力灵机开源的具身智能VLA模型一站式科研服务平台

Dexbotic - Dexmal's open-source one-stop research platform for embodied-intelligence VLA models

Dexbotic is Dexmal's open-source one-stop research platform for Vision-Language-Action (VLA) models in embodied intelligence. Built on PyTorch, it addresses the fragmentation and low efficiency of research in the field of embodied intelligence...

Latest AI Resources

5mos ago

029K

LongCat-Video - 美团LongCat开源的视频生成模型

LongCat-Video - Meituan LongCat's open-source video generation model

LongCat-Video is a 13.6 billion parameter video generation model open-sourced by Meituan's LongCat team under the MIT license. It supports three major tasks: text-to-video, image-to-video, and video continuation. Through a "coarse-to-fine" generation strategy and a block sparse attention mechanism, the model can, within minutes ...

Latest AI Resources

5mos ago

051K

DreamOmni2 - 港科大开源的多模态AI图像编辑与生成模型

DreamOmni2 - HKUST open source multimodal AI image editing and generation models

DreamOmni2 is a multimodal AI image editing and generation model open-sourced by Jiaya Jia's team at HKUST. It handles both text and image instructions and supports multiple reference images, giving creators more flexible ways to work. The model is trained using a three-stage data synthesis process, joint training generation/editing...

Latest AI Resources

6mos ago

035.8K

混元世界模型1.1 - 腾讯混元发布的开源3D重建大模型

Hunyuan World Model 1.1 - Open-source large 3D reconstruction model released by Tencent Hunyuan

Hunyuan World Model 1.1 (WorldMirror) is an open-source large 3D reconstruction model released by the Tencent Hunyuan team, and an upgraded version of the Hunyuan World Model series. It supports multi-view images, video, and multimodal prior inputs such as camera poses, camera intrinsics, and depth maps. It breaks through the traditional 3D reconstruction that only relies on...

Latest AI Resources

6mos ago

035K

DeepSeek-OCR - DeepSeek开源的光学字符识别模型

DeepSeek-OCR - DeepSeek open source optical character recognition model

DeepSeek-OCR is an advanced optical character recognition (OCR) model open-sourced by the DeepSeek team. Its "contextual optical compression" technique renders text as images and uses visual tokens for compression and decoding, enabling efficient processing of long text.

Latest AI Resources

6mos ago

040.2K

VitaBench - 美团LongCat开源的交互式Agent评测基准

VitaBench - Meituan LongCat's open-source interactive agent evaluation benchmark

VitaBench is the first interactive agent evaluation benchmark for complex everyday scenarios, released by Meituan's LongCat team. It assesses the overall capabilities of LLM agents in real-life settings. Three high-frequency everyday scenarios (food delivery, in-store dining, and travel) are used as the carrier to build the package...

Latest AI Resources

6mos ago

031.8K

MinerU2.5 - 上海AI Lab联合北大开源的文档解析模型

MinerU2.5 - Shanghai AI Lab and Peking University open source document parsing model

MinerU2.5 is a decoupled vision-language model jointly developed by Shanghai AI Laboratory and Peking University, focused on efficient parsing of high-resolution document images. Its core innovation is a two-stage design of "global layout detection followed by local content recognition": the first stage is a low-resolution...

Latest AI Resources

6mos ago

045.8K

LongCat-Audio-Codec - 美团LongCat开源的语音编解码方案

LongCat-Audio-Codec - Meituan LongCat's open-source speech codec solution

LongCat-Audio-Codec is an open-source speech codec solution from Meituan's LongCat team. It is designed for speech large language models (Speech LLMs) and uses a parallel extraction mechanism for semantic and acoustic tokens, capturing both the semantic and acoustic features of speech ...

Latest AI Resources

6mos ago

029.6K

PaddleOCR-VL - 百度开源的超轻量级视觉-语言模型

PaddleOCR-VL - Baidu's open-source ultra-lightweight vision-language model

PaddleOCR-VL is Baidu's open-source ultra-lightweight vision-language model, optimized for document parsing scenarios. The model contains only 0.9B parameters. By fusing a dynamic high-resolution visual encoder with a lightweight ERNIE language model, it maintains high accuracy while significantly reducing computational overhead.

Latest AI Resources

6mos ago

046.4K

UniPixel - 香港理工、腾讯、中科院等开源的像素级多模态模型

UniPixel - Pixel-level multimodal model open-sourced by Hong Kong Polytechnic, Tencent, Chinese Academy of Sciences and others

UniPixel is a novel multimodal model jointly proposed by Hong Kong Polytechnic University, Tencent, Chinese Academy of Sciences and Vivo to achieve pixel-level visual language understanding. By unifying object referencing and segmentation capabilities, it supports a variety of fine-grained tasks such as image segmentation, video segmentation, region understanding, and pi...

Latest AI Resources

6mos ago

035.1K

DiaMoE-TTS - 清华联合巨人网络开源的多方言语音合成框架

DiaMoE-TTS - Multi-dialect speech synthesis framework open-sourced by Tsinghua University and Giant Network

DiaMoE-TTS is a multi-dialect speech synthesis framework jointly open-sourced by Tsinghua University and Giant Network. Built on the International Phonetic Alphabet (IPA), it addresses dialect data scarcity, orthographic inconsistency, and complex phonological changes. A unified IPA front end standardizes phoneme representation to eliminate cross-dialect differences ...

Latest AI Resources

6mos ago

037.1K

Kandinsky 5.0 - 俄罗斯AI团队开源的视频生成模型系列

Kandinsky 5.0 - Russian AI Team's Open Source Video Generation Model Series

Kandinsky 5.0 is the latest video generation model series developed by a Russian AI team, focusing on lightweight design and high performance. The first model in the series, Kandinsky 5.0 Video Lite, has only 2 billion parameters but surpasses similar 14B models, especially...

Latest AI Resources

6mos ago

045.2K

SongBloom - 腾讯联合港中文、南大开源的歌曲生成模型

SongBloom - Song generation model open-sourced by Tencent with CUHK (Shenzhen) and Nanjing University

SongBloom is an open source song generation model developed by Tencent AI Lab in collaboration with The Chinese University of Hong Kong (Shenzhen) and Nanjing University, which solves the problem of "plasticity" in AI music generation, and realizes high-quality, structurally complete song generation. Simply enter 10 seconds of reference audio and corresponding lyrics, and you can...

Latest AI Resources

6mos ago

036K

Pyscn - 专为Python开发者开源的免费AI代码质量分析工具

Pyscn - Free AI code quality analysis tool open-sourced specifically for Python developers

Pyscn is an intelligent code quality analysis tool designed for Python developers that detects potential problems in code to improve maintainability. It finds dead code through control flow graphs, identifies duplicate code using the APTED+LSH algorithm, calculates metrics such as module coupling and cyclomatic complexity...

Latest AI Resources

6mos ago

028.7K

Youtu-Embedding - 腾讯优图开源的通用文本表示模型

Youtu-Embedding - Tencent Youtu's open-source general-purpose text representation model

Youtu-Embedding is a general-purpose text representation model open-sourced by Tencent's Youtu Lab, designed for enterprise-level applications. It uses deep neural networks to map text into a high-dimensional vector space in which semantically similar sentences lie closer together, enabling accurate semantic retrieval.

Latest AI Resources

6mos ago

034K

SAIL-VL2 - 字节跳动开源的多模态视觉语言模型

SAIL-VL2 - ByteDance's open-source multimodal vision-language model

SAIL-VL2 is an open-source multimodal vision-language model from a ByteDance team, focusing on joint modeling of multimodal inputs such as images and text. Using a sparse mixture-of-experts (MoE) architecture and a progressive training strategy, it achieves high performance at parameter scales from 2B to 8B, especially in the areas of image-text understanding, math...

Latest AI Resources

6mos ago

027.1K

MineContext - 字节开源的主动式上下文感知AI伙伴

MineContext - ByteDance's open-source proactive context-aware AI partner

MineContext is a proactive context-aware AI partner open-sourced by the ByteDance Viking team to help users manage large amounts of information and work more efficiently on knowledge tasks. Using screenshots and content understanding, it automatically records the user's daily activity (such as browsing the web, editing documents, etc.), support...

Latest AI Resources

6mos ago

048K

nanochat - Karpathy免费开源的低成本模型训练项目

nanochat - Karpathy's free and open-source low-cost model training project

nanochat is an open-source project released by AI legend and former Tesla AI Director Andrej Karpathy that lets individuals quickly train a small ChatGPT-like language model at very low cost and with minimal complexity. The entire project uses only about 800...

Latest AI Resources

6mos ago

033.6K

LLaVA-OneVision-1.5 - 免费开源的多模态模型，高性能多模态理解

LLaVA-OneVision-1.5 - Free and open-source multimodal model for high-performance multimodal understanding

LLaVA-OneVision-1.5 is an open-source multimodal model from the EvolvingLMMs-Lab team at the 8B parameter scale, trained through a compact three-stage process (language-image alignment, concept balancing and knowledge injection, and instruction fine-tuning) on 128 A800...

Latest AI Resources

6mos ago

032K

Paper2Video - 新加坡国立开源的学术论文自动生成演示视频项目

Paper2Video - NUS open source project to automatically generate demo videos for academic papers

Paper2Video is an open-source project from Show Lab at the National University of Singapore that automatically generates presentation videos from academic papers. Using the PaperTalker multi-agent framework, papers are transformed into full presentation videos containing slides, subtitles, voiceover and speaker avatar...

Latest AI Resources

6mos ago

034.4K

NeuTTS Air - 支持离线CPU运行的免费轻量级语音合成模型

NeuTTS Air - Free and Lightweight Speech Synthesis Model with Offline CPU Running Support

NeuTTS Air is an open-source lightweight speech synthesis model developed by the Neuphonic team that runs in real time on local devices (e.g. phones, laptops, Raspberry Pi) without relying on the cloud. Using 0.5B parameter Qwen architecture and self-developed NeuCodec codec...

Latest AI Resources

6mos ago

040.4K

KAT-Dev-72B-Exp - 快手开源的免费编程专用模型

KAT-Dev-72B-Exp - Kuaishou's free open-source coding model

KAT-Dev-72B-Exp is an open-source large language model for programming released by a Kuaishou team and optimized with reinforcement learning. It achieved 74.6% accuracy on the SWE-Bench Verified benchmark, currently the best result among open-source models. The model uses innovative...

Latest AI Resources

6mos ago

031.5K

Jamba Reasoning 3B - 以色列AI21 Labs开源的轻量级推理模型

Jamba Reasoning 3B - Israel AI21 Labs open source lightweight reasoning model

Jamba Reasoning 3B is a lightweight reasoning model open-sourced by Israeli AI startup AI21 Labs, with strong performance and potential for a wide range of applications. It utilizes a hybrid SSM-Transformer architecture that combines Trans...

Latest AI Resources

6mos ago

028.9K

吴恩达的《Agentic AI》最新智能体免费课程

Andrew Ng's "Agentic AI": the latest free course on AI agents

Agentic AI is the newest course on AI agents launched by Andrew Ng. The course focuses on the design and construction of agents, covering four major design patterns: reflection, tool use, planning, and multi-agent collaboration. Through theory and hands-on coding, learners will master how to make agents check their outputs, autonomously adjust...

Latest AI Resources Course materials

6mos ago

053.9K

OpenAgents - 开源免费的构建AI Agent网络开放协作项目

OpenAgents - Open Source Free Open Collaboration Project for Building AI Agent Networks

OpenAgents is an open-source project for creating networks of AI agents and enabling open collaboration between them. It provides basic network infrastructure so that AI agents can connect and collaborate seamlessly. Users can quickly start their own agent network, extend functionality through a modular architecture, support...

Latest AI Resources

6mos ago

030.8K

Androidify - 谷歌开源如何在Android上构建AI应用的免费资源

Androidify - Google open sources free resources on how to build AI apps on Android

Androidify is Google's open source project to help developers learn how to build AI-driven apps on Android. The project uses Google's latest technologies such as Jetpack Compose, Gemini API (via Fire...

Latest AI Resources

6mos ago

032K

Ling-1T - 蚂蚁集团开源的万亿参数通用语言模型

Ling-1T - Ant Group's open-source trillion-parameter general-purpose language model

Ling-1T is a trillion-parameter general-purpose language model open-sourced by Ant Group and the flagship of the Ling 2.0 series in its Bailing large model family. The model adopts a highly efficient MoE architecture, supports 128K context windows, and surpasses GPT in 7 benchmarks including code generation, mathematical reasoning, and logic test...

Latest AI Resources

6mos ago

056.7K