
Tags: ai open source projects Page 32

Ruyi-Models: open source image-to-video models with camera control and motion amplitude control

General Introduction: Ruyi-Models is an open source project for generating high-quality videos from images. Developed by the IamCreateAI team, it generates cinematic video at 768 resolution and 24 frames per second, for a total of 120 frames over 5 seconds. Ruyi-Models supports camera control and motion amplitude control ...

Robo Blogger: turn voice content into blog posts with LangGraph for automated blog writing

General Introduction: Robo Blogger is a blog creation tool that simplifies content generation through speech-to-text technology. Users record ideas in any speech-to-text application, and Robo Blogger transforms those ideas into structured blog content. The tool utilizes LangChain ...

Genesis: open source generative physics engine for physics-based 4D dynamic world simulation

General Introduction: Genesis is a generative physics engine designed for general-purpose robotics and embodied AI learning. It provides a unified simulation platform that supports a wide range of materials and physical phenomena. Genesis aims to unlock an infinite variety of data by combining generative AI and physics simulation to help machine...

Kolors: text-to-image model for high-quality images, with support for generating Chinese posters

Comprehensive Introduction: Kolors is a large-scale text-to-image generation model developed by the Kuaishou Kolors team, based on latent diffusion. The model is trained on billions of text-image pairs and generates high-quality images that accurately render complex semantics, with support for both Chinese and English input. Kolors is well known for its visual quality, complex semantic accuracy...

ColorFlow: automatic colorization of black-and-white comics and image sequences, improving color consistency and quality

Comprehensive Introduction: ColorFlow is an automatic colorization tool for image sequences, developed by Tencent's ARC team to colorize black-and-white image sequences. The tool uses a retrieval-augmented colorization pipeline that draws on a pool of reference images to accurately color individual elements, including characters' hair and clothing, ensuring that the color...

Outlines: generate structured text output with regular expressions, JSON, or Pydantic models

Comprehensive Introduction: Outlines is an open source library developed by dottxt-ai that makes Large Language Models (LLMs) more useful in applications through structured text generation. The library supports a wide range of model integrations, including OpenAI, transformers, llama.cpp, etc. It provides simple but powerful prompting primitives,...

R2R: an advanced AI retrieval (RAG) system combining multimodal content parsing, knowledge graphs, and hybrid search

Comprehensive Introduction: R2R (RAG to Riches) is a state-of-the-art AI retrieval system supporting Retrieval Augmented Generation (RAG) with production-ready features. Built on a containerized RESTful API, the system provides multimodal content parsing, hybrid search capabilities, configurable GraphRAG, and comprehensive...

Megrez-3B-Omni: an on-device multimodal understanding model supporting text, image, and audio understanding and analysis

Comprehensive Introduction: Infini-Megrez is an edge intelligence solution developed by Infinigence AI, aiming to achieve efficient multimodal understanding and analysis through hardware-software co-design. The core of the project is the Megrez-3B model, which supports integrated image, text and audio understanding with high accuracy...

RAGFlow: an open source RAG engine based on deep document understanding, providing efficient retrieval-augmented generation workflows

Comprehensive Introduction: RAGFlow is an open source Retrieval Augmented Generation (RAG) engine based on deep document understanding. It provides an efficient RAG workflow for organizations of all sizes, combining large language models (LLMs) to deliver truthful question answering over data in complex formats. RAGFlow...

CrewAI: a collaborative framework of role-playing AI agents that simplifies complex tasks

General Introduction: CrewAI is an advanced framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI enables agents to work together seamlessly to solve complex tasks. Whether building an intelligent assistant platform, an automated customer service team, or a multi-agent research team, Crew...

Leffa: high-fidelity virtual try-on and pose adjustment, Meta's open source controllable person image generation model

Comprehensive Introduction: Leffa is a unified framework for controllable person image generation, enabling precise manipulation of a person's appearance (e.g., virtual try-on) and pose (e.g., pose transfer). The framework significantly reduces distortion of fine-grained details by guiding the target query to attend to the correct reference key in the attention layer, while preserving...

MMAudio: generate synchronized sound effects and soundtracks for video, a video-to-audio model built on multimodal joint training

General Introduction: MMAudio is an open source project for generating high-quality synchronized audio through multimodal joint training. Developed by Ho Kei Cheng et al. at the University of Illinois Urbana-Champaign and Sony, the project's main function is to generate synchronized audio from video and/or text input. The core innovation of MMAudio is...
