Latest AI Resources

Total 3037 articles posts
zChunk:基于Llama-70B的通用语义分块策略

zChunk: a generic semantic chunking strategy based on Llama-70B

Comprehensive Introduction zChunk is a novel chunking strategy developed by ZeroEntropy that aims to provide a solution for generic semantic chunking. The strategy is based on the Llama-70B model, which optimizes the chunking process of documents by prompting for chunks to be generated, ensuring that information retrieval is maintained at a high...
1yrs ago
050.6K
Hibiki:实时语音翻译模型,保留原声特点的流式翻译

Hibiki: a real-time speech translation model, streaming translation that preserves the characteristics of the original voice

General Introduction Hibiki is a high-fidelity real-time speech translation model developed by Kyutai Labs. Unlike traditional offline translation, Hibiki is able to generate natural speech translation in the target language and provide text translation in real time while the user is speaking. The model...
1yrs ago
066.4K
Pulse:文档处理与数据提取的商业解决方案

Pulse: Business Solutions for Document Processing and Data Extraction

Comprehensive Introduction Pulse is an intelligent platform focused on document processing and data extraction, designed to help organizations and developers efficiently parse and process a wide range of complex documents. Through its advanced computer vision and multimodal processing technology, Pulse is able to accurately extract data from text, images, tables, and many other...
1yrs ago
053.9K
Agentic Security:开源的LLM漏洞扫描工具,提供全面的模糊测试和攻击技术

Agentic Security: open source LLM vulnerability scanning tool that provides comprehensive fuzz testing and attack techniques

General Introduction Agentic Security is an open source LLM (Large Language Model) vulnerability scanning tool designed to provide developers and security professionals with comprehensive fuzz testing and attack techniques. The tool supports customized rule sets or agent-based attacks and is able to integrate LLM AP...
1yrs ago
061.3K
CogVLM2:开源多模态模型,支持视频理解与多轮对话

CogVLM2: Open Source Multimodal Modeling with Support for Video Comprehension and Multi-Round Dialogue

Comprehensive Introduction CogVLM2 is an open source multimodal model developed by the Tsinghua University Data Mining Research Group (THUDM), based on the Llama3-8B architecture, and designed to provide performance comparable to or even better than GPT-4V. The model supports image understanding, multi-round dialogs, and visual ...
1yrs ago
063.6K
VisoMaster:强大且易用的图片/视频换脸和编辑软件

VisoMaster: Powerful and easy-to-use photo/video face changing and editing software

General Introduction VisoMaster is a powerful and easy-to-use video face-swapping and editing tool that utilizes artificial intelligence technology to achieve natural and realistic face-swapping effects. Whether it's an image or a video, VisoMaster can generate high-quality face swap results with simple operations, suitable for general...
1yrs ago
0172.1K
Rowfill:批量提取文档结构化信息并自动化分析

Rowfill: Batch Extraction of Structured Information from Documents and Automated Analysis

General Introduction Rowfill is an open source document processing platform designed for knowledge workers. It uses advanced artificial intelligence techniques to extract, analyze and process data from complex documents, images and PDFs.Rowfill supports Native Large Language Model (LLM) and Ope...
1yrs ago
053.9K
GPT Researcher:利用本地和网络数据,生成全面、详实的研究报告

GPT Researcher: Generate comprehensive, detailed research reports utilizing local and web-based data

Comprehensive Introduction GPT Researcher is an autonomous agent tool based on the Large Language Model (LLM) designed to perform local and web research and generate detailed research reports. The tool provides stable performance and faster speed by parallelizing agent work, ensuring that the information is accurate...
1yrs ago
050.9K
Linly-Talker:数字人智能对话系统,结合大语言模型与视觉模型,实现互动新体验

Linly-Talker: An Intelligent Dialogue System for Digital People, Combining Big Language Modeling and Visual Modeling for a New Interactive Experience

Comprehensive Introduction Linly-Talker is an innovative digital human dialog system that combines Large Language Models (LLMs) with visual models to create a novel approach to human-computer interaction. The system integrates a variety of technologies such as Whisper, Linly, Micros...
1yrs ago
088.9K
Botnow:AI 智能体创作与分发平台,助力智能营销与智慧办公

Botnow: AI Intelligent Body Creation and Distribution Platform for Smart Marketing and Smart Office

Comprehensive Introduction Botnow is a next-generation AI intelligences creation and distribution platform designed to help developers build high-quality intelligences quickly and with a low threshold through plugins, knowledge bases, and workflows. The platform supports publishing intelligences to third-party platforms and provides API tuning...
1yrs ago
052.4K
bilive:B站无人监守直播录制与自动切片、上传工具

bilive: Unsupervised live recording and automatic slicing and uploading tools for B station

Comprehensive Introduction bilive is a tool designed for B station live recording, providing extremely fast live recording, auto-slicing, pop-up rendering and subtitle generation. The tool is compatible with ultra-low configuration machines, supports 7x24 hours unattended recording, automatically recognizes and renders pop-ups and subtitles, automatically slices and...
1yrs ago
081.6K
CoT-Lab:探索人机协作迭代思考的实验性对话工具

CoT-Lab: an experimental dialog tool for exploring iterative thinking about human-computer collaboration

CoT-Lab is an experimental interface for exploring a new paradigm of human-computer collaboration. Based on Cognitive Load Theory and Active Learning Principles, CoT-Lab facilitates deep cognitive alignment between humans and Artificial Intelligence (AI) through the creation of "thinking partner" relationships. The program aims to...
1yrs ago
046.4K
SpeechGPT 2.0-preview:实时交互的端到端拟人语音对话大模型

SpeechGPT 2.0-preview: an end-to-end anthropomorphic speech dialog grand model for real-time interaction

SpeechGPT 2.0-preview is the first anthropomorphic real-time interaction system introduced by OpenMOSS, which is trained based on millions of hours of speech data. The system is equipped with anthropomorphic spoken expression and 100ms low latency response, supporting natural and smooth real...
1yrs ago
053.6K
Goose:开源可扩展的编程智能体,自动化执行编程全流程任务

Goose: open source scalable programming intelligences that automate the full range of programming tasks

General Introduction Goose is an open source AI agent tool developed by Block, Inc. designed to help developers automate everyday development tasks. It supports a wide range of Large Language Models (LLMs) and interacts with users via the command line or desktop application interfaces.Goose can perform a wide range of tasks from agent...
1yrs ago
083.4K
YuE:将歌词转化为完整歌曲的基础模型,支持多种音乐风格

YuE: Transforms lyrics into a base model of a complete song, supporting a wide range of musical styles

General Introduction YuE is an open source full song generation base model that focuses on transforming lyrics into full songs. Unlike other models that can only generate short snippets of non-vocal music, YuE is capable of generating full songs with lead and backing vocals up to several minutes in length. The model addresses music generation in...
1yrs ago
064.6K
Float:跨语言智能搜索引擎,用母语检索不同语言知识

Float: a cross-language intelligent search engine to retrieve knowledge in different languages in their native language

Comprehensive Introduction FloatSearch AI is a cross-language intelligent search engine based on artificial intelligence technology, designed to provide users with a more accurate and efficient search experience. It understands users' natural language queries and provides relevant and accurate answers based on semantic analysis.FloatS...
1yrs ago
053.9K
UltraRAG:一站式RAG系统解决方案,简化数据构建与模型微调

UltraRAG: A One-Stop RAG System Solution to Simplify Data Construction and Model Fine-Tuning

Comprehensive Introduction UltraRAG is a RAG (Retrieval Augmented Generation) system solution jointly proposed by the THUNLP group at Tsinghua University, the NEUIR group at Northeastern University, Modelbest.Inc and the 9#AISoft team. The framework is based on agile deployment and modularized building...
1yrs ago
066.9K
Llasa 1~8B:高品质语音生成和克隆的开源文本转语音模型

Llasa 1~8B: an open source text-to-speech model for high quality speech generation and cloning

General Introduction Llasa-3B is an open source text-to-speech (TTS) model developed by the Audio Lab of the Hong Kong University of Science and Technology (HKUST Audio). The model is based on the Llama 3.2B architecture, which has been carefully tuned to provide high-quality speech generation that not only supports multiple...
1yrs ago
074.9K