Multimodal real-time interactive products

27 posts in total

RealtimeVoiceChat: low-latency natural spoken conversation with AI

RealtimeVoiceChat is an open source project focused on real-time, natural voice conversations with artificial intelligence. Users speak into a microphone; the system captures the audio in the browser, quickly transcribes it to text, and a large language model (LLM) generates...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

11mos ago

Stepsailor: Integrating AI Command Bars in Existing SaaS Offerings

Stepsailor is a developer tool built around an AI command bar. Developers can use it to make their software understand what users say: for example, when a user says "add new task", the software executes it automatically. It integrates via a simple SDK to...

Latest AI Resources # Professional Productivity Tools # Multimodal Real-Time Interactive Products

1yr ago

OpenAvatarChat: a modularly designed digital human conversation tool

OpenAvatarChat is an open source project developed by the HumanAIGC-Engineering team and hosted on GitHub. It is a modular digital human conversation tool that users can run on a single PC...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

1yr ago

VideoMind: an open source project for timestamp-grounded video content localization and Q&A

VideoMind is an open source multimodal AI tool focused on reasoning, Q&A, and summary generation for long videos. It was developed by Ye Liu of the Hong Kong Polytechnic University and a team from Show Lab at the National University of Singapore. The tool mimics human understanding of video...

Latest AI Resources # AI Java Open Source Project # AI Text and Audio/Video Summarization Tool # AI audio/video editor

10mos ago

MoshiVis: an open source model for real-time speech dialog and image understanding

MoshiVis is an open source project developed by Kyutai Labs and hosted on GitHub. It builds on the Moshi speech-text model (7B parameters), adding about 206 million new adaptation parameters and a frozen Pal...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

1yr ago

Qwen2.5-Omni: an end-to-end model for multimodal input and real-time speech interaction

Qwen2.5-Omni is an open source multimodal AI model developed by Alibaba Cloud's Qwen team. It can take text, images, audio, and video as input and generate text or natural speech responses in real time. The model was released in 2025 on 3 ...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

1yr ago

xiaozhi-esp32-server: an open source back-end service for the Xiaozhi AI chatbot

xiaozhi-esp32-server provides the back-end service for the Xiaozhi AI chatbot (xiaozhi-esp32). Written in Python and built on the WebSocket protocol, it helps users quickly...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

1yr ago

Baichuan-Audio: an end-to-end audio model supporting real-time voice interaction

Baichuan-Audio is an open source project from Baichuan Intelligence (baichuan-inc), hosted on GitHub, that focuses on end-to-end voice interaction. The project provides a complete audio processing framework that enables speech ...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

1yr ago

PowerAgents: an AI agent platform for scheduled web tasks

PowerAgents is an AI agent platform focused on web automation, letting users create and deploy AI agents that can click, type, and extract data. Tasks can be scheduled to run automatically on an hourly, daily, or weekly basis, and users can also watch real-time...

Latest AI Resources # Multimodal Real-Time Interactive Products

1yr ago

Step-Audio: a multimodal voice interaction framework that recognizes speech and communicates using cloned speech, among other features

Step-Audio is an open source intelligent speech interaction framework designed to provide out-of-the-box speech understanding and generation for production environments. The framework supports multilingual dialogue (e.g., Chinese, English, Japanese), emotional speech (e.g., happy, sad), regional dialects (e.g., Cantonese, Sichuanese ...

Latest AI Resources # AI Java Open Source Project # AI voice cloning # Multimodal Real-Time Interactive Products

1yr ago

Gemini Cursor: an AI desktop smart assistant built on Gemini that can see, hear and speak

Gemini Cursor is a desktop intelligent assistant based on Google's Gemini 2.0 Flash (experimental) model. It supports visual, auditory, and voice interaction through a multimodal API, providing real-time, low-latency use...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

1yr ago

DeepSeek-VL2: a Mixture-of-Experts vision-language model for advanced multimodal understanding

DeepSeek-VL2 is a series of advanced Mixture-of-Experts (MoE) vision-language models that significantly improve on their predecessor, DeepSeek-VL. The models are useful for visual question answering, optical character recognition, text...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

1yr ago

AI Web Operator: Browser Automation, an Open Source Implementation of OpenAI Operator

AI Web Operator is an open source AI browser operator tool designed to simplify the user experience in the browser by integrating multiple AI technologies and SDKs. The tool is based on Browserbase and Vercel...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

1yr ago

SpeechGPT 2.0-preview: an end-to-end large model for human-like, real-time spoken dialogue

SpeechGPT 2.0-preview is the first human-like real-time interaction system released by OpenMOSS, trained on millions of hours of speech data. The system offers human-like spoken expression and low-latency responses of 100 ms, supporting natural and smooth real...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

1yr ago

OpenAI Realtime Agents: A Multi-Agent Voice Interaction Application (OpenAI Example)

OpenAI Realtime Agents is an open source project that demonstrates how OpenAI's Realtime API can be used to build multi-agent voice applications. It provides a high-level agent pattern (borrowed from OpenAI Swarm) that allows...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

1yr ago

Bailing: a low-latency open source voice dialogue assistant for natural conversation

Bailing is an open source voice conversation assistant designed to hold natural spoken conversations with users. The project combines automatic speech recognition (ASR), voice activity detection (VAD), large language models (LLM), and text-to-speech (TTS) to achieve...
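Bailing's description names four cooperating stages (VAD, ASR, LLM, TTS). The Python sketch below shows one common way such stages can be chained per conversational turn; it is not Bailing's code. It assumes a simple energy-threshold VAD, and the names `segment_utterances` and `VoicePipeline`, the 0.01 energy threshold, and the three-frame silence window are hypothetical illustration choices, with lambdas standing in for real ASR, LLM, and TTS engines.

```python
# Minimal sketch of a VAD -> ASR -> LLM -> TTS turn loop.
# Assumptions: frames are lists of float PCM samples in [-1.0, 1.0];
# the asr/llm/tts callables are hypothetical stand-ins for real engines.
from dataclasses import dataclass
from typing import Callable, Iterable, List

Frame = List[float]


def frame_energy(frame: Frame) -> float:
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def segment_utterances(frames: Iterable[Frame], threshold: float = 0.01,
                       max_silence: int = 3) -> List[List[Frame]]:
    """Energy-threshold VAD: collect voiced frames into utterances.

    Quiet frames are dropped; an utterance closes after `max_silence`
    consecutive quiet frames, and shorter pauses are bridged.
    """
    utterances: List[List[Frame]] = []
    current: List[Frame] = []
    silence = 0
    for frame in frames:
        if frame_energy(frame) >= threshold:
            current.append(frame)
            silence = 0
        elif current:
            silence += 1
            if silence >= max_silence:
                utterances.append(current)
                current, silence = [], 0
    if current:  # flush speech that runs to the end of the stream
        utterances.append(current)
    return utterances


@dataclass
class VoicePipeline:
    asr: Callable[[List[Frame]], str]  # hypothetical speech-to-text stage
    llm: Callable[[str], str]          # hypothetical reply generator
    tts: Callable[[str], bytes]        # hypothetical speech synthesizer

    def run(self, frames: Iterable[Frame]) -> List[bytes]:
        """Run one ASR -> LLM -> TTS round per detected utterance."""
        replies: List[bytes] = []
        for utterance in segment_utterances(frames):
            text = self.asr(utterance)
            replies.append(self.tts(self.llm(text)))
        return replies


if __name__ == "__main__":
    loud, quiet = [0.5] * 160, [0.0] * 160
    frames = [loud, loud, quiet, quiet, quiet, loud, quiet]
    pipeline = VoicePipeline(
        asr=lambda utt: f"frames={len(utt)}",
        llm=lambda text: f"You said: {text}",
        tts=lambda reply: reply.encode("utf-8"),
    )
    print(pipeline.run(frames))  # [b'You said: frames=2', b'You said: frames=1']
```

Gating on VAD first means the heavier ASR and LLM stages only run on detected speech; a production assistant would likely swap the energy threshold for a trained VAD model and stream partial results to reduce latency.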

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

1yr ago

Weebo: a real-time voice chatbot that provides a natural language conversational experience

Weebo is an open source real-time voice chatbot that uses Whisper Small for speech recognition, Llama 3.2 for natural language generation, and Kokoro-82M for speech synthesis. The project was developed by Aman...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

1yr ago

OmAgent: an agent framework for building multimodal smart devices

OmAgent is a multimodal agent framework developed by Om AI Lab that aims to bring powerful AI features to smart devices. By integrating state-of-the-art multimodal foundation models and agent algorithms, the project enables developers to create efficient smart devices on a variety of...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products # Agent Development Framework

1yr ago

"Always-On" DeepSeek AI Assistant: an Intelligent Voice Interaction System Built on DeepSeek-V3

Always-On AI Assistant is an AI assistant project that builds a powerful, always-on assistant system by integrating technologies such as DeepSeek-V3, RealtimeSTT, and Typer...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

1yr ago

BrownChat: open source real-time voice chat AI assistant

BrownChat is a real-time audio chat application built on large language models (LLMs). Developed by GitHub user sugarforever, the project aims to improve users' communication experience through advanced natural language processing. B...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

1yr ago

Xiaozhi AI Chatbot: Build your own AI chat companion with voice conversation and intelligent interaction

Xiaozhi AI Chatbot is an open source project based on the ESP32 development board that helps users build their own AI chat companion. Developed by Shrimp, it is intended mainly for teaching, helping more people get started with AI hardware development and understand how to apply large language models to real...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

1yr ago

OpenAI Realtime API Next.js: a Next.js template for building real-time voice conversation AI applications

OpenAI Realtime API Next.js is an open source project based on the Next.js framework, designed to help developers quickly build real-time voice AI applications. The project integrates OpenAI's Realtime API and WebRTC technology...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

1yr ago

VITA: Open Source Multimodal Large Language Model for Real-Time Vision and Speech Interaction

VITA is a leading open source project for interactive multimodal large language models, pioneering true omni-modal interaction. The project launched VITA-1.0 in August 2024, the first open source interactive omni-modal large language model. 2024...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

1yr ago

TransRouter: A Real-Time Chinese-to-English Voice Translation Tool Based on the Gemini Multimodal Model

TransRouter is a real-time voice translation tool based on Google's Gemini model, specifically designed for real-time voice translation between English and Chinese. The tool can be seamlessly integrated into video conferencing software such as Zoom, providing an easy way for cross-language...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

1yr ago

Fish Agent: end-to-end AI voice cloning assistant, real-time voice conversation assistant, Fish Speech spin-off project

Fish Agent, a derivative of Fish Speech, is a revolutionary end-to-end AI voice cloning system built on the V0.1 3B model architecture. As a fully end-to-end voice cloning system, its most important feature is its innovative speechless...

Latest AI Resources # AI Java Open Source Project # AI voice cloning # Multimodal Real-Time Interactive Products

1yr ago

Megrez-3B-Omni: an on-device multimodal model for understanding and analyzing text, images, and audio

Infini-Megrez is an edge intelligence solution developed by Infinigence AI, aiming to achieve efficient multimodal understanding and analysis through hardware-software co-design. At the core of the project is the Megrez-3B model, which supports graph...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

1yr ago

Ichigo (llama3-s): a local real-time voice AI assistant and open source Siri alternative

Ichigo is an open source real-time speech AI project that aims to give text-based language models native "listening" capabilities. The project uses early-fusion techniques inspired by Meta's Chameleon paper. Ichigo's goal is to become...

Latest AI Resources # AI Java Open Source Project # Multimodal Real-Time Interactive Products

1yr ago
