FLM-Audio: Open-Source Full-Duplex Audio Dialog Model from BAAI and Nanyang Technological University

Latest AI Resources | Published 7 months ago by AI Sharing Circle

What is FLM-Audio

FLM-Audio is a native full-duplex large audio dialog model released by the Beijing Academy of Artificial Intelligence (BAAI, also known as the Beijing Zhiyuan Artificial Intelligence Research Institute) together with Spin Matrix and Nanyang Technological University, Singapore. It supports both Chinese and English. Its native full-duplex architecture merges the listening, speaking, and monologue channels at each time step, which avoids the high latency of traditional time-division multiplexing schemes. Its natural monologue and dual training paradigm bring the model's dialog closer to the way humans naturally communicate and address the asynchronous alignment problem. FLM-Audio is trained on only 1 million hours of data, a much smaller volume than similar models use, yet it gives high-quality, agile, and natural responses and is robust to noise and user interruptions.
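The difference between merging channels at each time step and time-division multiplexing can be pictured with a toy example. The sketch below is only an illustration and not FLM-Audio's actual implementation: the token values and the function names `interleave_tdm` and `merge_per_step` are hypothetical, and real models operate on audio and text token embeddings rather than strings.

```python
from typing import List, Tuple

SILENCE = "<sil>"  # hypothetical placeholder token for "nothing on this channel"


def interleave_tdm(listen: List[str], speak: List[str], monologue: List[str]) -> List[str]:
    """Time-division multiplexing: each channel gets its own sequence position."""
    seq: List[str] = []
    for l, s, m in zip(listen, speak, monologue):
        seq.extend([l, s, m])
    return seq


def merge_per_step(listen: List[str], speak: List[str], monologue: List[str]) -> List[Tuple[str, str, str]]:
    """Per-step merging: all three channels share a single sequence position."""
    return list(zip(listen, speak, monologue))


# Four time steps of a toy conversation (illustrative values only).
listen = ["hi", "there", SILENCE, SILENCE]
speak = [SILENCE, SILENCE, "hello", "!"]
monologue = ["<wait>", "<wait>", "hello", "!"]

tdm = interleave_tdm(listen, speak, monologue)
merged = merge_per_step(listen, speak, monologue)

print(len(tdm), len(merged))  # 12 4
```

With interleaving, the sequence grows to three positions per time step, so the model needs three steps of computation before it can react to a new input frame. With per-step merging, the sequence length equals the number of time steps, which is the intuition behind the lower latency described above.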

Features of FLM-Audio

Native full-duplex architecture: The model listens, speaks, and runs an internal monologue at the same time, enabling low-latency full-duplex conversations that are closer to natural human communication.
Chinese and English dialog: The model can converse in both Chinese and English, meeting the needs of bilingual users.
Efficient data utilization: The model is trained on only 1 million hours of data. Despite the small data volume, its responses are high quality, agile, and natural.
High robustness: The model adapts well to noise and user interruptions and quickly adjusts the dialog to keep it smooth.
Open source for research: The model and code are open source, so researchers and developers can study and build on them.

FLM-Audio's Core Advantages

Low-latency full-duplex dialog: With its native full-duplex architecture, FLM-Audio listens, speaks, and runs an inner monologue at the same time, giving low-latency conversations that feel smoother, more natural, and closer to real human conversation.
Efficient data training: The model is trained on only 1 million hours of data, significantly less than other similar models, yet it still delivers high-quality, agile, and natural replies, and training is more efficient.
High robustness: The model is robust to noise and user interruptions. When interrupted, it can quickly pause its current output, understand the new question, and answer promptly, keeping the dialog smooth and accurate across a variety of complex scenarios.
Natural monologue and dual training paradigm: The concept of "natural monologue" mimics cognitive behavior in human conversation, and the "dual training paradigm" addresses the asynchronous alignment problem, making the model's conversations more natural and coherent.
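The interruption behavior described above (pause the current output, take in the new question, then answer) can be sketched as a loop that reads one input frame and emits one output frame per time step. This is a toy simulation under stated assumptions, not the model's real logic: `toy_answer` is a hypothetical stand-in that simply echoes the user, and frames are plain strings instead of audio.

```python
from typing import List, Optional

SIL = "<sil>"  # hypothetical silence token


def toy_answer(query: List[str]) -> List[str]:
    # Hypothetical stand-in for the model: echoes what the user said.
    return ["you", "said:"] + query


def full_duplex_loop(user_frames: List[Optional[str]]) -> List[str]:
    """Consume one user frame and emit one output frame at every time step."""
    heard: List[str] = []    # words of the user's current turn
    pending: List[str] = []  # reply tokens not yet spoken
    output: List[str] = []
    for frame in user_frames:
        if frame is not None:
            # The user is speaking: keep listening and drop the unfinished reply.
            heard.append(frame)
            pending = []
            output.append(SIL)
            continue
        if heard:
            # The user just went quiet: plan a reply to the completed turn.
            pending = toy_answer(heard)
            heard = []
        output.append(pending.pop(0) if pending else SIL)
    return output


# None means the user is silent in that frame; the user interrupts at frame 3.
frames = ["a", None, None, "b", "c", None, None, None, None, None]
print(full_duplex_loop(frames))
```

In this run the first reply is cut off after two tokens when the user starts speaking again, and the second reply answers only the new turn. Because the loop listens on every step, the interruption is noticed in the same step it occurs rather than after the system finishes talking.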

What is FLM-Audio's official website?

GitHub repository: https://github.com/cofe-ai/flm-audio
HuggingFace model library: https://huggingface.co/CofeAI/FLM-Audio
arXiv technical paper: https://arxiv.org/pdf/2509.02521

Who is FLM-Audio for?

Researchers: Because FLM-Audio is open source, researchers in artificial intelligence, natural language processing, and speech technology can use it to explore topics such as full-duplex dialog techniques, model optimization, and multimodal interaction.
Developers: Software developers can use FLM-Audio's interfaces and customization options to build intelligent voice assistants, chatbots, and voice interaction applications, speeding up product development.
Business users: Enterprises can use FLM-Audio to improve customer service, for example by building an intelligent customer service system that makes customer interactions more efficient and natural, raising customer satisfaction and operational efficiency.
Educators: In education, FLM-Audio can be used to build language learning tools and intelligent tutoring systems that give students a more interactive and personalized learning experience through full-duplex dialog.
Content creators: Content creators can use FLM-Audio to generate dialogs, audio content, or scripts, improving creative efficiency and sparking new ideas.