SAM Audio - Open Source Multimodal Audio Segmentation Model from Meta

Latest AI Resources | Published 4 months ago by AI Sharing Circle

What is SAM Audio

SAM Audio is an open source multimodal audio segmentation model from Meta that accurately separates arbitrary target sounds from complex audio mixes. By combining text, visual, and temporal prompts, it enables flexible and efficient audio processing and offers a new approach to audio editing, denoising, sound extraction, and similar tasks. Users can prompt SAM Audio with a simple text description (e.g., "guitar sound"), by clicking on a sound-producing object in a video, or by marking the time span in which the sound occurs.

Features of SAM Audio

Multimodal prompt support:
- Text prompts: Users can extract the corresponding sounds from natural language descriptions (e.g., "barking dog", "human voice").
- Visual prompts: Click on a sound-producing object (e.g., a musical instrument or a speaker) in a video to automatically separate its audio.
- Time span prompts: Mark the time period in which the target sound appears to localize the separation precisely.
Unified model architecture: No separate training is needed for different sound categories. The model can be applied directly to new tasks based on prompts, which gives it strong generalization and extensibility.
High performance and efficiency: Outperforms existing models on a wide range of audio separation tasks, runs faster than real time (real-time factor of ~0.7), and supports large-scale audio processing.
Wide range of application scenarios: Suited to audio cleanup, background noise removal, music production, video post-processing, accessibility technology, and other areas, lowering the barrier to professional audio processing.
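
To make the time span prompt more concrete, the sketch below shows how a marked time period maps onto sample positions in an audio clip. It is only an illustration of the idea: the `TimeSpan` class and the `span_to_sample_range` function are hypothetical names invented for this example and are not part of SAM Audio's code or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeSpan:
    """Hypothetical container for a marked time period, in seconds."""
    start_s: float
    end_s: float


def span_to_sample_range(span: TimeSpan, sample_rate: int, total_samples: int) -> tuple[int, int]:
    """Convert a time span in seconds to a [start, end) range of sample indices.

    The range is clamped to the length of the clip so that a span running
    past the end of the audio still yields valid indices.
    """
    if sample_rate <= 0 or total_samples < 0:
        raise ValueError("sample_rate must be positive and total_samples non-negative")
    if span.start_s < 0 or span.end_s <= span.start_s:
        raise ValueError("span must satisfy 0 <= start_s < end_s")
    start = min(int(round(span.start_s * sample_rate)), total_samples)
    end = min(int(round(span.end_s * sample_rate)), total_samples)
    return start, end


if __name__ == "__main__":
    # A 3-second clip at 16 kHz with the target sound marked from 1.0 s to 2.5 s.
    start, end = span_to_sample_range(TimeSpan(1.0, 2.5), sample_rate=16_000, total_samples=48_000)
    print(f"target sound occupies samples {start} to {end}")
```

How SAM Audio consumes such a span internally is not described here; the sketch only covers the bookkeeping between seconds and samples.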

SAM Audio's Core Advantages

Multimodal interaction: Supports several prompting methods (text, visual, and time span) that users can choose among according to their needs, which is closer to the way people naturally understand and work with audio.
Industry-leading performance: Achieves leading results on a wide range of audio separation tasks, including speech, music, and general-purpose sound separation, and can handle complex audio mixes.
Reference-free audio evaluation: SAM Audio Judge provides an objective assessment of audio quality without the need for a reference track, which is closer to the human listening experience.
Efficient processing: Runs faster than real time (real-time factor of about 0.7), which makes it suitable for large-scale audio processing and improves work efficiency.
Real-world benchmarking: Evaluated with SAM Audio-Bench, which covers a wide range of audio tasks in real scenarios, to verify the reliability and validity of the model in real applications.
Open source and community support: The code is open source, making it easier for developers and researchers to explore and apply the model further and helping advance audio processing technology.
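
The real-time factor quoted above can be turned into a rough time estimate. The sketch below assumes the usual definition, real-time factor = processing time / audio duration, so a value below 1 means faster than real time. The function names are made up for this example, and the 0.7 figure is taken from the text as stated. Actual speed will depend on the hardware and model variant, which this article does not specify.

```python
def processing_time_seconds(audio_seconds: float, rtf: float = 0.7) -> float:
    """Estimate processing time from audio duration and real-time factor.

    Assumes rtf = processing_time / audio_duration.
    """
    if audio_seconds < 0 or rtf <= 0:
        raise ValueError("audio_seconds must be non-negative and rtf positive")
    return audio_seconds * rtf


def is_faster_than_real_time(rtf: float) -> bool:
    """A real-time factor below 1.0 means audio is processed faster than it plays."""
    return rtf < 1.0


if __name__ == "__main__":
    for minutes in (1, 10, 60):
        estimate = processing_time_seconds(minutes * 60)
        print(f"{minutes:>2} min of audio: roughly {estimate / 60:.1f} min of processing")
```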

What is SAM Audio's official website?

Project website: https://ai.meta.com/samaudio/
GitHub repository: https://github.com/facebookresearch/sam-audio

Who SAM Audio is for

Audio editors: Professionals who need to clean up audio, remove background noise, or perform audio restoration.
Creative media creators: Music producers, video editors, and content creators working on creative audio projects and remixes.
Researchers: People working in audio analysis, sound ecology, or music information retrieval.
Hearing aid developers: Teams working with hearing aid manufacturers to develop more effective assistive hearing technology for people with hearing loss.
Everyday users: People who want to improve the quality of their personal audio content or who need simple audio processing in their daily lives.