Meta Unveils SAM Audio for Sound Separation

Meta's SAM Audio simplifies sound separation with multimodal prompts.
Published: January 3, 2026

Introducing SAM Audio

Meta has launched SAM Audio, a unified model that isolates sounds from complex audio using natural-language, visual, or time-span prompts. Released on December 16, 2025, the tool strengthens Meta's position in audio separation technology and aims to make separation more usable for both creators and developers.

Advancements in Audio Separation

SAM Audio combines multiple interaction methods for extracting specific sounds such as speech and instruments. It is built on flow-matching diffusion transformers, and Meta reports that it outperforms existing separation models. The Perception Encoder Audiovisual (PE-AV) underpins the model and supports high-quality separation. SAM Audio can be tried on Meta's Segment Anything Playground.

Features and Capabilities

SAM Audio lets users isolate sounds in three ways: text prompts such as “dog barking,” visual selection of a sound source in video, and span prompts that mark when the target sound occurs. Designed for practical tasks like noise removal, it outputs mono audio and runs at around 0.7x real-time speed on A100 GPUs. It struggles to separate audio events that sound very similar to one another.
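To make the span-prompt and speed figures concrete, here is a minimal Python sketch. It is not SAM Audio's API: the `SpanPrompt` class, the `estimated_processing_seconds` helper, and the 16 kHz sample rate are hypothetical names and values chosen for illustration. It also assumes that "0.7x real-time" means a real-time factor, that is, processing time divided by audio duration; the article does not confirm that reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanPrompt:
    """Hypothetical time-span prompt: the target sound occurs in [start_s, end_s)."""
    start_s: float
    end_s: float

    def to_sample_range(self, sample_rate: int) -> tuple[int, int]:
        """Convert the span from seconds to a half-open range of sample indices."""
        if not 0 <= self.start_s < self.end_s:
            raise ValueError("span must satisfy 0 <= start_s < end_s")
        return round(self.start_s * sample_rate), round(self.end_s * sample_rate)


def estimated_processing_seconds(audio_seconds: float,
                                 real_time_factor: float = 0.7) -> float:
    """Estimate processing time, ASSUMING factor = processing time / audio duration."""
    return audio_seconds * real_time_factor


# Mark the target sound between 1.5 s and 4.0 s of a 16 kHz clip (illustrative values).
span = SpanPrompt(start_s=1.5, end_s=4.0)
print(span.to_sample_range(sample_rate=16_000))  # (24000, 64000)

# Under the assumed reading, one minute of audio takes roughly 42 seconds to process.
print(estimated_processing_seconds(60.0))
```

If the figure instead means the model runs at 0.7 times the speed of playback, the estimate would be the audio duration divided by 0.7, so the assumption is worth checking against Meta's announcement.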

Access and Availability

SAM Audio and PE-AV are currently available for free download. Details on commercial licensing and API access have not yet been provided, so the tool is positioned primarily for research use for now.

Challenges and Future Direction

The model's approach faces scrutiny. The claim that it is the first unified model is contested, and the potential for misuse remains. LALAL.AI points to issues such as audio artifacts and a lack of stereo fidelity. Meta plans collaborations, such as with Starkey, to advance audio separation technology, which could influence audio editing and creative media tools. Full details are available in Meta's official announcement.