CNMAT OpenLab, focused on generative AI in audio. Space is limited to 30 attendees. Fill out this quick form to register.
What: A full-day event with talks on the latest research in audio generation, sound design, and AI-assisted composition.
Who: Featuring guest speakers from Meta, NVIDIA, Adobe Research, MIT Media Lab, and Cai.audio (more info will appear on CNMAT's website soon).
When: Friday, August 29, 2025, from 10:00 AM to 6:30 PM.
Where: CNMAT (McEnerney Hall, 1750 Arch St, Berkeley, CA)
Includes: Coffee, light lunch, BBQ dinner.
Schedule:
Breakfast (9:30 AM): Coffee + croissants
Speaker: Carmine-Emanuele Cella (CNMAT/UC Berkeley)
Speaker: Rafael Valle (Meta)
Talk: Large Scale Non-Autoregressive Audio Generative Models
Q&A: +15 minutes
Abstract:
Audio generation technologies have advanced rapidly over the past three years. Beyond higher quality, they have also become much more universal and controllable, both in the variety of audio they can generate and in the variety of inputs one may use for prompting. In this talk, I will discuss the key factors behind these breakthroughs and dive into four recent works from Meta: Voicebox, SpeechFlow, Audiobox, and Movie Gen Audio. These works span speech generation, self-supervised pre-training for diffusion-style models, general audio generation, and video-conditioned audio generation.
Speakers: Sang-gil Lee & Zhifeng Kong (NVIDIA)
Talk: Towards Intelligence in Audio Understanding and Generation @ NVIDIA
Q&A: +15 minutes
Abstract:
This talk presents a research perspective on audio understanding and generation from NVIDIA's Applied Deep Learning Research – Audio General Intelligence team (ADLR-AGI). Selected publications include:
Fugatto 1: Foundational Generative Audio Transformer Opus 1 (ICLR 2025) – link
ETTA: Elucidating the Design Space of Text-to-Audio Models (ICML 2025) – link
Audio Flamingo: Series of Advanced Audio Understanding Language Models (ICML 2024 / ICML 2025 / NeurIPS 2025 under review) – link
OMCAT: Omni Context Aware Transformer – link
BigVGAN-v2: State-of-the-art waveform synthesizer – link
A²-Flow: Alignment-Aware Pre-training for Speech Synthesis with Flow Matching
A2SB: Audio-to-Audio Schrödinger Bridges – link
(Provided for speakers)
Speaker: Oriol Nieto (Adobe Research)
Talk: GenAI for Sound Design
Q&A: +15 minutes
Abstract:
This presentation explores the forefront of generative AI research for sound design at Adobe Research. We will provide an overview of Latent Diffusion Models and introduce several recent advancements focused on controllability and multimodality:
SILA: Enhances control of sound effects generated through text prompts.
Sketch2Sound: Generates sound effects conditioned on both audio recordings and text.
MultiFoley: Generates sound effects from silent videos and text.
Throughout the talk, we will showcase examples and demos to illustrate practical applications and potential, making the case that we are only beginning to unveil a new paradigm for sound design.
Speaker: Char Stiles (MIT Media Lab – Future Sketches)
Talk: If programming is the native language of our digital age, does that make computation the artistic material of our time?
Q&A: +15 minutes
Abstract:
This research/artist talk traces work from coding GPU shaders, to creating a new creative coding integrated development environment, to live coding music. Through performance and audience interaction, Stiles explores how the tools we use to create code fundamentally shape what we can imagine and express. From Fantasy IDEs that chat with audiences to version control systems that let you softly interpolate through code, this work treats programming environments not as neutral productivity tools, but as expressive mediums in their own right.
Coffee Break + Icos/Dodec Demo
Speaker: Reuel Williams (Cai.audio)
Talk: Cai: A New Framework for Preserving Artistic Intentionality
Q&A: +15 minutes
Abstract:
Diffusion-based audio generative models allow complete musical compositions to be generated from simple prompts, forcing musicians and artists alike to reckon with their future relevance as composers. These models are often perceived as replacements for the creative process, creating backlash from musicians and challenges for integration into the music community.
Cai presents a framework for symbolic music generation, focusing on harmonic and melodic structure via MIDI. Diffusion-based models play a secondary role, controlling timbre, articulation, and phrasing. This preserves human artistic intentionality while benefiting from the democratization promised by generative AI.
Session: CNMAT's latest research + discussion
Q&A: +15 minutes
(Provided: BBQ)