Computer Science Seminar: Prem Seetharaman
This event is open to Illinois Tech faculty and students.
Abstract
Computer audition is the study of how machines can organize, parse, and understand the sounds we hear every day. A fundamental problem in computer audition is audio source separation. Source separation is the isolation of a specific sound (e.g. a single speaker) in a complex audio scene, like a cocktail party. Humans—as evidenced by our daily experience with sounds, as well as empirical studies—manage the source separation task quite effectively, attending to sources of interest in complex scenes. In this talk, Seetharaman will present computational methods for audio source separation that are inspired by the abilities of humans.
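To make the problem concrete, source separation is commonly framed as estimating a mask over a time-frequency representation of the mixture; the sketch below illustrates that framing only, and is not necessarily the method presented in the talk. It uses librosa (an assumption; any STFT implementation would do) and synthetic stand-in signals.

```python
# Illustrative sketch of time-frequency masking, one common framing of
# source separation (not necessarily the approach in this talk).
import numpy as np
import librosa

sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
speaker = 0.5 * np.sin(2 * np.pi * 220 * t)    # stand-in for a single voice
background = 0.3 * np.random.randn(sr)         # stand-in for the rest of the scene
mixture = speaker + background                 # the "cocktail party" recording

# Separation is often posed as estimating a mask over the mixture's
# time-frequency representation, then inverting the masked STFT.
mix_stft = librosa.stft(mixture)               # complex array, (freq, time)
mask = np.ones_like(np.abs(mix_stft))          # placeholder for a model's estimated mask
estimate = librosa.istft(mask * mix_stft, length=len(mixture))
```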
Deep-learning approaches are currently the state of the art for source separation tasks. They are typically trained on many examples where each source (e.g. a voice) was recorded in isolation. The sources are then artificially mixed together to create training data for a deep-learning model. This artificial training set is at odds with how humans learn to separate sounds. We are never given sounds in isolation, but rather always hear them in the context of other sounds. Further, while we can train models to separate sounds for which we have sufficient isolated source data (e.g. speech), we cannot do so for the many sounds for which we have no isolated recordings. However, we do have vast datasets of complex audio mixtures (e.g. YouTube, Spotify). How do we learn computer audition models directly from these mixtures, rather than from artificial training data? Seetharaman will present work on building self-supervised machine learning models that learn to perform audio source separation directly from audio mixtures. These models are bootstrapped from separation algorithms inspired by the primitive grouping mechanisms used in human audition.
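The following minimal sketch shows the "artificial mixing" training setup described above, in which isolated recordings are summed to synthesize mixtures and the isolated sources serve as supervision targets. The signals and mixing weights are synthetic placeholders, not real data, and the final comment only summarizes the contrast drawn in the abstract.

```python
# Sketch of the supervised "artificial mixing" paradigm: sum isolated
# sources to create (mixture, target) training pairs.
import numpy as np

sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
isolated_speech = 0.5 * np.sin(2 * np.pi * 220 * t)   # stand-in for an isolated voice
isolated_noise = 0.3 * np.random.randn(sr)            # stand-in for another source

# Supervised setup: the artificial mixture is the input, the isolated
# source is the training target.
mixture = isolated_speech + isolated_noise
training_pair = (mixture, isolated_speech)

# The self-supervised approach described in the talk instead learns from
# mixtures alone, bootstrapped from primitive grouping cues, with no
# isolated recordings required.
```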
Bio
Seetharaman is a postdoctoral scholar at Northwestern University in Evanston, Illinois. He received his Ph.D. in 2019, advised by Bryan Pardo at Northwestern. Prior to his Ph.D., he studied computer science and music composition at Northwestern. He has worked with Spotify, Adobe Research, Mitsubishi Electric Research Labs, and Gracenote on advancing computer audition. The objective of his research is to create machines that can understand the auditory world. He works at the intersection of computer audition, machine learning, and human-computer interaction.