AI headphones: Listen to a person in a crowd just by looking at them

Researchers at the University of Washington have developed an artificial intelligence system that lets a user wearing headphones "enrol" a speaker by looking at them for three to five seconds. They call the system "Target Speech Hearing" (TSH). The system cancels all other sounds in the environment and plays just the enrolled speaker's voice in real time, even as the listener moves around in noisy places and no longer faces the speaker.

The team's senior author, Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering, presented their findings in Honolulu at the ACM CHI Conference on Human Factors in Computing Systems. He remarked that people tend to think of AI now as web-based chatbots that answer questions.

In this project, however, the team developed AI that modifies what anyone wearing headphones hears, according to their preferences. With these devices, a listener can hear a single speaker clearly even in a noisy environment with many other people talking.

Using the system

According to the researchers, to use the system, a person wearing off-the-shelf headphones fitted with microphones taps a button while directing their head at someone talking. The sound waves from that speaker's voice should reach the microphones on both sides of the headset simultaneously; there's a 16-degree margin of error.
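
The "simultaneous arrival" condition can be illustrated with a small geometric sketch. This is not the researchers' code: it assumes a simple far-field model in which the arrival-time difference between the two microphones is d·sin(θ)/c, a hypothetical ear-to-ear microphone spacing of 0.18 m, and a reading of the 16-degree margin as plus or minus 16 degrees around straight ahead. All function names and the test signals are made up for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at about 20 °C
MIC_SPACING_M = 0.18        # assumed ear-to-ear microphone distance
MARGIN_DEG = 16.0           # margin from the article, read here as +/-16 degrees


def estimate_lag_samples(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`.

    A positive lag means the sound reached the right microphone later.
    Uses a brute-force cross-correlation over lags in [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                score += sample * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(lag_samples, sample_rate_hz):
    """Convert an inter-microphone lag into an angle off straight ahead."""
    delay_s = lag_samples / sample_rate_hz
    ratio = SPEED_OF_SOUND_M_S * delay_s / MIC_SPACING_M
    ratio = max(-1.0, min(1.0, ratio))  # clamp so asin stays defined
    return math.degrees(math.asin(ratio))


def is_facing_speaker(left, right, sample_rate_hz, max_lag=32):
    """True if the estimated arrival angle falls within the assumed margin."""
    lag = estimate_lag_samples(left, right, max_lag)
    return abs(arrival_angle_deg(lag, sample_rate_hz)) <= MARGIN_DEG


# Synthetic test signal: a short burst, then two delayed copies of it.
SAMPLE_RATE_HZ = 48_000
left = [math.sin(0.3 * i) * math.exp(-((i - 50) / 10) ** 2) for i in range(200)]
right_ahead = [0.0] * 3 + left[:-3]    # 3-sample delay: speaker nearly ahead
right_side = [0.0] * 10 + left[:-10]   # 10-sample delay: speaker off to one side

print(is_facing_speaker(left, right_ahead, SAMPLE_RATE_HZ))
print(is_facing_speaker(left, right_side, SAMPLE_RATE_HZ))
```

Under these assumptions, a delay of only a few samples at 48 kHz already corresponds to several degrees off centre, which suggests why the wearer has to point their head fairly precisely at the speaker during enrolment.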

The headphones send that signal to an onboard embedded computer, where the team's machine-learning software learns the desired speaker's vocal patterns. The system latches onto that speaker's voice and continues to play it back to the listener, even as the pair moves around. The system's ability to focus on the enrolled voice improves as the speaker keeps talking, giving the system more training data.

Test results

The team tested its system on 21 subjects, who on average rated the clarity of the enrolled speaker's voice nearly twice as high as that of the unfiltered audio. According to the university's statement, the study builds on the team's previous "semantic hearing" research, which allowed users to select specific sound classes they wanted to hear, such as birds or voices, and cancelled other sounds in the environment.

Presently, the TSH system can enrol only one speaker at a time, and only when no other loud voice is coming from the same direction as the target speaker. Users who aren't happy with the sound quality can run another enrolment on the speaker to improve clarity.

Sources of Article

University of Washington
