Building Voice AI That Works for Everyone

Nihar Mahapatra, Ph.D.
Associate Professor, Department of Electrical and Computer Engineering, College of Engineering
Ann Marie Ryan, Ph.D.
Professor, Department of Psychology, College of Social Science
J. Scott Yaruss, Ph.D., CCC-SLP, BCS-F, F-ASHA
Professor, Department of Communicative Sciences and Disorders, College of Communication Arts and Sciences

Project Overview

Funded by the National Science Foundation, a team of researchers plans to design a voice-recognition app that can recognize and correctly transcribe stuttered speech.
The app would provide an alternative to current voice-activation systems, which generally do not work for the 3 million people in the U.S. and 80 million people worldwide who stutter.

Products/Outcomes

A standalone app that can understand stuttered speech
An app that can preprocess stuttered speech so that it can be understood by existing voice AI, such as Siri and Alexa
A postprocessing app that would correctly transcribe stuttered speech
Accessibility guidelines to ensure that future voice AI technology works for people who stutter or have other speech differences

Partners (partial list)

AImpower.org
American Institute for Stuttering
Apple
Friends: The National Association of Young People Who Stutter
HireVue
International Stuttering Association
Michigan Speech, Language, Hearing Association
National Stuttering Association
SAY: The Stuttering Association for the Young
Society for Human Resource Management
Spartan Innovations - MSU Innovation Center
STAMMA: The British Stammering Association
StammerTalk: Chinese Stuttering Self-Help Network
SPACE - Stuttering, People, Arts, Community, Education
withVR
World Stuttering Network
50 Million Voices

Form(s) of Engagement

Community-Engaged Research
Community-Engaged Commercialized Activities

The HeardAI team includes scholars, students, and community members representing diverse backgrounds, including (from left): Nihar Mahapatra, Jia Bin, Caryn Herring, Ann Marie Ryan, Megan Arney (project manager), J. Scott Yaruss, and Hope Gerlach-Houck.

For people who stutter, Siri, Alexa, and other voice-activated artificial intelligence (AI) technologies can feel like foes rather than friends.

Hailed by most for their efficiency and convenience, these technologies fail on all counts for people who stutter, such as Jia Bin, an MSU doctoral student from China who leads the East Lansing chapter of the National Stuttering Association and belongs to the World Stuttering Network.

“It’s daily,” said Bin of her frustrations using automated voice systems, which typically cut her off, don’t start in the first place, or misunderstand her. “Technology is supposed to help people communicate better, but the current level of voice AI makes people who stutter feel discriminated against by the technology itself.”

And the stakes are higher than mere inconvenience. Voice-activated technologies have saturated other aspects of everyday life—calling a doctor, using car safety features, even applying for jobs—further excluding those with speech differences.

A multidisciplinary team of MSU researchers has set out to change this by developing an app that will make voice-activated artificial intelligence accessible to people who stutter. The team includes experts in engineering, communicative sciences and disorders, and psychology, as well as people who stutter.

“We have the potential to make a difference for people who are getting left behind by the advances we have seen around us,” said principal investigator (PI) Nihar Mahapatra, an associate professor of electrical and computer engineering at MSU.

Caryn Herring speaks at a convention of Friends: The National Association of Young People Who Stutter.

Dream Team

The stuttering project is funded by the National Science Foundation (NSF) Convergence Accelerator program, which supports interdisciplinary, user-driven research projects that tackle major societal challenges on an accelerated timeline.

The grants are organized into thematic tracks; the stuttering project falls under a track focused on enhancing quality of life and employment access for people with disabilities.

The topic piqued the interest of Mahapatra, who had heard a report about stuttering several years ago on public radio and considered the possibility of developing AI technology that would meet the needs of people who stutter.

When he learned about the NSF grant, “I thought, ‘This would be a wonderful time to explore this.’” As an engineer, he is well-versed in AI technology.

To round out the research team, he reached out to:

Caryn Herring, a person who stutters and the executive director of Friends: The National Association of Young People Who Stutter, who is also a speech-language pathologist and a doctoral candidate in communicative sciences and disorders at MSU;
Ann Marie Ryan, a professor of organizational psychology;
J. Scott Yaruss, a professor of communicative sciences and disorders who directs the Spartan Stuttering Laboratory; and
Hope Gerlach-Houck, an assistant professor of speech, language, and hearing sciences at Western Michigan University in Kalamazoo.

In 2022, the team applied for and received a $749,996 one-year Phase 1 grant for their project, titled “Convergent, Human-Centered Design for Making Voice-Activated AI Accessible and Fair to People Who Stutter.” They recently learned that they have received an additional $5 million Phase 2 grant to continue their work over the next three years.

A key aspect of the NSF grant is that it is user-driven. “That’s the really cool part,” said Herring, who as a co-PI has an opportunity to bring the voices of the stuttering community to the project. “Individuals who stutter and the larger stuttering community have a say. They have a role in what we are going to develop. We’ve gotten input from people who stutter from the start.”

Digital Discrimination

During Phase 1, the team interviewed people who stutter to learn about the “pain points” they experience with automatic speech recognition (ASR) systems. “We need to know where the breakdowns are,” said Yaruss, whose research and clinical work have focused on diminishing the adverse impact that people experience due to stuttering.

For many people who stutter, just getting voice assistants like Google Assistant or Siri to activate can be a challenge. If the speaker blocks (that is, gets completely stuck so that no sound comes out), voice-activated systems may cut them off, interrupt them, or misunderstand what they have said.

'HeardAI is a game-changer for those who stutter and those with other communication variations,' said principal investigator Nihar Mahapatra in a video describing the NSF-funded project.

Credit: Matt Talarico / Impact Media Lab

That, in turn, “adds pressure to the speaker because they’re already accustomed to getting cut off by other people and mistreated in that way,” said Yaruss. “It adds a digital layer to the discrimination they already experience.”

Herring, for one, doesn’t even attempt to use voice assistants because they simply don’t work for her. But sometimes she has no choice. She dreads phone trees, knowing that she has to carve out at least an hour to reach a person. “Those systems don’t understand me or give me enough time to say what I want to say,” said Herring, who often gets disconnected and must start all over again, sometimes multiple times.

Increasingly, employers use voice AI to screen job candidates. That puts people who stutter, who already face workplace discrimination and lower pay, at an even greater disadvantage.

A case in point is Herring’s experience with Google’s Interview Warmup, a free online app that lets users practice answering interview questions. Herring answered one prompt by saying that she can quickly assess situations. However, the transcript erroneously recorded her as using the word “assassin” to describe herself.

“If this is what goes to the HR person, then Caryn isn’t getting the job,” said Yaruss. “This is cutting edge right now.”

Ann Marie Ryan, an organizational psychologist who specializes in fair hiring practices, has seen an increase in automated video interviewing to screen candidates. Joining the team raised her personal awareness of the obstacles faced by people who stutter, and she has met with HR managers to raise their awareness as well.

While people who stutter can request accommodations under the Americans with Disabilities Act, many don’t realize their responses will be recorded until it’s too late. “I think the bigger issue is a societal design issue: We should be designing hiring tools that have the most possible access for the most people,” Ryan said.

Redesigning the Future of Voice AI

The same can be said of automatic speech recognition applications, which are built from vast repositories of speech samples. “The problem is that the people who are speaking are not representative of lots of different populations or subgroups in the U.S. and the world,” said Mahapatra. “The models are skewed toward recognizing mainstream American speech.”

Voice AI systems are not designed to handle stuttering, Yaruss said. These systems analyze audio, matching sounds to probable words and then linking them to determine probable sentences. “What happens with stuttering is those word units are disrupted,” he said. For example, if a person stutters on “when” and says “wh-wh-wh,” current models don’t have a word match for that.
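To make Yaruss’s point concrete, here is a toy, text-only sketch in Python. It is not how HeardAI or any commercial system works: real ASR operates on audio, and the vocabulary, the function names (unmatched_tokens, collapse_part_word_repetitions), and the “collapse” rule below are hypothetical, chosen only for illustration. A word-level lookup finds no match for a fragment like “wh-wh-wh”; a crude rule that drops hyphenated fragments whose pieces all begin the following word leaves a sentence the lookup can match.

```python
# Toy illustration only: real ASR systems work on audio, not text,
# and this is NOT the HeardAI method. The vocabulary is hypothetical.
VOCABULARY = {"when", "is", "my", "appointment"}


def unmatched_tokens(tokens):
    """Return the tokens that a simple word-level lookup cannot match."""
    return [t for t in tokens if t not in VOCABULARY]


def collapse_part_word_repetitions(tokens):
    """Drop hyphenated fragments such as 'wh-wh-wh' when every piece is a
    proper prefix of the next token (a crude, hypothetical heuristic)."""
    cleaned = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        pieces = [p for p in tok.split("-") if p]
        is_fragment_run = (
            "-" in tok
            and len(pieces) > 0
            and all(nxt.startswith(p) and len(p) < len(nxt) for p in pieces)
        )
        if not is_fragment_run:
            cleaned.append(tok)
    return cleaned


if __name__ == "__main__":
    tokens = "wh-wh-wh when is my appointment".split()
    print(unmatched_tokens(tokens))  # → ['wh-wh-wh']
    collapsed = collapse_part_word_repetitions(tokens)
    print(collapsed)  # → ['when', 'is', 'my', 'appointment']
```

A rule this simple would miss blocks, prolongations, and sound repetitions that never reach a transcript as text, which is one reason the team emphasizes training models on real stuttered speech rather than patching transcripts after the fact.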

The solution lies in collecting large quantities of speech samples from people who stutter, along with correct transcriptions—a gap the MSU team is well positioned to fill. “We’re stuttering folks,” said Yaruss. “We have connections to thousands of people who stutter.”

He plans to draw on the knowledge gained from a separate National Institutes of Health grant focused on real-world experiences people who stutter have on a day-to-day basis. In that study, people who stutter record their interactions throughout the day. “This ties in directly with the NSF project because our speech samples are going to be hugely valuable for AI training,” Yaruss said.

Mahapatra has developed a prototype of HeardAI, an app designed to recognize stuttered speech. The app is still under development, but the team has used it to collect speech samples and get input from people who stutter. “Our team has connections with so many people who stutter,” said Mahapatra. “As we develop the app, we will constantly engage them to make sure the technology addresses their needs.”

The team presented the prototype last summer and fall at the annual conferences of Friends and the American Speech-Language-Hearing Association. They sponsored a virtual symposium with representatives of Apple, Google, Microsoft, and Meta, along with other stakeholders. They have also developed a promotional video featuring Herring, Bin, and members of the team.

Once fully developed, HeardAI will include flexible solutions for individuals interacting with voice AI systems like Siri, developers creating accessible voice AI services, and organizations relying on accurate ASR transcripts.

For Phase 2 of the project, the team identified three primary deliverables, Mahapatra said. The first is the HeardAI application itself. Second, the team plans to create a “test bed” that other developers can use to evaluate how any given automatic speech recognition software works with stuttered speech. Third, the team will create accessibility standards to guide the development of voice AI technology going forward.
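The article does not say how the test bed will score ASR systems. One widely used measure is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn a system’s transcript into the correct one, divided by the number of words in the correct transcript. The Python sketch below computes WER with a standard edit-distance table; the function name and the example sentences (loosely modeled on Herring’s Interview Warmup experience) are hypothetical, and it does no text normalization such as punctuation or case handling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length.

    Hypothetical helper for illustration; not the HeardAI test bed.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "i can quickly assess situations"
    # One wrong word out of five.
    print(word_error_rate(reference, "i can quickly assassin situations"))  # → 0.2
    # A system that cuts the speaker off after two words.
    print(word_error_rate(reference, "i can"))  # → 0.6
```

Comparing WER on stuttered and fluent recordings of the same prompts is one plausible way a test bed could expose the gaps the team describes, though a single number cannot capture harms like being cut off mid-sentence.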

AI That Understands Everyone

A prototype of HeardAI has been developed to record speech samples and get input from people who stutter. Once fully developed, it will integrate with existing apps such as Siri/Google and others.

Credit: Matt Talarico / Impact Media Lab

With voice AI technology expected to quintuple by 2030, there is an urgency about the team’s work. An estimated 1 percent of the population stutters—which amounts to more than 3 million people in the U.S. and 80 million people worldwide.

The millions who stutter “are getting left behind as the rest of us find this great convenience of being able to talk to our cars and our houses ... and they can’t,” said Yaruss. “They already experience gaps, and these gaps are going to be exacerbated as the technology becomes ever more ubiquitous.”

“It may not seem like a big deal, but it’s one more exclusion, it’s one more discrimination, it’s one more inequity,” he said. “And it’s something that we now have the technology to be able to address.”

Beyond the technology, the NSF’s backing sends a message to the stuttering community that their voices matter, Herring said. “I think people are very excited,” she said. “They can see the real-life impact that HeardAI can have on their life.”

Written by Patricia Mish, University Outreach and Engagement
Photographs courtesy of Caryn Herring unless otherwise noted


January 2024 | Volume 16, Issue 1