Children struggling with speech development currently rely on in-person evaluation sessions with speech-language pathologists to track progress and identify impairments. Now, U.S. researchers have developed a more efficient, automated alternative: a new machine-learning algorithm that monitors speech through digital platforms.
Using recorded speech samples from children, the new model can detect phonemes, which are the units of sound that make up a language and distinguish words from one another. It compares each speech sample to a normative database for pronunciation and articulation accuracy. Children whose speech does not meet certain thresholds could then be referred for further assessment by a professional, standardizing the evaluation process.
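As a rough illustration of that screening logic (not the patented method itself), a threshold comparison against normative phoneme scores might look like the following sketch; all phonemes, scores and thresholds here are invented:

```python
# Hypothetical sketch of the screening step described above: a child's
# per-phoneme accuracy scores are compared against a normative database,
# and phonemes falling well below the norm trigger a referral flag.
# All values below are invented for illustration, not from the patent.

# Normative accuracy (fraction of correct productions) for a few phonemes
# in a given age group -- invented numbers.
NORMS = {"s": 0.90, "r": 0.80, "th": 0.75}

# Minimum acceptable ratio of a child's score to the norm before referral.
REFERRAL_THRESHOLD = 0.85

def screen(child_scores, norms=NORMS, threshold=REFERRAL_THRESHOLD):
    """Return the phonemes whose accuracy falls below the normative threshold."""
    flagged = []
    for phoneme, norm in norms.items():
        score = child_scores.get(phoneme)
        if score is not None and score < threshold * norm:
            flagged.append(phoneme)
    return flagged

# Example: a child who produces /r/ well below the age norm.
flags = screen({"s": 0.88, "r": 0.55, "th": 0.70})
print(flags)  # ['r'] -> refer for professional assessment
```

In practice the flagged child would then be referred for assessment by a speech-language pathologist, which is the standardization the inventors describe.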
Under current practice, speech-language pathologists conduct in-person sessions to hear children speak, track their progress and diagnose any disabilities, so outcomes are bound to vary from clinician to clinician, the inventors of the model told The Academic Times. Visar Berisha, an associate professor at Arizona State University, and Katherine Hustad, a professor at the University of Wisconsin-Madison, along with three other co-inventors, applied to patent their invention with the World Intellectual Property Organization, which published the application April 1.
"There's a lot of variability in the ways that children produce their speech sounds, and so it's been a very messy problem that most people who do speech recognition have not been particularly interested in," Hustad explained.
Berisha and Hustad said their model should have clear clinical implications. For example, when children visit the doctor, their height and weight are measured against uniform charts to track growth. The inventors' proposed system could do the same for speech development, creating precise norms for different age ranges. They explained that their algorithm is able to compare the speech to a standardized database, provide a continuous measure of articulation precision and characterize all the different ways that words and sounds can be spoken.
"One of the key takeaways from the patent [application] is that the technology behind it will really enable us to understand what normal speech-sound development is, using the latest technological advances for typically developing children," Hustad said. "This is especially relevant for children with developmental disabilities who might not yet have those disabilities identified."
"This tool would help flag those kids as not meeting milestones the way that they should be so we can get them to therapy sooner, and [they can] ultimately have a better quality of life and better long-term educational and vocational outcomes," she continued.
To integrate this into clinical settings, the model could be housed in a mobile or desktop app that screens for speech disorders. "We have versions of the mobile application that we have used for other clinical applications, [like] assessing speech in neurological disorders in adults. These are already being used by tens of thousands of patients in the context of clinical studies," Berisha said, explaining that the current project "aims to extend this technology and to make it available for children."
"The idea is that you get a standardized score for all of your speech sounds that we compare against typical children. So you would know exactly how close to typical your child is," Hustad said.
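The growth-chart analogy suggests one simple statistical reading: if age-specific norms supply a mean and spread for an articulation measure, a child's score can be expressed as a standard (z) score, much like a height percentile. A minimal sketch, with all numbers invented rather than drawn from the patent:

```python
# Hypothetical illustration of the "growth chart for speech" idea: express a
# child's articulation-precision score as a z-score against invented
# age-specific norms (mean and standard deviation per age band).
AGE_NORMS = {
    4: (0.70, 0.10),  # (mean precision, standard deviation) -- invented
    5: (0.80, 0.08),
    6: (0.88, 0.06),
}

def articulation_z_score(score, age, norms=AGE_NORMS):
    """How many standard deviations the child's score sits from the age mean."""
    mean, std = norms[age]
    return (score - mean) / std

# A five-year-old scoring 0.64 sits two standard deviations below the norm.
z = articulation_z_score(0.64, age=5)
print(round(z, 2))  # -2.0
```

A score of zero would mean "exactly typical for age," which is the standardized comparison Hustad describes.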
The authors noted that disseminating the model using a mobile platform would help extend the reach of clinicians in emerging areas like telemedicine. It could even potentially let parents conduct a simplified version of the assessment from home.
The patent application includes the model the inventors developed using their algorithm to measure the precision of speech articulation in children. But their model is also able to track the rate and rhythm of the speech, such as how quickly the children are speaking and whether the stress patterns in their speech are appropriate depending on what they're saying. And the model has the capacity to develop standardized expectations for all these parameters based on the speaker's age. "Having an objective assessment of something that plays such an important role in quality of life, namely speech, I think is really critical," Berisha said.
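One of the rate measures the article mentions, speaking rate, is straightforward to illustrate: syllables divided by utterance duration. A trivial sketch, with invented values:

```python
# Hypothetical sketch of one prosodic measure described above: speaking rate,
# computed as syllables per second from an utterance's syllable count and
# duration. The numbers in the example are invented for illustration.
def speaking_rate(syllable_count, duration_seconds):
    """Syllables per second for a single utterance."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return syllable_count / duration_seconds

# A 12-syllable utterance spoken over 4 seconds.
print(speaking_rate(12, 4.0))  # 3.0
```

Age-based norms for such measures would then play the same role as the articulation norms above.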
Existing speech-recognition technology is widely used in commercial platforms, including artificial intelligence assistants such as Amazon's Alexa. Berisha explained that such products are primarily designed for adults rather than children, and their aim is to allow for better interaction between the computer and the person. The main concern is ensuring that Alexa understands the person speaking, and so the underlying technologies do not deeply assess different aspects of speech production.
"What we're aiming to address with [our] piece of technology is a really detailed interpretable characterization of what went wrong in the pronunciation of a particular utterance," Berisha said. "So it's really not [about] whether or not the computer understands you, but what do the misarticulation patterns look like phoneme by phoneme."
The researchers plan to build out their technology by validating the model in a larger study. They hope to provide sufficient evidence that the invention is clinically useful and informs downstream clinical decision-making, so that it can guide interventions and improve quality of life for kids.
"Clinical applications of speech analysis are just starting to take off," Berisha said. "In some sense, the reason why this hasn't been done before is that it's hard, and [that] clinical applications of speech technology is a fairly nascent field. We're kind of at the ground level there."
The application for the patent "Tracking articulatory and prosodic development in children" was filed on Sept. 28, 2020, with the World Intellectual Property Organization. It was published on April 1, 2021, with the application number WO2021/062370. The earliest priority date was Sept. 27, 2019. The inventors of the pending patent are Visar Berisha, Julie Liss, Katherine Hustad, Tristan Mahr, and Kan Kawabata. The applicants are the Arizona Board of Regents on behalf of Arizona State University, the Wisconsin Alumni Research Foundation and Aural Analytics Inc.
Parola Analytics provided technical research for this story.