British researchers have created a new robotic mouth system that uses machine learning and a novel mechanism modeled after human cheek muscles to better synchronize speech and mouth movements in realistic, human-like robots, bringing us closer to the day when people can comfortably socialize with those automatons.
"It is important to replicate the human mouth for speech, as it aids in speech reading," Carl Strathearn, a researcher at Edinburgh Napier University and lead author of the study, told The Academic Times. "When humans talk to each other, our gaze moves back and forth between the eyes and mouth to help us gauge attention and understanding. You can see the consequence for poor lip-sync in some CGI animations."
"We are so in tune with the human face and speech that even the slightest mistakes are noticeable, unnatural, and distracting," he added.
One of the key breakthroughs in the study, published Feb. 27 in the Journal of Intelligent & Robotic Systems, was a new actuator that emulated the human cheek's buccinator muscle. The buccinator actuator could stretch and purse the robot's lips, making it look more realistic when the robot produced rounded "O" and "U" sounds, such as the vowels in "dew," "too," and "few."
"The mechanism itself is not complicated, but the result is visually effective, which is significant when evaluating the system against other robots and humans during speech," said Strathearn. "I knew from doing my preliminary research that this muscle group was frequently overlooked in humanoid robotics, and when I built the prototype I started to understand why. The space between the teeth and gums, or prosthetic teeth and silicone skin, is very limited. Therefore, the mechanism had to be robust enough to manage the system load whilst accurately replicating the movements of the buccinator muscles within a tiny space inside the mouth."
The researchers overcame this challenge with an aluminum composite frame, which stretched the lips without being visible on the silicone skin's surface. In addition to designing the robotic mouth, the team developed a machine learning application that synchronized jaw movements with input from a speech synthesis application.
Strathearn thinks the accurate synchronization of mouth movements with speech could be key to helping robots avoid the "uncanny valley" — a level of authenticity that approaches but does not quite match a human-like appearance, triggering disgust or anxiety among viewers. Noting that many human-robot interaction specialists have studied virtual avatars of robots rather than the real thing, he pointed out that the uncanny valley effect could be very different in actual robots than in videos or CGI.
"The problem with the [uncanny valley] hypothesis is that it does not offer any solutions, and because it is open-ended, it can be interpreted in so many different ways that it has become diluted beyond any practical value," Strathearn added. He has previously tackled the realism of humanoid robots through his "Multimodal Turing Test," a robot-focused update of Alan Turing's famous test for whether a machine could convincingly imitate a human.
Of course, human-robot interaction is a two-way street — just as humans shape robots, robots and other technologies shape humans. Scientists recently discovered, for instance, that robots might encourage people to take more risks. In past research, Strathearn has found that people prefer to interact with robots that resemble themselves, with younger people favoring a younger-looking robot and older people favoring a robotic elder.
Strathearn noted that humans may already be pushing into the uncanny valley through facial reconstruction surgery. Online communities such as Reddit's r/uncannyvalley chronicle some of the ways people are already pushing beyond the range of natural facial structure. "In some ways, this may be perceived as a glimpse into the higher realms of the uncanny valley as it — sometimes subtly, sometimes not so subtly — merges the artificial with the natural," Strathearn said.
The current study comes as companies and governments start to roll out humanoid robots, prompted in many cases by the widespread social isolation resulting from the COVID-19 pandemic. Hanson Robotics recently announced that it will begin mass-producing Sophia and other humanoid robots by the close of 2021. The humanoid robot Lexi has been introduced to schools in Switzerland. And MSC Cruises has debuted "Rob," a robotic bartender who will serve drinks aboard the MSC Virtuosa.
But Strathearn is skeptical. "Personally, I do not think realistic humanoid robots are ready and useful for everyday tasks at the moment," he said. "We need to remember that not everybody can interact with technology effectively."
The researchers installed their robotic mouth system in Strathearn's Euclid model. To see if their system bested the competition, they pitted Euclid against 10 of the top realistic humanoid robots, including Hanson Robotics' Sophia and Bina. While they could not access all 10 robots physically, they were able to compare videos of the robots speaking. The scientists recruited 50 participants, who evaluated the accuracy and aesthetics of robot speech in the different systems.
"I conducted my online survey anonymously across numerous social media platforms and forums," Strathearn explained. "I really felt that the best people to evaluate my robotic mouth system was the non-academic/scientific audience. After all, I was asking people to judge how real a humanoid robot is — and what more do you need to be qualified for that than to be human?"
Euclid placed first in lip synchronization accuracy, second in the time difference between spoken words and mouth movements, and third in synchronization between jaw movement and syllable patterning. It received lower rankings for visual authenticity and realistic speech. Strathearn believes Euclid would have performed even better were it not for the limitations of the microcontrollers he used. "I used off-the-shelf microcontrollers and had to work up to their limitations," he explained. He is continuing to improve Euclid, which is now in its third generation.
Strathearn traces his own passion for human-like robotics to pop culture, saying he was inspired by movies such as "The Terminator," "Star Wars" and "Blade Runner" as a child. "The real drive for me is bringing something inanimate to life," he said.
The study, "A Novel Speech to Mouth Articulation System for Realistic Humanoid Robots," published in the Journal of Intelligent & Robotic Systems on Feb. 27, was authored by Carl Strathearn, Edinburgh Napier University, and Eunice Minhua Ma, Falmouth University.