Deep learning model advances how robots can independently grasp objects

March 19, 2021
Robot grip is getting closer and closer to humanlike. (Pixabay/Stefan Schulz)

Robots are unable to perform everyday manipulation tasks, such as grasping or rearranging objects, with the same dexterity as humans. But Brazilian scientists have moved this research a step further by developing a new system that uses deep learning algorithms to improve a robot's ability to independently detect how to grasp an object, known as autonomous robotic grasp detection.

In a paper published Feb. 24 in Robotics and Autonomous Systems, a team of engineers from the University of São Paulo addressed existing problems with the visual perception phase that occurs when a robot grasps an object. They created a model using deep learning neural networks that decreased the time a robot needs to process visual data, perceive an object's location and successfully grasp it.

Deep learning is a subset of machine learning in which computer algorithms learn from data and improve automatically through experience. Inspired by the structure and function of the human brain, deep learning uses multilayered structures of algorithms called neural networks, which identify patterns and classify different types of information much as the human brain does. Deep learning models for vision are often based on convolutional neural networks, which specialize in analyzing visual imagery.
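The core operation of a convolutional layer can be illustrated with a minimal sketch (not code from the study): a small filter slides over an image, and each output value is a weighted sum of a local patch of pixels, which is how such networks detect local visual patterns like edges.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a small kernel over an image, producing a feature map.

    Each output value is a weighted sum of one local patch of pixels,
    which is the basic operation of a convolutional layer.
    """
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge filter applied to a tiny image with one bright column.
image = np.zeros((5, 5))
image[:, 2] = 1.0
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)
feature_map = conv2d(image, edge_kernel)
```

The filter responds strongly on either side of the bright column and not at all elsewhere; in a trained network, many such filters are learned from data rather than hand-designed.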

Eduardo Godinho Ribeiro, a Ph.D. student at the University of São Paulo and lead author of the paper, told The Academic Times that to develop the proposed robotic manipulation system, the researchers separately explored grasp detection and visual servoing, a technique that uses visual feedback extracted from a vision sensor to control a robot's motions.

In order for a robot to grasp an object, the robot must track the object's features to perceive its location, its pose and the points at which the robot's grippers will make contact with it. In visual servoing, this tracking typically requires a three-dimensional model of the object or calibrated camera parameters, and processing that information is time-consuming, delaying the robot's grasping motions.

"A grasping system that can combine the benefits of automatic grasp detection and that receives visual feedback to deal with unknown dynamic objects, in real-time, is one of the goals in robotics," Ribeiro and his co-authors, Raul de Queiroz Mendes and Valdir Grassi Jr., said in the paper. "This work aims to advance towards this goal by designing a real-time, real-world, and reactive autonomous robotic manipulation system."

To improve robotic grasp detection, the team developed a fast, lightweight convolutional neural network. It allows a robot to better predict an object's grasp rectangle, which "symbolizes the position, orientation and opening of the robot's parallel grippers the instant before its closing," according to the paper.
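A grasp rectangle of this kind is commonly parameterized by a center position, an orientation and the gripper opening. The sketch below is illustrative only (the field names, and the extra jaw-size parameter, are assumptions, not the paper's exact representation):

```python
import math
from dataclasses import dataclass

@dataclass
class GraspRectangle:
    """Illustrative oriented-rectangle grasp representation:
    center (x, y) in image coordinates, orientation theta in radians,
    the opening between the parallel gripper plates, and the plate width."""
    x: float
    y: float
    theta: float
    opening: float   # distance between the two parallel gripper plates
    jaw_size: float  # width of each gripper plate (hypothetical parameter)

    def corners(self):
        """Return the four corner points of the oriented rectangle."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        hw, hh = self.opening / 2, self.jaw_size / 2
        return [(self.x + c * dx - s * dy, self.y + s * dx + c * dy)
                for dx, dy in [(-hw, -hh), (hw, -hh), (hw, hh), (-hw, hh)]]

# An axis-aligned grasp centered at (10, 20) with a 4-unit opening.
g = GraspRectangle(x=10.0, y=20.0, theta=0.0, opening=4.0, jaw_size=2.0)
```

In the paper's setting, a network predicts these parameters directly from an image, so the rectangle describes where and how the parallel grippers should close.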

The novel convolutional neural network is able to consider these three grasp features simultaneously. The researchers trained it with the Cornell Grasping Dataset, which consists of 885 images and associated point clouds of 240 common household objects that could potentially be grasped by a robotic arm. Training a neural network means adapting its neurons according to world data, Ribeiro said; the network acquires knowledge so that it can make future decisions.
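"Adapting its neurons according to world data" can be shown with a toy example, far simpler than the grasping network itself: a single neuron whose weight and bias are nudged by gradient descent until its predictions match the training data (the data and learning rate here are invented for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "world data": inputs and targets generated by a known rule, y = 3x + 1.
xs = rng.uniform(-1.0, 1.0, size=100)
ys = 3.0 * xs + 1.0

# A single neuron with weight w and bias b, adapted by gradient descent.
w, b = 0.0, 0.0
lr = 0.1
for _ in range(500):
    pred = w * xs + b
    err = pred - ys
    # Gradients of the mean squared error with respect to w and b.
    w -= lr * 2 * np.mean(err * xs)
    b -= lr * 2 * np.mean(err)
```

After training, the neuron has recovered the rule behind the data, so it can make predictions for inputs it never saw, just as the grasping network generalizes to unseen objects.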

A second convolutional neural network was designed and trained to perform a visual servo control that ensures the target object remains in the robot's field of view. This addresses situations in which an object may move around in the environment, making it harder for a robot to track its features. Data in visual servoing is collected through a camera mounted directly on the robot.

"The idea is that, based on the current image captured by the camera and the desired image … the controller is capable of generating control signals that make the robot's end-effector go to the desired position," Ribeiro said, referring to the robot's grasping mechanism.

The authors reported that the controller they developed achieved millimeter-level positioning accuracy and orientation errors of less than one degree when grasping a target object seen for the first time, which Ribeiro called "state-of-the-art results." Using an image of the object in its desired pose, the network was able to predict velocity signals to be applied to the camera that kept the object in the robot's sight line during the grasping motion.

Successfully teaching the networks to work with two-dimensional images of a target object, rather than three-dimensional models, was an important finding. Ribeiro explained that in real-life scenarios, it's difficult to obtain an accurate and complete 3D model of an object seen for the first time, which lengthens processing time for the robot. Once their networks were trained, however, no prior knowledge of the objects was required.

"The implemented neural networks can make accurate predictions for objects not seen during training. Moreover, there is no need for 3D models, surface properties of target objects, camera calibration or feature tracking—the algorithm receives only images as inputs," Ribeiro said.

Together, the neural networks form a new system that improves autonomous robotic grasp detection. The designed algorithms can be executed in real-time on less powerful graphics processing units, and because the system is reactive and uses visual feedback, it's capable of overcoming perceptual noise and equation errors, according to Ribeiro.

"To the best of our knowledge, we have not found in the literature other works that achieve such precision with a controller learned from scratch," the authors said. "Thus, this work presents a new system for autonomous robotic manipulation, with the ability to generalize to different objects and with high processing speed, which allows its application in real robotic systems."

The system can be used in situations where a robot has to pick up unknown objects. This could include assistive robotics, such as a wheelchair-mounted manipulator robot, or domestic robots that perform everyday tasks through reaching and grasping. The authors also noted that the visual servo controller that they developed may be used for different manipulation tasks besides grasping. To further this research, Ribeiro said the team will work on new algorithms for grasp detection that consider other types of robotic grippers.

The study, "Real-time deep learning approach to visual servo control and grasp detection for autonomous robotic manipulation," published Feb. 24 in Robotics and Autonomous Systems, was authored by Eduardo Godinho Ribeiro, Raul de Queiroz Mendes and Valdir Grassi Jr., all of the University of São Paulo.
