Transactions on Automation Science and Engineering (T-ASE) 2022
Our recognition-based method accurately recognizes objects as unions of one or more deformable superquadrics and effectively finds grasp poses from the recognized shapes.
Grasping single-part objects
Grasping multi-part objects
Grasping previously unseen objects, for which only partially occluded views are available, remains a difficult challenge. Despite their recent successes, deep learning-based end-to-end methods remain impractical when training data and resources are limited and multiple grippers are used. Two-step methods that first identify the object shape and structure using deformable shape templates, then plan and execute the grasp, are free from those limitations, but also have difficulty with partially occluded objects. In this paper, we propose a two-step method that merges a richer set of shape primitives, the deformable superquadrics, with a deep learning network, DSQNet, that is trained to identify complete object shapes from partial point cloud data. Grasps are then generated that take into account the kinematic and structural properties of the gripper while exploiting the closed-form equations available for deformable superquadrics. A seven-dof robotic arm equipped with a parallel jaw gripper is used to conduct experiments involving a collection of household objects, achieving average grasp success rates of 93% (compared to 86% for existing methods), with object recognition times that are ten times faster.
Our recognition-based grasping method proceeds in three steps, as shown in the figure below: (i) a trained segmentation network segments a partially observed point cloud into a set of simpler point clouds; (ii) the trained DSQNet converts each point cloud into a deformable superquadric primitive, whose collective union represents the full object shape; (iii) grasp poses are generated in a gripper-dependent manner from the recognized full shapes.
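In code, this three-step flow has a simple structure. The sketch below uses stand-in callables for the trained segmentation network, DSQNet, and the grasp generator; it illustrates only the data flow, not the released implementation:

```python
import numpy as np

def recognition_based_grasping(point_cloud, segment, fit_primitive, generate_grasps):
    """Three-step flow: segmentation -> per-segment primitive fitting ->
    grasp generation on the union of recognized primitives.
    The three callables stand in for the trained networks / planner."""
    segments = segment(point_cloud)                     # (i) simpler point clouds
    primitives = [fit_primitive(s) for s in segments]   # (ii) deformable superquadrics
    return generate_grasps(primitives)                  # (iii) gripper-dependent poses
```

Each stage is swappable, which is what makes the two-step pipeline gripper-agnostic: only `generate_grasps` depends on the gripper.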
Superquadrics are an extended set of quadric surfaces that can represent diverse shapes ranging from boxes, cylinders, and ellipsoids to bi-cones, octahedra, and other complex symmetric shapes, even those with rounded corners and edges. Superquadrics are further categorized into superellipsoids, superhyperboloids, and supertoroids; for our modeling purposes the superellipsoids are sufficient. The implicit equation for a superquadric surface with size parameters \((a_1, a_2, a_3) \in \mathbb{R}_+^3\) and shape parameters \((e_1, e_2) \in \mathbb{R}_+^2\) is of the form: for \(\textbf{x} = (x, y, z)\), $$ \begin{equation*} f(\textbf{x})=\left(\left|\frac{x}{a_1}\right|^{\frac{2}{e_2}} + \left|\frac{y}{a_2}\right|^{\frac{2}{e_2}}\right)^{\frac{e_2}{e_1}} + \left|\frac{z}{a_3}\right|^{\frac{2}{e_1}} = 1. \end{equation*} $$ Although more expressive than quadrics, superquadrics are still limited by their inability to capture tapered and bent objects. Deformable superquadrics are obtained by applying global tapering and bending deformations as shown below.
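For concreteness, the implicit function above can be evaluated directly, and surface points can be generated from the standard signed-power parametric form of the superellipsoid. The helpers below are an illustrative sketch with arbitrary parameter values, not code from the paper:

```python
import numpy as np

def superellipsoid_f(p, a, e):
    """Implicit function f(x): equals 1 on the surface, < 1 inside, > 1 outside."""
    x, y, z = p
    a1, a2, a3 = a
    e1, e2 = e
    xy = np.abs(x / a1) ** (2.0 / e2) + np.abs(y / a2) ** (2.0 / e2)
    return xy ** (e2 / e1) + np.abs(z / a3) ** (2.0 / e1)

def surface_point(eta, omega, a, e):
    """Standard parametric surface point, using signed powers of cos/sin
    so that negative octants are handled correctly."""
    spow = lambda t, q: np.sign(t) * np.abs(t) ** q
    a1, a2, a3 = a
    e1, e2 = e
    x = a1 * spow(np.cos(eta), e1) * spow(np.cos(omega), e2)
    y = a2 * spow(np.cos(eta), e1) * spow(np.sin(omega), e2)
    z = a3 * spow(np.sin(eta), e1)
    return np.array([x, y, z])
```

By construction, every point returned by `surface_point` satisfies the implicit equation, which makes the pair of functions a convenient self-check when fitting.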
We design a neural network architecture, referred to as the Deformable Superquadric Network (DSQNet), that takes partially observed point cloud data as input and outputs the eight parameters and the pose of a deformable superquadric, reconstructing the full shape of the object, including its occluded parts. The network is trained to minimize fitting errors between the ground-truth point cloud and the predicted deformable superquadric.
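The exact training loss is defined in the paper; as one classical fitting error in the same spirit, the Solina-Bajcsy residual \((f(\textbf{x})^{e_1} - 1)^2\) vanishes exactly when a point lies on the superellipsoid surface. A minimal sketch for an undeformed superellipsoid, under assumed parameter conventions:

```python
import numpy as np

def fitting_error(points, a, e):
    """Mean squared Solina-Bajcsy residual (f(x)^{e1} - 1)^2 over a point
    cloud; zero exactly when every point lies on the superellipsoid."""
    a = np.asarray(a)
    e1, e2 = e
    x, y, z = np.abs(points / a).T  # elementwise |x_i / a_i|, per axis
    f = (x ** (2.0 / e2) + y ** (2.0 / e2)) ** (e2 / e1) + z ** (2.0 / e1)
    return np.mean((f ** e1 - 1.0) ** 2)
```

The \(f^{e_1}\) exponent is the usual trick to keep gradient magnitudes comparable across different shape parameters, which matters when such a residual drives network training.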
Recognition is achieved quickly and accurately with a simple forward pass through the neural network. Extensive experiments and benchmark comparisons using a variety of everyday objects demonstrate both the strengths of our approach and potential areas for improvement. For recognizing household objects, our method achieves the highest accuracy (in terms of volumetric IoU) and the fastest computation speeds among existing recognition-based methods.
Once the shape of an object is recognized as a set of deformable superquadric primitives, conventional grasp pose generation techniques can be applied for a given gripper. In this paper, we focus on parallel jaw grippers and adopt an antipodal points sampling-based grasp pose generation method. We first develop an efficient algorithm for sampling antipodal points on deformable superquadric primitives, followed by a grasp pose generation algorithm that utilizes the sampled pairs of antipodal points.
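As a rough illustration of antipodal sampling on an (undeformed) superellipsoid, not the paper's exact algorithm, one can exploit central symmetry: the outward normal is the gradient of \(f\), and since \(f(-\textbf{p}) = f(\textbf{p})\), the reflection \(-\textbf{p}\) of any surface point \(\textbf{p}\) has the opposite normal. Each sampled point and its reflection thus form a candidate pair, kept when the grasp axis lies inside the contact friction cone and the contact separation fits the jaw opening:

```python
import numpy as np

def grad_f(p, a, e, h=1e-6):
    """Outward (unnormalized) surface normal via a numerical gradient of f."""
    def f(q):
        x, y, z = np.abs(q / np.asarray(a))
        return (x ** (2.0 / e[1]) + y ** (2.0 / e[1])) ** (e[1] / e[0]) + z ** (2.0 / e[0])
    g = np.zeros(3)
    for i in range(3):
        d = np.zeros(3)
        d[i] = h
        g[i] = (f(p + d) - f(p - d)) / (2.0 * h)
    return g

def sample_antipodal_pairs(a, e, max_width, friction_angle=0.3, n=200, seed=0):
    """Sample surface points; pair each point p with its central reflection -p
    (by symmetry the normals are anti-parallel). Keep a pair if the grasp
    axis lies in the friction cone and the separation fits the gripper."""
    rng = np.random.default_rng(seed)
    spow = lambda t, q: np.sign(t) * np.abs(t) ** q
    pairs = []
    for _ in range(n):
        eta = rng.uniform(-np.pi / 2, np.pi / 2)
        om = rng.uniform(-np.pi, np.pi)
        p = np.array([a[0] * spow(np.cos(eta), e[0]) * spow(np.cos(om), e[1]),
                      a[1] * spow(np.cos(eta), e[0]) * spow(np.sin(om), e[1]),
                      a[2] * spow(np.sin(eta), e[0])])
        axis = p / np.linalg.norm(p)        # grasp axis through the center
        nrm = grad_f(p, a, e)
        nrm = nrm / np.linalg.norm(nrm)
        if nrm @ axis >= np.cos(friction_angle) and 2 * np.linalg.norm(p) <= max_width:
            pairs.append((p, -p))
    return pairs
```

The closed-form surface and gradient are what make primitive-based grasp generation cheap compared to sampling on raw point clouds.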
We have additionally developed a shape uncertainty-aware grasping algorithm to assist in grasp pose planning and enhance grasp performance in more challenging scenarios. While DSQNet achieves the best recognition performance on both synthetic and real-world objects, perfect shape recognition is not always feasible. If the generated grasp pose targets an erroneous part of the recognized shape, the gripper may fail to grasp the object. To mitigate the risk of grasping incorrect parts, we introduce a new grasp score that accounts for shape uncertainty.
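The grasp score itself is defined in the paper; purely as a hypothetical illustration of the idea, the sketch below discounts a geometric grasp quality by the local fitting residual near each contact, so that grasps targeting poorly reconstructed regions score lower. All names and the exponential weighting are assumptions for illustration:

```python
import numpy as np

def local_uncertainty(contact, points, residuals, radius=0.02):
    """Mean fitting residual of observed points near a contact; high values
    flag regions where the recognized shape is unreliable."""
    d = np.linalg.norm(points - contact, axis=1)
    near = d < radius
    return residuals[near].mean() if near.any() else residuals.max()

def uncertainty_aware_score(antipodal_quality, contacts, points, residuals, lam=10.0):
    """Discount a geometric grasp quality by the worst local shape
    uncertainty over the two contacts (hypothetical form)."""
    u = max(local_uncertainty(c, points, residuals) for c in contacts)
    return antipodal_quality * np.exp(-lam * u)
```

Any monotone penalty would serve the same purpose; the key design choice is scoring per contact, since a grasp fails if either fingertip lands on an erroneous part of the recognized shape.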
@article{kim2022dsqnet,
title={DSQNet: a deformable model-based supervised learning algorithm for grasping unknown occluded objects},
author={Kim, Seungyeon and Ahn, Taegyun and Lee, Yonghyeon and Kim, Jihwan and Wang, Michael Yu and Park, Frank C},
journal={IEEE Transactions on Automation Science and Engineering},
volume={20},
number={3},
pages={1721--1734},
year={2022},
publisher={IEEE}
}