Conference on Robot Learning (CoRL) 2022, Oral Presentation
Our approach, based on a pushing dynamics model, finds a sequence of pushing actions that accomplishes the desired task (e.g., singulation, the task of separating objects by more than a certain distance).
Without using ad hoc objective functions, the robot discovers how to reconfigure the objects so that feasible grasp poses can be found for the target objects (e.g., a large, flat object).
For tabletop object manipulation tasks, it is important to learn an accurate pushing dynamics model, which predicts how the objects move when a robot pushes one of them. In this work, we argue that an ideal pushing dynamics model should have the SE(2)-equivariance property: if the tabletop objects’ poses and the pushing action are transformed by the same planar rigid-body transformation, then the resulting motion should be transformed in the same way. Existing state-of-the-art data-driven approaches lack this equivariance property, which leads to less-than-desirable learning performance. In this paper, we propose a new neural network architecture that has this equivariance property by construction. Through extensive empirical validation, we show that the proposed model achieves significantly better learning performance than existing methods. We also verify that our pushing dynamics model can be used for various downstream pushing manipulation tasks, such as object moving, singulation, and grasping, in both simulation and real-robot experiments.
Suppose a pushing dynamics model is trained on an experience in which a robot pushes a box in the direction of the red arrow, as shown in the figure below (Scene 1). Now consider a new situation in which the same box is at a different pose and the robot pushes it in the same relative direction, as shown in Scene 2. Intuitively, a good model should generalize easily to this kind of new situation, in which the tabletop objects are only translated in the table plane or rotated about the \(z\)-axis. In more technical terms, the pushing dynamics model needs to be equivariant to \(\text{SE}(2)\) transformations.
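One way to write this property down is sketched here. The notation is an assumption made for illustration and is not taken from the paper: \(f\) denotes the pushing dynamics model, \(T_1, \dots, T_N\) the current object poses, \(a\) the pushing action, and \(g\) a planar rigid-body transformation that acts on poses and actions alike.

\[
f(g \cdot T_1, \dots, g \cdot T_N,\; g \cdot a) = \bigl(g \cdot T_1', \dots, g \cdot T_N'\bigr)
\quad \text{for all } g \in \text{SE}(2),
\qquad \text{where } (T_1', \dots, T_N') = f(T_1, \dots, T_N,\; a).
\]

In the example, Scene 2 is Scene 1 transformed by some \(g\), so a model with this property predicts for Scene 2 exactly the Scene 1 prediction transformed by the same \(g\).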
The core idea behind making the model \(\text{SE}(2)\)-equivariant is to properly transform the coordinates of the pushing action and the objects' poses as needed. This design naturally captures the symmetry of physical systems and significantly enhances generalization performance. To apply the proposed equivariant pushing dynamics model in environments with only vision data and unseen objects, we use a recognition module capable of identifying the objects' shapes and poses. In this work, we represent 3D object shapes using a shape class called superquadrics. Accordingly, we refer to our superquadric object representation-based pushing dynamics model as the SuperQuadric Pushing Dynamics Network (SQPDNet).
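The coordinate-transformation idea can be illustrated with a minimal NumPy sketch. It rests on simplifying assumptions of our own: poses are planar \(3 \times 3\) homogeneous matrices, the pushing action is summarized by a single frame `push_frame`, and `toy_base_model` is a hypothetical stand-in for a learned network. It is not the SQPDNet architecture. The wrapper expresses all object poses in the pushing frame, applies the base model there, and maps the result back to the world frame, which makes the overall map equivariant even though the base model alone is not.

```python
import numpy as np


def se2(x, y, theta):
    """Planar rigid-body transform as a 3x3 homogeneous matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])


def toy_base_model(local_poses):
    """Hypothetical stand-in for a learned network (assumption).

    Takes object poses expressed in the pushing frame and returns next poses
    in that same frame. It is an arbitrary nonlinear map and is deliberately
    not equivariant on its own.
    """
    next_poses = []
    for T in local_poses:
        x, y = T[0, 2], T[1, 2]
        theta = np.arctan2(T[1, 0], T[0, 0])
        next_poses.append(
            se2(
                x + 0.1 * np.exp(-(x**2 + y**2)),
                y + 0.05 * np.sin(x),
                theta + 0.2 * np.tanh(y),
            )
        )
    return next_poses


def equivariant_push_model(object_poses, push_frame, base_model=toy_base_model):
    """Wrap a base model so that the overall map is SE(2)-equivariant.

    object_poses: list of 3x3 world-frame object poses.
    push_frame:   3x3 world-frame pose of the pushing action (assumed encoding).
    """
    world_to_push = np.linalg.inv(push_frame)
    # 1) Express every object pose in the pushing frame.
    local_poses = [world_to_push @ T for T in object_poses]
    # 2) Predict the motion in that canonical frame.
    local_next = base_model(local_poses)
    # 3) Map the predicted poses back to the world frame.
    return [push_frame @ T for T in local_next]


def check_equivariance(g, object_poses, push_frame):
    """Compare f(g*T, g*a) with g*f(T, a) up to floating-point tolerance."""
    lhs = equivariant_push_model([g @ T for T in object_poses], g @ push_frame)
    rhs = [g @ T for T in equivariant_push_model(object_poses, push_frame)]
    return all(np.allclose(a, b) for a, b in zip(lhs, rhs))


if __name__ == "__main__":
    g = se2(0.3, -0.2, 0.7)
    poses = [se2(0.5, 0.1, 0.3), se2(-0.2, 0.4, -1.0)]
    push = se2(0.1, 0.0, 0.5)
    print("wrapper equivariant:", check_equivariance(g, poses, push))
```

The check holds for any base model, because transforming both the poses and the pushing frame by the same \(g\) leaves the poses expressed in the pushing frame unchanged.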
Our approach can find a series of pushing actions that successfully perform the desired tasks. Notably, for the grasping tasks, without using ad hoc objective functions, the robot realizes how to re-configure the objects so that feasible grasp poses can be found for the target objects: (i) the robot pushes the large and flat object to the edge of the table and (ii) the robot pushes the surrounding objects to make the surrounded target object graspable.
Additional real-world pushing manipulation videos can be found below.
@inproceedings{kim2023se,
title={SE(2)-Equivariant Pushing Dynamics Models for Tabletop Object Manipulations},
author={Kim, Seungyeon and Lim, Byeongdo and Lee, Yonghyeon and Park, Frank C},
booktitle={Conference on Robot Learning},
pages={427--436},
year={2023},
organization={PMLR}
}