Optimus

Imitating Task and Motion Planning with Visuomotor Transformers

Abstract

Imitation learning is a powerful tool for training robot manipulation policies, allowing them to learn from expert demonstrations without manual programming or trial-and-error. However, common methods of data collection, such as human supervision, scale poorly because they are time-consuming and labor-intensive. In contrast, Task and Motion Planning (TAMP) can autonomously generate large-scale datasets of diverse demonstrations. In this work, we show that combining large-scale datasets generated by TAMP supervisors with flexible Transformer models that fit them is a powerful paradigm for robot manipulation. To this end, we present Optimus, a novel imitation learning system that trains large-scale visuomotor Transformer policies by imitating a TAMP agent. Optimus introduces a pipeline for generating TAMP data that is specifically curated for imitation learning and can be used to train performant Transformer-based policies. We demonstrate that Optimus can solve a wide variety of challenging vision-based manipulation tasks with over 70 different objects, ranging from long-horizon and pick-and-place tasks to shelf and articulated object manipulation, achieving success rates of 70 to 80%.

Offline Pretrained TAMP Imitation System

Optimus enables visuomotor policies to solve manipulation tasks with up to 8 stages.

Optimus can solve tasks requiring obstacle awareness and skills beyond pick-and-place.

Optimus can distill TAMP's task planning and scene generalization capabilities.

Large Scale Evaluation