PlanT: Explainable Planning Transformers
via Object-Level Representations

Katrin Renz (1,2), Kashyap Chitta (1,2), Otniel-Bogdan Mercea (1),
A. Sophia Koepke (1), Zeynep Akata (1,2,3), Andreas Geiger (1,2)
(1) University of Tübingen (2) Max Planck Institute for Intelligent Systems, Tübingen
(3) Max Planck Institute for Informatics, Saarbrücken
CoRL 2022


TL;DR: We propose PlanT, a state-of-the-art planner for self-driving built on object-level representations and a transformer architecture. PlanT can explain its decisions by identifying the most relevant object in the scene. Adding an off-the-shelf perception module improves the state of the art on the Longest6 benchmark by more than 10 points in driving score.

Planning an optimal route in a complex environment requires efficient reasoning about the surrounding scene. While human drivers prioritize important objects and ignore details not relevant to the decision, learning-based planners typically extract features from dense, high-dimensional grid representations containing all vehicle and road context information. In this paper, we propose PlanT, a novel approach for planning in the context of self-driving that uses a standard transformer architecture. PlanT is based on imitation learning with a compact object-level input representation. On the Longest6 benchmark for CARLA, PlanT outperforms all prior methods (matching the driving score of the expert) while being 5.3× faster than equivalent pixel-based planning baselines during inference. Combining PlanT with an off-the-shelf perception module provides a sensor-based driving system that is more than 10 points better in terms of driving score than the existing state of the art. Furthermore, we propose an evaluation protocol to quantify the ability of planners to identify relevant objects, providing insights regarding their decision-making. Our results indicate that PlanT can focus on the most relevant object in the scene, even when this object is geometrically distant.
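
To make the idea of an object-level input and relevance ranking concrete, here is a minimal sketch in plain Python. It is not the PlanT implementation. The `ObjectToken` fields, the hand-picked `query` vector, and the use of a single attention step as a relevance score are all assumptions made for illustration; in a trained transformer the query and the token embeddings would be learned.

```python
import math
from dataclasses import dataclass


@dataclass
class ObjectToken:
    """Hypothetical object-level token (field names are assumptions)."""
    kind: str     # "vehicle" or "route" (assumed categories)
    x: float      # longitudinal offset from the ego vehicle, in metres
    y: float      # lateral offset from the ego vehicle, in metres
    yaw: float    # heading relative to the ego vehicle, in radians
    speed: float  # speed in metres per second

    def features(self):
        # Flatten the token into a small feature vector.
        is_vehicle = 1.0 if self.kind == "vehicle" else 0.0
        return [is_vehicle, self.x, self.y, self.yaw, self.speed]


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, tokens):
    # Scaled dot-product attention of one query vector over all tokens.
    scale = math.sqrt(len(query))
    scores = [
        sum(q * f for q, f in zip(query, token.features())) / scale
        for token in tokens
    ]
    return softmax(scores)


def most_relevant(query, tokens):
    # Return the index of the token with the highest attention weight.
    weights = attention_weights(query, tokens)
    best = max(range(len(tokens)), key=lambda i: weights[i])
    return best, weights


# Toy scene: a parked car nearby, a slow car far ahead, one route segment.
scene = [
    ObjectToken("vehicle", x=5.0, y=3.5, yaw=0.0, speed=0.0),
    ObjectToken("vehicle", x=30.0, y=0.0, yaw=0.0, speed=2.0),
    ObjectToken("route", x=10.0, y=0.0, yaw=0.0, speed=0.0),
]

# Hand-picked query (an assumption): penalise lateral offset, reward motion.
query = [0.0, 0.0, -1.0, 0.0, 1.0]

best, weights = most_relevant(query, scene)
print(best, [round(w, 3) for w in weights])
```

With this hand-picked query, the distant but moving vehicle in the ego lane receives the largest weight, which mirrors the observation that the most relevant object need not be the geometrically closest one. The scene stays compact because each object contributes one short vector rather than a dense grid of pixels.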


This work was supported by the BMWi (KI Delta Learning, project number: 19A19013O), the BMBF (Tübingen AI Center, FKZ: 01IS18039A), the DFG (SFB 1233, TP 17, project number: 276693517), by the ERC (853489 - DEXIM), and by EXC (number 2064/1 – project number 390727645). We thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting K. Renz, K. Chitta and O.-B. Mercea. The authors also thank Niklas Hanselmann and Markus Flicke for proofreading and Bernhard Jaeger for helpful discussions.
