INFER: INtermediate representations for FuturE pRediction



In urban driving scenarios, forecasting the future trajectories of surrounding vehicles is of paramount importance. While several approaches to the problem have been proposed, the best-performing ones tend to require extremely detailed input representations (e.g., image sequences). As a result, such methods do not generalize to datasets they have not been trained on. In this paper, we propose intermediate representations that are particularly suited for future prediction. As opposed to using texture (color) information from images, we condition on semantics and train an autoregressive model to accurately predict future trajectories of traffic participants (vehicles) (see fig. above). We demonstrate that semantics provide a significant boost over techniques that operate over raw pixel intensities/disparities. Unlike state-of-the-art approaches, our representations and models generalize to completely different datasets, collected across several cities, and also across countries where people drive on opposite sides of the road (left-handed vs. right-handed driving). Additionally, we demonstrate an application of our approach in multi-object tracking (data association). To foster further research in transferable representations and ensure reproducibility, we release all our code and data.

