Monocular Reconstruction of Vehicles: Combining SLAM with Shape Priors

International Conference on Robotics and Automation (ICRA) 2016, Stockholm, Sweden



Reasoning about objects in images and videos using 3D representations is re-emerging as a popular paradigm in computer vision. Specifically, in the context of scene understanding for roads, 3D vehicle detection and tracking from monocular videos still needs a lot of attention to enable practical applications.

Current approaches leverage two kinds of information to deal with the vehicle detection and tracking problem: (1) 3D representations (e.g., wireframe, voxel-based, or CAD models) of diverse vehicle skeletal structures learnt from data, and (2) classifiers trained to detect vehicles or vehicle parts in single images, built on top of a basic feature extraction step. In this paper, we propose to extend current approaches in two ways. First, we extend detection to a multiple-view setting. We show that leveraging information given by feature or part detectors in multiple images can lead to more accurate detection results than single-image detection. Second, we show that given multiple images of a vehicle, we can also leverage 3D information from the scene generated using a unique structure-from-motion algorithm. This helps us localize the vehicle in 3D and constrain the parameters of the optimization that fits the 3D model to image data. We show results on the KITTI dataset and demonstrate superior results compared with recent state-of-the-art methods, with up to 14.64% improvement in localization error.
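To make the multiple-view idea concrete, the following is a minimal sketch of how a vehicle part detected in several frames could be lifted to a single 3D point by linear (DLT) triangulation. It is an illustration only, not the method of the paper: the intrinsics `K`, the sideways camera motion, the helper names `triangulate_point`, `project`, and `camera`, and the synthetic keypoint are all assumptions introduced here, and camera poses are taken as known (in practice they would come from a SLAM or structure-from-motion stage).

```python
import numpy as np


def triangulate_point(projections, pixels):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    projections: list of 3x4 camera projection matrices.
    pixels: list of (u, v) detections of the same part, one per view.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Project a 3D point X into the image of a camera with matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Hypothetical intrinsics and camera motion (assumed values, not from the paper).
K = np.array([[700.0, 0.0, 600.0],
              [0.0, 700.0, 180.0],
              [0.0, 0.0, 1.0]])


def camera(tx):
    """Camera with identity rotation whose center sits at (tx, 0, 0)."""
    Rt = np.hstack([np.eye(3), np.array([[-tx], [0.0], [0.0]])])
    return K @ Rt


cameras = [camera(t) for t in (0.0, 0.5, 1.0)]
keypoint_3d = np.array([2.0, 1.0, 12.0])  # synthetic vehicle part, in metres

# Simulated part detections with half a pixel of Gaussian noise per view.
rng = np.random.default_rng(0)
observations = [project(P, keypoint_3d) + rng.normal(0.0, 0.5, 2) for P in cameras]

estimate = triangulate_point(cameras, observations)
print("estimated 3D position:", estimate)
print("error (m):", np.linalg.norm(estimate - keypoint_3d))
```

With noise-free detections the triangulated point coincides with the true one. With noisy detections, each additional view adds two more constraints to the least-squares system, which is one simple way to see why multi-view evidence can help localization.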



  title={Monocular Reconstruction of Vehicles: Combining SLAM with Shape Priors},
  author={Chhaya, Falak and Reddy, N Dinesh and Upadhyay, Sarthak and Chari, Visesh and Zia, M Zeeshan and Krishna, K Madhava},
  booktitle={Proceedings of the 2016 International Conference on Robotics and Automation},


Falak Chhaya