Siddharth Tourani∗§1 Jayaram Reddy†2 Sarvesh Thakur†2 K Madhava Krishna†2 Muhammad Haris Khan§3 N Dinesh Reddy‡4
1 Computer Vision and Learning Lab, University of Heidelberg 2 Robotics Research Center, IIIT Hyderabad, India 3 MBZUAI 4 Amazon
With the rise of consumer depth cameras, a wealth of unlabeled RGB-D data has become available. This raises the question of how to use this data for geometric reasoning about scenes. While many RGB-D registration methods rely on geometric and feature-based similarity, we take a different approach. We use cycle-consistent keypoints as salient points to enforce spatial coherence constraints during matching, which improves correspondence accuracy. Additionally, we introduce a novel pose block that combines a gated recurrent unit (GRU) with transformation synchronization, blending historical and multi-view information. Our approach surpasses previous self-supervised registration methods on ScanNet and 3DMatch and even outperforms some older supervised methods. We also integrate our components into existing methods, demonstrating their effectiveness beyond our own pipeline.