Vansh Garg†1, Rohit Jayanti*1, Krish Pandya*1, Sarthak Chittawar*1, Siddharth Tourani2, Muhammad Haris Khan3, Sourav Garg‡1, K. Madhava Krishna‡1
1 Robotics Research Center, IIIT Hyderabad, India; 2 University of Heidelberg, Germany; 3 MBZUAI, UAE
Visual navigation depends critically on how the environment is represented. Traditional 3D mapping approaches enforce global geometric consistency, while image- or object-centric topological representations often sacrifice geometric fidelity, limiting navigation performance. In this work, we introduce a novel pixel-relative connectivity representation that preserves local geometric accuracy without requiring global consistency. Building on recent advances in 3D-grounded image matching, we construct a pixel-level topological graph using inter-image correspondences in relative 3D coordinate frames. We perform global path planning over this graph and derive a dense WayPixel costmap that captures fine-grained geometric gradients toward the goal. A learned controller, conditioned on this costmap, predicts trajectory rollouts, enabling robust and precise navigation. We evaluate our approach across diverse simulated and real-world scenarios, demonstrating consistent improvements in accuracy, robustness, and generalization over existing image- and object-level navigation methods.
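As an illustration of the planning step described above, the following is a minimal sketch, not the authors' implementation. It assumes each image provides per-pixel 3D points in its own camera frame, connects neighbouring pixels within an image by their local 3D distance, links matched pixels across images with a fixed cost, and runs Dijkstra's algorithm from the goal pixel to obtain a per-pixel cost-to-goal map. All names (`build_pixel_graph`, `cost_to_goal`, `costmap_for_image`), the 4-connectivity, and the zero-cost match edges are hypothetical choices made for this example.

```python
import heapq
import math
from collections import defaultdict


def build_pixel_graph(points, matches, match_cost=0.0):
    """Build a pixel-level graph (hypothetical helper).

    points:  {image_id: 2D list of (x, y, z) tuples in that image's own
              camera frame, or None for pixels that are not traversable}
    matches: list of ((img_a, r_a, c_a), (img_b, r_b, c_b)) correspondences
    """
    graph = defaultdict(list)
    for img, grid in points.items():
        for r, row in enumerate(grid):
            for c, p in enumerate(row):
                if p is None:
                    continue
                # Intra-image edges: right and down neighbours, weighted by
                # the 3D distance measured in this image's relative frame.
                for dr, dc in ((0, 1), (1, 0)):
                    rr, cc = r + dr, c + dc
                    if rr < len(grid) and cc < len(grid[rr]):
                        q = grid[rr][cc]
                        if q is not None:
                            w = math.dist(p, q)
                            graph[(img, r, c)].append(((img, rr, cc), w))
                            graph[(img, rr, cc)].append(((img, r, c), w))
    # Inter-image edges: matched pixels are assumed to show the same point.
    for a, b in matches:
        graph[a].append((b, match_cost))
        graph[b].append((a, match_cost))
    return graph


def cost_to_goal(graph, goal):
    """Dijkstra from the goal pixel; returns {pixel_node: cost}."""
    dist = {goal: 0.0}
    heap = [(0.0, goal)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def costmap_for_image(dist, img, height, width):
    """Dense costmap for one image; unreachable pixels get infinity."""
    return [[dist.get((img, r, c), math.inf) for c in range(width)]
            for r in range(height)]


# Toy example: two 1x3 images whose pixels are 1 m apart, with the last
# pixel of image 0 matched to the first pixel of image 1.
row = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
points = {0: [list(row)], 1: [list(row)]}
matches = [((0, 0, 2), (1, 0, 0))]
graph = build_pixel_graph(points, matches)
dist = cost_to_goal(graph, goal=(1, 0, 2))
print(costmap_for_image(dist, 0, 1, 3))  # [[4.0, 3.0, 2.0]]
```

In this toy case the cost decreases across image 0 toward the matched pixel, which is the kind of gradient toward the goal that a costmap-conditioned controller could follow. Distances are only ever computed within a single image's frame, so no globally consistent map is needed.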