Pranjal Paul^1, Vineeth Bhat^1, Tejas Salian^1, Mohammad Omama^2, Krishna Murthy Jatavallabhula^3, Naveen Arulselvan^4, K. Madhava Krishna^1
^1 Robotics Research Center, IIIT Hyderabad, India  ^2 The University of Texas at Austin, USA  ^3 Meta AI Research  ^4 Ati Motors, India
Global localization is a critical capability for autonomous navigation, yet existing dense-LiDAR approaches are storage-heavy and scale poorly. **SparseLoc** introduces a compact and generalizable localization framework that leverages open-vocabulary vision-language models to build semantic-topometric landmark maps. Unlike traditional methods, SparseLoc creates sparse landmark representations with semantic associations that can be robustly matched at inference time. A Monte Carlo localization pipeline localizes against these sparse maps using camera observations, refined by a late-stage gradient-based optimization module for fine-grained pose correction. Using only 1/500th of the points of dense maps, the system achieves sub-5 m position error and sub-2° heading error on KITTI. It also demonstrates strong cross-dataset generalization and robust recovery from kidnapped-robot scenarios. SparseLoc pushes the boundary of memory-efficient, semantically aware, vision-language-driven localization.
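To make the core idea concrete — localizing against a handful of semantically labeled landmarks with a particle filter — here is a minimal, self-contained sketch. This is an illustrative toy, not the paper's implementation: the landmark positions, integer labels (standing in for open-vocabulary semantic classes), range observations, and noise parameters are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sparse landmark map: 2-D positions plus semantic labels.
LANDMARKS = np.array([[10.0, 0.0], [0.0, 10.0], [-10.0, 5.0]])
LABELS = np.array([0, 1, 2])

def mcl_step(particles, weights, obs_ranges, obs_labels, sigma=1.0):
    """One Monte Carlo localization measurement update: reweight particles
    by how well their predicted landmark ranges match observations for
    semantically matching landmarks, then resample."""
    for r, lab in zip(obs_ranges, obs_labels):
        lm = LANDMARKS[LABELS == lab][0]          # data association via label
        pred = np.linalg.norm(particles - lm, axis=1)
        weights = weights * np.exp(-0.5 * ((pred - r) / sigma) ** 2)
    weights /= weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Simulated ground-truth pose and noisy ranges to each labeled landmark.
true_pose = np.array([2.0, 3.0])
ranges = np.linalg.norm(LANDMARKS - true_pose, axis=1) + rng.normal(0, 0.1, 3)

particles = rng.uniform(-15, 15, size=(2000, 2))   # uniform prior (global)
weights = np.full(2000, 1.0 / 2000)
for _ in range(5):
    particles, weights = mcl_step(particles, weights, ranges, LABELS)
    particles += rng.normal(0, 0.2, particles.shape)  # jitter vs. collapse

estimate = particles.mean(axis=0)
```

Because each observation is matched by semantic label before scoring, the filter needs only a few landmarks to disambiguate the pose — the intuition behind trading dense point maps for sparse semantic ones. The paper's late-stage gradient refinement would replace the crude mean-of-particles estimate here.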