A. H. Abdul Hafez¹, Manpreet Singh¹, K Madhava Krishna¹, C.V. Jawahar¹
Visual localization in crowded dynamic environments requires information about both static and dynamic objects. This paper presents a robust method that learns useful features from multiple runs in highly crowded environments. Useful features are those that are distinctive and can also be extracted reliably under diverse imaging conditions. The relative importance of each feature is used to derive its weight. The popular bag-of-words model is used for image retrieval and localization, where the query image is the current view of the environment and the database contains the visual experience from previous runs. Based on their reliability, features are added to or eliminated from the representation over runs. This reduces the size of the representation and makes it more reliable in crowded scenes. We tested the proposed method on data sets collected in highly crowded Indian urban outdoor settings. Experiments show that a small subset (10%) of the detected features is sufficient for reliable localization. We achieve superior results in terms of localization error even when more than 90% of the pixels are occluded or dynamic.
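The pipeline summarized above (weight features by reliability across runs, keep only the most reliable ones, then retrieve with a weighted bag-of-words) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the reliability weight is taken to be the fraction of runs in which a visual word was observed, pruning keeps a fixed top fraction of words, and retrieval uses cosine similarity between weighted histograms. The function names (`reliability_weights`, `prune`, `localize`) and the toy word ids are hypothetical.

```python
import math
from collections import Counter

def reliability_weights(runs):
    """Assumed weight: fraction of runs in which each visual word was observed."""
    counts = Counter(w for run in runs for w in set(run))
    return {w: c / len(runs) for w, c in counts.items()}

def prune(weights, keep_fraction=0.1):
    """Keep only the top `keep_fraction` of words, ranked by weight (ties by id)."""
    k = max(1, math.ceil(len(weights) * keep_fraction))
    ranked = sorted(weights, key=lambda w: (-weights[w], w))
    return {w: weights[w] for w in ranked[:k]}

def weighted_histogram(words, weights):
    """Bag-of-words histogram over retained words, scaled by their weights."""
    hist = Counter(w for w in words if w in weights)
    return {w: n * weights[w] for w, n in hist.items()}

def cosine(a, b):
    """Cosine similarity between two sparse histograms (0.0 if either is empty)."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def localize(query_words, database, weights):
    """Return the database place whose weighted histogram best matches the query."""
    q = weighted_histogram(query_words, weights)
    scores = {place: cosine(q, weighted_histogram(words, weights))
              for place, words in database.items()}
    return max(scores, key=scores.get)

# Toy data: words 1-4 come from static structure, 10-12 from passing crowds.
runs = [{1, 2, 3, 10}, {1, 2, 3, 11}, {1, 2, 4, 12}]
kept = prune(reliability_weights(runs), keep_fraction=0.5)

database = {"gate": [1, 2, 10], "market": [3, 4, 11]}
query = [3, 4, 12, 12, 12]  # current view dominated by a dynamic word
best = localize(query, database, kept)
print(best)
```

In this toy example the dynamic words 10, 11, and 12 each appear in only one run, so they fall below the cutoff and are ignored at query time; the query is matched to "market" through the two static words that survive, even though most of its words are dynamic.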