Mohammad Nomaan Qureshi*1 Pushkal Katara*1 Abhinav Gupta*1 Harit Pandya2 Harish Y V S1 AadilMehdi Sanchawala1 Gourav Kumar3 Brojeshwar Bhowmick3 K. Madhava Krishna1
1 Robotics Research Center, KCIS, IIIT Hyderabad 2 Cambridge Research Laboratory, Toshiba Europe, Cambridge, UK 3 Tata Consultancy Services
Recent data-driven approaches to visual servoing have shown improved performance over classical methods, owing to precise feature matching and depth estimation. Some recent servoing approaches use a model predictive control (MPC) framework, which generalises well to novel environments and can incorporate dynamic constraints, but is computationally intractable in real time, making it difficult to deploy in real-world scenarios. In contrast, single-step methods optimise greedily and achieve high servoing rates, but lack the benefits of MPC's multi-step-ahead formulation. In this paper, we get the best of both worlds and propose a lightweight visual servoing MPC framework that generates optimal control in near real time, at a frequency of 10.52 Hz. This work utilises the differentiable cross-entropy sampling method for quick and effective control generation, along with a lightweight neural network, significantly improving the servoing frequency. We also propose a flow-depth normalisation layer that ameliorates the inferior two-view depth predictions obtained from the flow network. We conduct extensive experiments in the Habitat simulator and show a notable decrease in servoing time compared with other approaches that optimise over a time horizon. Our method strikes the right balance between speed and performance for visual servoing in six degrees of freedom (6DoF), while retaining the advantageous MPC formulation. Our code and dataset are publicly available.
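For intuition, the sampling-based control step underlying this kind of MPC can be illustrated as follows. This is a minimal sketch, not the authors' implementation: `rollout_cost` is a hypothetical placeholder for the photometric/flow error accumulated over the prediction horizon, and the hard top-k elite selection shown here is precisely what the differentiable cross-entropy method replaces with a soft, differentiable selection so that gradients can flow back into the sampler.

```python
# Minimal sketch of cross-entropy sampling for 6-DoF velocity control.
# Assumptions: `rollout_cost(v)` scores a candidate camera velocity by
# the servoing error it would accumulate over the MPC horizon; the
# paper's differentiable variant would replace the hard top-k below.
import numpy as np

def cem_velocity_control(rollout_cost, dim=6, n_samples=64, n_elite=8,
                         n_iters=10, init_std=0.5, seed=0):
    """Return a 6-DoF velocity command that minimises `rollout_cost`."""
    rng = np.random.default_rng(seed)
    mu, std = np.zeros(dim), np.full(dim, init_std)
    for _ in range(n_iters):
        # Sample candidate velocity commands from the current Gaussian.
        v = rng.normal(mu, std, size=(n_samples, dim))
        costs = np.array([rollout_cost(vi) for vi in v])
        # Hard top-k elite selection; a differentiable cross-entropy
        # step would use a soft top-k here instead.
        elite = v[np.argsort(costs)[:n_elite]]
        # Refit the sampling distribution to the elite set.
        mu, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

# Toy usage with a quadratic stand-in for the servoing cost.
target = np.array([0.1, 0.0, 0.2, 0.0, 0.05, 0.0])
v_star = cem_velocity_control(lambda v: np.sum((v - target) ** 2))
```

In the paper's setting, the cost would be computed from predicted optical flow between the current and goal images rather than a known target velocity; the sketch only conveys how the sampling distribution is iteratively refit toward low-cost controls.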