Meha Kaushik1 K. Madhava Krishna1
The core of Reinforcement learning lies in learning from experiences. The performance of the agent is hugely impacted by the training conditions, reward functions and exploration policies. Deep Deterministic Policy Gradient(DDPG) is a well known approach to solve continuous control problems in RL. We use DDPG with intelligent choice of reward function and exploration policy to learn various driving behaviors(Lanekeeping, Overtaking, Blocking, Defensive, Opportunistic) for a simulated car in unstructured environments. In cluttered scenes, where the opponent agents are not following any driving pattern, it is difficult to anticipate their behavior and henceforth decide our agent’s actions. DDPG enables us to propose a solution which requires only the sensor information at current time step to predict the action to be taken. Our main contribution is generating a behavior based motion model for simulated cars, which plans for every instant.