Manit Chansuparp. A comprehensive improvement of deep reinforcement learning for autonomous UAV navigation using the novel reward function and actor-critic model enhancer methods. Doctoral Degree (Computer Science). King Mongkut's Institute of Technology Ladkrabang. KMITL Lifelong Learning Center. Bangkok: King Mongkut's Institute of Technology Ladkrabang, 2022.
A comprehensive improvement of deep reinforcement learning for autonomous UAV navigation using the novel reward function and actor-critic model enhancer methods
Abstract:
Autonomous navigation has gained much attention in recent years. The reasons include the exponential growth of the logistics industry and the need for social distancing during contagious pandemics. In parallel, the Unmanned Aerial Vehicle (UAV), also known as a drone, has attracted growing attention. Typical UAVs are small, pilotless vehicles that can reach their destinations quickly because they avoid ground traffic. Many recent works have proposed methods for autonomous UAV navigation. Most of them adopt deep reinforcement learning as the learning model and report satisfactory results. However, these works remain far from real-world adoption because they evaluate their methods only in static, unrealistic environments with low-dimensional action spaces. In the real world, by contrast, environments are dynamic and a UAV can move freely in six degrees of freedom (6DOF). Moreover, after testing the latest method in our complex environments, we found that some problems stem from two sources. The first is the irrationality of the traditional reward function. The second is the apprehensive behavior of the agent (the UAV), that is, its tendency to move back and forth repeatedly when it faces risky scenes. Hence, this work aims to propose a method that can operate in more realistic environments while retaining a high success rate, because in real-world deployment a collision can severely damage the UAV. The proposed method consists of several parts, each designed to solve or alleviate a different problem found in previous work. The main parts are as follows. First, point cloud simplification with a truncated icosahedron structure makes an enormous number of cloud points manageable even for a microprocessor. Second, the Augmentative backward reward function (ABR+) has a more rational reward-dispensing mechanism that helps eliminate the agent's bias against the goal point. Third, Life Guard is an enhancer for the actor-critic model that urges the learning process to recognize the limits of risk and helps reduce apprehensive behavior. The experimental results show that the proposed method solves or alleviates these earlier problems. Its success rates were 10% higher than those of the current state-of-the-art method, FORK, in static environments and 2.4% higher in dynamic environments.
Address:
BANGKOK
Email:
Lifelong@kmitl.ac.th
King Mongkut's Institute of Technology Ladkrabang. KMITL Lifelong Learning Center