REINFORCEMENT LEARNING MODEL FOR FINDING OPTIMAL PATH

  • ARAN SIRWAN ISMAIL, University of Human Development, Iraq
  • ZHIAR AHMED MOHAMMED, University of Human Development, Iraq
  • KOZHIR MUSTAFA HUSSAIN, University of Human Development, Iraq
  • HIWA OMER HASSAN, University of Human Development, Iraq
Keywords: Reinforcement Learning, Shortest Path Finding, Q-Learning, Double Q-Learning, Reward Shaping, Decaying Epsilon and Alpha.

Abstract

Reinforcement learning (RL) has become a powerful method for addressing complex optimization challenges, such as finding optimal (shortest) paths, across many fields. This study designs and evaluates RL algorithms for identifying the shortest path in large and dynamic environments. Our work advances RL-based optimal pathfinding by demonstrating Q-Learning's superiority in scalability, path length, and cumulative reward, enriching optimization methodology and guiding future RL research. To improve the model's performance, we combined Q-Learning and Double Q-Learning with the reward-shaping technique and exploration-exploitation strategies such as epsilon-greedy with a decaying epsilon. We evaluated the approach on grid environments represented as 17x17 and 27x27 matrices. In both environments, Q-Learning consistently found optimal paths with shorter distances, though it ran slightly slower than Dijkstra's algorithm (0.00093 seconds versus 0.000023 seconds, and 0.0097 seconds versus 0.000034 seconds, respectively). Our assessment compared Q-Learning, Dijkstra's algorithm, and random selection: Q-Learning consistently excelled, while random selection produced suboptimal routes because of its inherent randomness. Cumulative rewards underscored Q-Learning's superiority (ranging between 205,823.20 and 2,197,570.15), reflecting its learning capacity. Dijkstra prioritized minimal path length (cumulative rewards between 1 and 6), whereas random selection varied considerably (between 11,901 and 132,449).
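The paper's implementation is not reproduced on this page. As a minimal sketch of the ingredients the abstract and keywords name (tabular Q-Learning, epsilon-greedy exploration with decaying epsilon and alpha, and a reward-shaping term), the following self-contained Python example runs on a hypothetical 4x4 grid. The grid layout, reward values, decay schedules, and hyperparameters are illustrative assumptions, not the paper's settings; the Double Q-Learning variant and the Dijkstra and random-selection baselines are omitted for brevity.

import random

# Minimal tabular Q-Learning sketch, assuming a hypothetical 4x4 grid
# (0 = free cell, 1 = wall); the paper's 17x17 and 27x27 environments
# are not reproduced here.
GRID = [
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
ROWS, COLS = len(GRID), len(GRID[0])
START, GOAL = (0, 0), (3, 3)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    # Blocked or out-of-bounds moves leave the agent in place.
    r, c = state[0] + action[0], state[1] + action[1]
    if 0 <= r < ROWS and 0 <= c < COLS and GRID[r][c] == 0:
        return (r, c)
    return state

def manhattan(s):
    return abs(s[0] - GOAL[0]) + abs(s[1] - GOAL[1])

def shaped_reward(state, next_state):
    # Reward shaping: goal bonus, per-step cost, and a small term that
    # rewards moving closer to the goal (illustrative values).
    if next_state == GOAL:
        return 100.0
    return -1.0 + 0.5 * (manhattan(state) - manhattan(next_state))

Q = {((r, c), a): 0.0
     for r in range(ROWS) for c in range(COLS)
     for a in range(len(ACTIONS))}

epsilon, alpha, gamma = 1.0, 0.9, 0.95
for episode in range(500):
    state = START
    for _ in range(200):                      # cap episode length
        if random.random() < epsilon:         # epsilon-greedy: explore...
            a = random.randrange(len(ACTIONS))
        else:                                 # ...or exploit
            a = max(range(len(ACTIONS)), key=lambda x: Q[(state, x)])
        nxt = step(state, ACTIONS[a])
        reward = shaped_reward(state, nxt)
        best_next = max(Q[(nxt, x)] for x in range(len(ACTIONS)))
        Q[(state, a)] += alpha * (reward + gamma * best_next - Q[(state, a)])
        state = nxt
        if state == GOAL:
            break
    epsilon = max(0.05, epsilon * 0.99)       # decaying epsilon
    alpha = max(0.10, alpha * 0.999)          # decaying alpha

# Greedy rollout of the learned policy to read off the path.
path, state = [START], START
while state != GOAL and len(path) < 50:
    a = max(range(len(ACTIONS)), key=lambda x: Q[(state, x)])
    state = step(state, ACTIONS[a])
    path.append(state)
print(path)

For the timing comparison the abstract reports, a Dijkstra baseline on the same grid could be implemented with Python's heapq module and both methods timed with time.perf_counter(); that baseline is not shown here.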


Published
2023-12-24
How to Cite
ISMAIL, A. S., MOHAMMED, Z. A., HUSSAIN, K. M., & HASSAN, H. O. (2023). REINFORCEMENT LEARNING MODEL FOR FINDING OPTIMAL PATH. Journal of Duhok University, 26(2), 489-501. https://doi.org/10.26682/csjuod.2023.26.2.45