Chinese Physics B ›› 2025, Vol. 34 ›› Issue (1): 018705. DOI: 10.1088/1674-1056/ad95f1


Combining deep reinforcement learning with heuristics to solve the traveling salesman problem

Li Hong(洪莉)1, Yu Liu(刘宇)1, Mengqiao Xu(徐梦俏)2,†, and Wenhui Deng(邓文慧)2   

  1 School of Software Technology, Dalian University of Technology, Dalian 116620, China;
    2 School of Economics and Management, Dalian University of Technology, Dalian 116024, China
  • Received: 2024-07-27  Revised: 2024-11-13  Accepted: 2024-11-22  Published: 2024-12-12
  • Contact: Mengqiao Xu, E-mail: stephanie1996@sina.com
  • Supported by:
    Project supported by the National Natural Science Foundation of China (Grant Nos. 72101046 and 61672128).


Abstract: Recent studies employing deep learning to solve the traveling salesman problem (TSP) have mainly focused on learning construction heuristics. Such methods can produce good TSP solutions but still depend on additional procedures. In contrast, methods that learn improvement heuristics to iteratively refine solutions remain underexplored. Traditional improvement heuristics are guided by manually designed search strategies and may achieve only limited improvements. This paper proposes a novel framework for learning improvement heuristics, which automatically discovers better improvement policies to iteratively solve the TSP. The framework first designs a new transformer-based architecture to parameterize the policy network, introducing an action-dropout layer to prevent action selection from overfitting. It then proposes a deep reinforcement learning approach that integrates a simulated annealing mechanism (named RL-SA) to learn the pairwise selection policy, aiming to improve the performance of the 2-opt algorithm. RL-SA leverages the whale optimization algorithm to generate initial solutions for better sampling efficiency and uses a Gaussian perturbation strategy to tackle the sparse-reward problem of reinforcement learning. Experimental results show that the proposed approach significantly outperforms state-of-the-art learning-based methods and further narrows the gap between learning-based methods and highly optimized solvers on benchmark datasets. Moreover, the pre-trained model M can be applied to guide the SA algorithm (named M-SA (ours)), which performs better than existing deep models on small-, medium-, and large-scale TSPLIB datasets. Additionally, M-SA (ours) achieves excellent generalization on a real-world dataset of global liner shipping routes, with distance reductions ranging from 3.52% to 17.99%.

Key words: traveling salesman problem, deep reinforcement learning, simulated annealing algorithm, transformer model, whale optimization algorithm
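The classical baseline that the learned policy improves upon, 2-opt moves accepted under a simulated-annealing criterion, can be sketched as follows. This is a minimal illustration of that standard combination, not the authors' RL-SA method: the function names and parameter values are illustrative, and the segment to reverse is chosen at random rather than by a policy network.

```python
import math
import random

def tour_length(tour, coords):
    """Total Euclidean length of a closed tour."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def two_opt_sa(coords, n_iters=20000, t0=1.0, cooling=0.9995, seed=0):
    """Refine a random tour with 2-opt moves under an SA acceptance rule."""
    rng = random.Random(seed)
    n = len(coords)
    tour = list(range(n))
    rng.shuffle(tour)
    best = tour[:]
    best_len = cur_len = tour_length(tour, coords)
    t = t0
    for _ in range(n_iters):
        # Pick two cut points and reverse the segment between them (a 2-opt move).
        i, j = sorted(rng.sample(range(n), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        cand_len = tour_length(cand, coords)
        delta = cand_len - cur_len
        # Metropolis criterion: always accept improvements; accept
        # worsening moves with probability exp(-delta / t) to escape
        # local optima, with t decaying geometrically over time.
        if delta < 0 or rng.random() < math.exp(-delta / max(t, 1e-12)):
            tour, cur_len = cand, cand_len
            if cur_len < best_len:
                best, best_len = tour[:], cur_len
        t *= cooling
    return best, best_len
```

In the paper's framework, the random choice of the pair (i, j) is replaced by a transformer policy trained with reinforcement learning, and the initial tour comes from the whale optimization algorithm rather than a random shuffle.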


PACS: 87.55.kd (Algorithms); 87.55.de (Optimization); 07.05.Mh (Neural networks, fuzzy logic, artificial intelligence)