Using Chaos Theory to Balance Exploration and Exploitation in Deep Reinforcement Learning

Khodadadi, Habib; Derhami, Vali

doi:10.22034/tjee.2024.61074.4824

Tabriz Journal of Electrical Engineering
Volume 55, Issue 1 (Serial No. 111), Khordad 1404 (May/June 2025), pp. 113-121
Article type: Research article
Digital Object Identifier (DOI): 10.22034/tjee.2024.61074.4824
Authors
Habib Khodadadi¹; Vali Derhami*²
¹ Department of Computer Engineering, Engineering Campus, Yazd University, Yazd, Iran
² Department of Electrical and Computer Engineering, Engineering Campus, Yazd University, Yazd, Iran
Abstract
Deep reinforcement learning is widely used in machine learning problems, so methods that improve its performance are important. Balancing exploration and exploitation is one of the key issues in reinforcement learning, and action-selection methods that incorporate exploration, such as ε-greedy and softmax, are used for this purpose. These methods combine generated random numbers with the action values to select an action that maintains this balance. With adequate exploration over time, the environment can be expected to become better known and higher-value actions to be identified. Chaos has many applications owing to properties such as high sensitivity to initial conditions, aperiodicity, unpredictability, visiting all states of the search space, and pseudo-random behavior. In this paper, numbers generated by chaotic systems are used in the ε-greedy action-selection method in deep reinforcement learning to improve the balance between exploration and exploitation; in addition, the effect of using chaos in the experience replay memory is examined. Experiments in the Lunar Lander environment show a significant increase in learning speed and a higher reward in this environment.
Keywords
Action selection; chaos theory; exploration and exploitation; deep reinforcement learning
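
The abstract describes replacing the random numbers in ε-greedy action selection with numbers produced by a chaotic system, but it does not say which system is used. The sketch below is only an illustration under stated assumptions: it uses the logistic map with r = 4 as a stand-in chaotic generator, and the names `logistic_map` and `chaotic_epsilon_greedy` are hypothetical, not taken from the paper. Because the logistic map is not uniformly distributed on (0, 1), `epsilon` here acts as a threshold on the chaotic value rather than as an exact exploration probability.

```python
from typing import Iterator, Sequence


def logistic_map(x0: float = 0.7, r: float = 4.0) -> Iterator[float]:
    """Yield an endless sequence in (0, 1) from the logistic map.

    Assumption: the logistic map stands in for whatever chaotic
    system the paper actually uses.
    """
    x = x0
    while True:
        x = r * x * (1.0 - x)
        # Floating-point rounding can collapse the orbit to 0 or 1;
        # restart from the seed if that ever happens.
        if not 0.0 < x < 1.0:
            x = x0
        yield x


def chaotic_epsilon_greedy(
    q_values: Sequence[float], epsilon: float, chaos: Iterator[float]
) -> int:
    """Return an action index using chaotic numbers instead of random ones."""
    if next(chaos) < epsilon:
        # Explore: map a second chaotic number onto an action index.
        idx = int(next(chaos) * len(q_values))
        return min(idx, len(q_values) - 1)
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    chaos = logistic_map(x0=0.7)
    q = [0.1, 0.9, 0.3, 0.2]  # hypothetical Q-values for four actions
    actions = [chaotic_epsilon_greedy(q, 0.1, chaos) for _ in range(10)]
    print(actions)
```

In a deep Q-network agent, `q_values` would be the network output for the current state, for example four values for the four discrete actions of Lunar Lander, and `epsilon` would typically be decayed over training as in standard ε-greedy.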
