A Model for Predicting Relationships Between Knowledge Units on Programming Question-and-Answer Websites Using Deep Learning Techniques: A Case Study of Stack Overflow

Abbasimehr, Hossein; Khodizadeh, Mohammad

doi:10.22034/tjee.2023.16964

Tabriz Journal of Electrical Engineering
Article 5, Volume 54, Issue 1 (Serial No. 107), Ordibehesht 1403 (April-May 2024), pp. 45-54
Article type: Scientific research
Digital Object Identifier (DOI): 10.22034/tjee.2023.16964
Authors
Hossein Abbasimehr*; Mohammad Khodizadeh
Assistant Professor, Faculty of Information Technology and Computer Engineering, Azarbaijan Shahid Madani University, Tabriz, Iran
Abstract
Stack Overflow is one of the most popular online communities, used by millions of programmers. If a question together with its answers is treated as a knowledge unit, then several kinds of semantic relationship exist between knowledge units: duplicate, direct, and indirect relatedness to the posed question. Detecting these classes of semantic relationship between knowledge units on Stack Overflow can substantially improve the effectiveness and efficiency of information search. This study presents a hybrid approach, based on deep learning methods and traditional similarity measures, for detecting relatedness between questions. Specifically, two deep network architectures are proposed. The first consists of a bidirectional long short-term memory (BiLSTM) network and a layer that computes cosine similarity. The second extends the first by adding an attention mechanism. The proposed approach was evaluated on a dataset of Java programming-language questions containing 40,000 items. The results show that the proposed model outperforms existing models on F1, Recall, and Precision. In particular, it improves F1 by 17.3 percent over the best current model. The experiments also show that using a pre-trained word-embedding model considerably improves the performance of the proposed models.
Keywords
Relatedness detection; multi-class classification; BiLSTM; attention mechanism; text similarity measures
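
The abstract names two building blocks, a cosine-similarity layer and an attention mechanism, without giving their exact form. The following is a minimal sketch of both in plain Python, not the authors' implementation. The per-token state vectors stand in for BiLSTM outputs, and the `query` vector, the function names, and the choice of plain dot-product scoring are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors score 0
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, query):
    """Weighted sum of token states using dot-product attention.

    `states` is a list of per-token vectors (hypothetical stand-ins for
    BiLSTM outputs); `query` is an assumed vector of the same dimension.
    """
    scores = [sum(h * q for h, q in zip(state, query)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    return [sum(w * state[i] for w, state in zip(weights, states))
            for i in range(dim)]


def question_similarity(states_a, states_b, query):
    """Pool each question's token states with attention, then compare."""
    pooled_a = attention_pool(states_a, query)
    pooled_b = attention_pool(states_b, query)
    return cosine_similarity(pooled_a, pooled_b)
```

In a trained network the token states and the attention parameters would be learned, and the similarity score would feed a classifier over the relatedness classes; here they are fixed inputs so that the arithmetic can be checked by hand.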
