Using Generative Adversarial Networks to Improve the Classification Performance of Imbalanced User Reviews

Javid, Bahareh; Mashayekhi, Hoda

doi:10.22034/tjee.2024.58494.4724


Tabriz Journal of Electrical Engineering
Volume 54, Issue 4 - Serial Number 110, Azar 1403 (November/December 2024), Pages 413-422
Article type: Research article
Authors
Bahareh Javid; Hoda Mashayekhi^*
Associate Professor, Faculty of Computer Engineering, Shahrood University of Technology, Shahrood, Iran
Abstract
Text generation methods use artificial intelligence to produce natural-language text automatically. One application of text generation is text classification. Many real-world problems involve imbalanced text data, which can degrade classification performance. One approach to the imbalanced-data problem is oversampling the minority class. Given the progress of generative adversarial networks (GANs) in data generation, these networks can be used to generate text samples for oversampling. Generating text with GANs is difficult because text is discrete. Despite their potential, the use of these networks to address imbalanced text data has rarely been studied. This paper investigates the effect of using the SentiGAN network on resolving the imbalance in user reviews, with the aim of improving classification performance. After the proposed method and the evaluation framework are presented, four classification algorithms are run on the data, and several evaluation metrics are computed and analyzed before and after oversampling. The results are also compared with traditional and recent oversampling methods. Oversampling with the proposed method increases the accuracy, precision, recall, and F-score of minority-class classification, both relative to the imbalanced data and in comparison with other oversampling methods.
Keywords
Generative adversarial networks (GAN); imbalanced text classification; oversampling; imbalanced text; classification
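
The oversample-then-evaluate workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names (`minority_metrics`, `oversample_minority`, `duplicate_existing`) and the toy data are assumptions made for illustration, and the generator used here is plain random duplication standing in for SentiGAN, which would instead synthesize new minority-class reviews.

```python
import random
from collections import Counter


def minority_metrics(y_true, y_pred, minority):
    """Precision, recall, and F1 computed for the minority class only."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == minority and p == minority)
    fp = sum(1 for t, p in pairs if t != minority and p == minority)
    fn = sum(1 for t, p in pairs if t == minority and p != minority)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def oversample_minority(texts, labels, minority, generate, seed=0):
    """Add generated minority samples until the minority matches the largest class.

    `generate` is a hypothetical callable (rng, minority_texts) -> str.
    In the paper this role is played by SentiGAN; here it is a placeholder.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority_texts = [t for t, l in zip(texts, labels) if l == minority]
    if not minority_texts:
        raise ValueError("no minority samples to oversample from")
    needed = max(counts.values()) - counts[minority]
    new_texts = [generate(rng, minority_texts) for _ in range(needed)]
    return texts + new_texts, labels + [minority] * needed


def duplicate_existing(rng, minority_texts):
    # Stand-in generator: random duplication of real samples, NOT a GAN.
    return rng.choice(minority_texts)


# Toy imbalanced review set (made-up data): five positive, one negative.
texts = ["great", "good", "fine", "nice", "love it", "bad"]
labels = ["pos", "pos", "pos", "pos", "pos", "neg"]
bal_texts, bal_labels = oversample_minority(texts, labels, "neg", duplicate_existing)
print(Counter(bal_labels))

# Minority-class metrics for a made-up set of predictions.
y_true = ["neg", "neg", "pos", "pos", "pos"]
y_pred = ["neg", "pos", "pos", "pos", "neg"]
print(minority_metrics(y_true, y_pred, "neg"))
```

In an actual experiment, the metrics would be computed once for a classifier trained on the imbalanced data and once for the same classifier trained on the oversampled data, always on an untouched test split so that generated samples never leak into evaluation.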
