Keyword Generation for Persian Texts Using Transfer Learning

Rahimi, Marzieh; Jalili Jalal, Erfan; Rahimi, Hossein

doi:10.22034/tjee.2022.15426

Tabriz Journal of Electrical Engineering
Volume 52, Issue 2 - Serial Number 100, Tir 1401 (June-July 2022), Pages 115-123
Article type: Research article
Digital Object Identifier (DOI): 10.22034/tjee.2022.15426
Authors
Marzieh Rahimi*¹; Erfan Jalili Jalal²; Hossein Rahimi³
¹Assistant Professor, Faculty of Computer Engineering, Shahrood University of Technology, Shahrood, Iran
²PhD student, Faculty of Informatics Engineering, University of Porto, Porto, Portugal
³BSc graduate, Faculty of Computer Engineering, Shahrood University of Technology, Shahrood, Iran
Abstract
Automatic keyword generation plays an important role in many text-analysis and natural-language applications, particularly in the classification and fast retrieval of texts. Many current methods are limited to selecting words that appear explicitly in the text. Sequence-to-sequence methods can overcome this limitation. However, such methods usually require very large corpora, which is a challenge for low-resource languages such as Persian. In these situations, transfer learning, in which a pre-trained model is adapted to a new task using a smaller dataset, can offer a way forward. In this paper, we use a sequence-to-sequence method based on deep transformer networks to generate keywords for Persian scientific texts. To this end, a diverse corpus of 70,000 specialized Persian articles and their corresponding keywords was collected. The pre-trained transformer network mT5 was then fine-tuned and retrained on this corpus for the keyword generation task. The resulting model was compared with several other methods. The results of this comparison indicate that it outperforms existing methods by at least 2.71 percent.
Keywords
Keyphrase generation; keyphrase extraction; sequence-to-sequence methods; deep transformer networks; Persian corpus; abstractive summarization
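
The distinction the abstract draws between selecting keywords that occur in the text and generating ones that do not can be illustrated with a small sketch. This is not the authors' code: the separator `SEP`, the task prefix `"generate keywords: "`, and both helper functions are hypothetical choices made here for illustration, and the example text is in English only for readability. The sketch splits gold keywords into "present" and "absent" ones, then builds a (source, target) text pair of the kind a text-to-text model could be fine-tuned on.

```python
from typing import List, Tuple

SEP = " ; "  # hypothetical separator between keywords in the target string


def split_present_absent(text: str, keywords: List[str]) -> Tuple[List[str], List[str]]:
    """Split gold keywords into those found verbatim in the text and those not found."""
    lowered = text.lower()
    present = [k for k in keywords if k.lower() in lowered]
    absent = [k for k in keywords if k.lower() not in lowered]
    return present, absent


def make_seq2seq_pair(text: str, keywords: List[str]) -> Tuple[str, str]:
    """Build one (source, target) training pair for a text-to-text model."""
    source = "generate keywords: " + text.strip()  # hypothetical task prefix
    target = SEP.join(keywords)  # all keywords serialized into one output string
    return source, target


abstract = "We fine-tune a pre-trained model to produce keywords for scientific abstracts."
gold = ["keywords", "transfer learning", "pre-trained model"]

present, absent = split_present_absent(abstract, gold)
source, target = make_seq2seq_pair(abstract, gold)

print("present:", present)  # keywords an extractive method could select
print("absent:", absent)    # keywords only a generative method could produce
print("target:", target)
```

A purely extractive method can at best recover the `present` list, whereas a sequence-to-sequence model trained on such pairs is asked to emit the full `target` string, including keywords that never appear in the input.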
