Unified Detection and Recognition of Text in Natural Images Using a Dictionary

Naiemi, Fatemeh; Ghods, Vahid; Khalesi, Hassan

doi:10.22034/jasp.2020.13293

Advanced Signal Processing
Article 12, Volume 4, Issue 1 - Serial Number 5, Mordad 1399 (July-August 2020), Pages 133-149
Article type: Research article
Digital Object Identifier (DOI): 10.22034/jasp.2020.13293
Authors
Fatemeh Naiemi¹; Vahid Ghods*²; Hassan Khalesi³
¹Department of Electrical Engineering, Islamic Azad University, Semnan
²Young Researchers and Elite Club, Semnan Branch, Islamic Azad University, Semnan, Iran
³Department of Electrical Engineering, Garmsar Branch, Islamic Azad University, Garmsar, Iran.
Abstract
In recent years, text detection and recognition in natural images has been studied extensively. This work presents a robust multi-oriented scene text localization system, based on a convolutional neural network (CNN), designed for high text detection performance. The proposed method consists of three layers: feature extraction, feature merging, and output. In the feature extraction layer, an improved ReLU layer (i.ReLU) is introduced. To detect text of varying sizes, an improved inception layer (i.inception) is also presented. An additional layer is then used to improve feature extraction, which enables the proposed architecture to detect multi-oriented text, including curved and vertical text. We also propose a pipeline framework for character recognition. This framework consists of two parallel pipelines that are processed simultaneously: the first carries the cropped words and the second carries the text angles. A dictionary is then used to correct possible errors in the recognized words. Experiments on the ICDAR 2013, ICDAR 2015, and ICDAR 2019 datasets show that the proposed system clearly outperforms previous work.
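The abstract does not say how the dictionary is matched against the recognizer output, so the following is only a minimal sketch of one common approach: replace each recognized word with the nearest dictionary entry under Levenshtein edit distance, and leave it unchanged when nothing is close enough. The names `edit_distance`, `correct_word`, and the `max_distance` threshold are hypothetical and are not taken from the paper.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def correct_word(word: str, lexicon: Iterable[str], max_distance: int = 2) -> str:
    """Return the closest lexicon entry (case-insensitive comparison).

    Assumption: a fixed distance threshold decides whether to trust the
    dictionary; the paper may use a different criterion.
    """
    best, best_dist = word, max_distance + 1
    for entry in lexicon:
        d = edit_distance(word.lower(), entry.lower())
        if d < best_dist:  # strict comparison keeps the first entry on ties
            best, best_dist = entry, d
    return best


if __name__ == "__main__":
    lexicon = ["EXIT", "STOP", "OPEN"]
    print(correct_word("EX1T", lexicon))    # a single misread character
    print(correct_word("qwerty", lexicon))  # nothing within the threshold
```

A threshold keeps the dictionary from overwriting words that are simply absent from it, such as names or numbers; the right value depends on word length and on how noisy the recognizer is.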
Keywords
Scene text localization; text image detection; multi-oriented; convolutional neural network; text recognition; unified text recognition; dictionary
