LINGUISTIC ANALYSIS ON CURSIVE CHARACTERS

  • CHIAI AL-ATROSHI, Dept. of Educational Counselling and Psychology, College of Basic Education, University of Duhok, Kurdistan Region-Iraq
Keywords: Convolutional Neural Network, Deep Learning, Machine Learning, Linguistic Analysis, Character Recognition

Abstract

Document analysis is of major importance in information retrieval systems. Archives hold vaults of paper and other physical documents, and to preserve the important information and summaries they contain without losing their meaning or significance, each document needs to be properly curated and processed. Ancient written documents use many kinds of cursive character sets, in which individual characters, and consequently the intended meaning, are very tedious to discriminate. To overcome the difficulty of reading cursive characters and to avoid misreading the meaning and importance of documents, this work proposes an improved CNN [6] model that works together with OCR and the Tesseract API. The documents are scanned, curated and preprocessed in the form of images. CNNs are, to date, among the most effective algorithms in AI and deep learning for image recognition, and a CNN combined with an OCR API could contribute to efficient character-recognition strategies even for complex cursive styles. This article proposes a method that adapts to the segmentation and classification of text images written in cursive styles. Tesseract is a popular and effective OCR library whose rich API can enrich the CNN-OCR model.
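
The abstract does not describe the architecture of the proposed CNN, so the following is only a minimal, illustrative sketch of what one convolutional stage of a CNN computes on a scanned, preprocessed character image: global-threshold binarization, a valid 2D convolution (cross-correlation, as CNN layers compute it), ReLU and 2x2 max pooling. All names (`binarize`, `conv2d`, `relu`, `max_pool`, `VERTICAL_STROKE`), the threshold of 128 and the hand-set kernel are hypothetical assumptions; in a trained CNN the kernel weights are learned, and there are many channels and layers.

```python
from typing import List

Matrix = List[List[float]]

# Hypothetical hand-set 3x3 kernel that responds to vertical pen strokes.
# In a real CNN these weights would be learned from labelled character images.
VERTICAL_STROKE: Matrix = [
    [-1.0, 2.0, -1.0],
    [-1.0, 2.0, -1.0],
    [-1.0, 2.0, -1.0],
]


def binarize(gray: Matrix, threshold: float = 128.0) -> Matrix:
    """Preprocessing: map dark ink pixels (< threshold) to 1.0 and paper to 0.0."""
    return [[1.0 if px < threshold else 0.0 for px in row] for row in gray]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid (no padding, stride 1) 2D cross-correlation, as in a CNN conv layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + u][j + v] * kernel[u][v] for u in range(kh) for v in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap: Matrix) -> Matrix:
    """Element-wise ReLU non-linearity: negative responses become 0."""
    return [[max(0.0, x) for x in row] for row in fmap]


def max_pool(fmap: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [
        [
            max(fmap[i * size + u][j * size + v] for u in range(size) for v in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


if __name__ == "__main__":
    # A 6x6 grayscale "scan" containing one vertical ink stroke in column 2.
    gray = [[255, 255, 0, 255, 255, 255] for _ in range(6)]
    features = max_pool(relu(conv2d(binarize(gray), VERTICAL_STROKE)))
    print(features)  # [[6.0, 0.0], [6.0, 0.0]]
```

The pooled feature map keeps a strong response only where the vertical stroke lies; stacking many such learned filters is what lets a CNN separate visually similar cursive glyphs before, or alongside, an OCR engine such as Tesseract.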

References

[1] Mariyathas, J., V. Shanmuganathan, and B. Kuhaneswaran. "Sinhala Handwritten Character Recognition Using Convolutional Neural Network." In 2020 5th International Conference on Information Technology Research (ICITR), pp. 1-6, 2020. doi: 10.1109/ICITR51448.2020.9310914.
[2] Benaddy, Mohamed, Othmane El Meslouhi, Youssef Es-saady, and Mustapha Kardouchi. "Handwritten Tifinagh Characters Recognition Using Deep Convolutional Neural Networks." Sensing and Imaging 20, no. 1 (2019): 1-17.
[3] Wei, Tan, Usman Ullah Sheikh, and Ab Al-Hadi Ab Rahman. "Improved Optical Character Recognition with Deep Neural Network." In 2018 IEEE 14th International Colloquium on Signal Processing & Its Applications (CSPA), pp. 245-249. IEEE, 2018.
[4] Anil, R., K. Manjusha, S. S. Kumar, and K. P. Soman. "Convolutional Neural Networks for the Recognition of Malayalam Characters." In Proceedings of the 3rd International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA) 2014, edited by S. Satapathy, B. Biswal, S. Udgata, and J. Mandal. Advances in Intelligent Systems and Computing, vol. 328. Cham: Springer, 2015. https://doi.org/10.1007/978-3-319-12012-6_54
[5] Cortes, Corinna, Christopher J. C. Burges, and Yann LeCun. "The MNIST Handwritten Digit Database." Yann LeCun's website, yann.lecun.com. Retrieved 30 April 2020.
[6] Sarker, Goutam, and Swagata Ghosh. "A Convolution Neural Network for Optical Character Recognition and Subsequent Machine Translation." International Journal of Computer Applications 182, no. 30 (2018): 23-27.
[7] Ko, Daegun, Suhan Song, Kimin Kang, Seongwook Han, and Juneho Yi. "Optical Character Recognition Performance Comparison of Convolution Neural Network and Tesseract." IEICE Proceedings Series 61, no. P1-11 (2016).
[8] Zhao, Haifeng, Yong Hu, and Jinxia Zhang. "Character Recognition via a Compact Convolutional Neural Network." In 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp. 1-6. IEEE, 2017.
[9] Sarker, Goutam. "A Survey on Convolution Neural Networks." In 2020 IEEE Region 10 Conference (TENCON), pp. 923-928. IEEE, 2020.
[10] Kim, Jinho. "Character Level and Word Level English License Plate Recognition Using Deep-Learning Neural Networks." Journal of Korea Society of Digital Industry and Information Management 16, no. 4 (2020): 19-28.
[11] Mai, Vinh Du, Duoqian Miao, and Ruizhi Wang. "Vietnam License Plate Recognition System Based on Edge Detection and Neural Networks." Journal of Information and Computing Science 8, no. 1 (2013): 27-40.
[12] Raj, Aman, Devanshu Dubey, Abhishek Mishra, Nikhil Chopda, Nishant M. Borkar, and Vipul S. Lande. "Convolution Neural Network Based Automatic License Plate Recognition System." International Journal of Computer Sciences and Engineering 7, no. 4 (2019): 199-205.
[13] Rawls, Stephen, Huaigu Cao, Senthil Kumar, and Prem Natarajan. "Combining Convolutional Neural Networks and LSTMs for Segmentation-Free OCR." In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 155-160. IEEE, 2017.
[14] Sharma, Arnab Sen, Maruf Ahmed Mridul, Marium-E. Jannat, and Md Saiful Islam. "A Deep CNN Model for Student Learning Pedagogy Detection Data Collection Using OCR." In 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), pp. 1-6. IEEE, 2018.
Published: 2022-11-09

How to Cite:
AL-ATROSHI, C. (2022). LINGUISTIC ANALYSIS ON CURSIVE CHARACTERS. Journal of Duhok University, 25(2), 33-40. https://doi.org/10.26682/sjuod.2022.25.2.3

Section: Pure and Engineering Sciences