OHSCR: Benchmarks Dataset for Offline Handwritten Sindhi Character Recognition
Keywords:
Benchmark Dataset, Handwritten Character Recognition, Pattern Recognition, Machine Learning, Sindhi Language
Abstract
This work presents a unique dataset for offline handwritten Sindhi character recognition. It contains 7,800 character images in multiple categories, written by 150 writers of various ages, genders, and professional backgrounds. Each writer wrote the 52 Sindhi characters on a specially designed form (150 writers × 52 characters = 7,800 images). All written samples were scanned with a high-quality scanner. The handwritten characters were then cropped from the scanned forms and saved in ‘.png’ format. Several preprocessing techniques were applied to remove noise from the scanned images and produce a clean dataset. The dataset is divided into a training set (80%) and a test set (20%). It will be made publicly available for the benefit of the Sindhi research community. It can be used to build and test handwritten character recognition systems for the Sindhi language and to support writer identification. The authors describe it as the world's first openly accessible dataset for handwritten Sindhi character research.
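The dataset arithmetic and the 80/20 split described in the abstract can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the abstract does not say whether the split was made per sample or per writer, so a seeded random per-sample split is assumed here, and the names `build_index` and `split_index` and the integer writer and character IDs are hypothetical, not part of the published dataset.

```python
import random

NUM_WRITERS = 150      # stated in the abstract
NUM_CHARACTERS = 52    # stated in the abstract
TRAIN_FRACTION = 0.8   # stated in the abstract (80% train, 20% test)


def build_index(num_writers=NUM_WRITERS, num_characters=NUM_CHARACTERS):
    """Return one (writer_id, character_id) pair per character image.

    The integer IDs are hypothetical placeholders for whatever naming
    scheme the released image files actually use.
    """
    return [
        (writer, character)
        for writer in range(1, num_writers + 1)
        for character in range(1, num_characters + 1)
    ]


def split_index(index, train_fraction=TRAIN_FRACTION, seed=0):
    """Shuffle the index with a fixed seed and cut it into train and test.

    Assumption: a random per-sample split. The source does not specify
    how its 80/20 split was drawn.
    """
    shuffled = list(index)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


index = build_index()
train, test = split_index(index)
print(len(index), len(train), len(test))  # 7800 6240 1560
```

A per-writer split (for example, 120 writers for training and 30 for testing) would give the same set sizes while keeping test writers unseen during training. That may suit character recognition better, whereas writer identification needs samples from every writer in both sets.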
License
Copyright (c) 2024 Jakhro Abdul Naveed, Mudasar Ahmed Soomro, Leezna Saleem, Muhammad Khalid Shaikh (Author)
This work is licensed under a Creative Commons Attribution 4.0 International License.