AlexU-Word: A New Dataset for Isolated-Word Closed-Vocabulary Offline Arabic Handwriting Recognition

Abstract

In this paper, we introduce the first phase of a new dataset for offline Arabic handwriting recognition. The aim is to collect a very large dataset of isolated Arabic words that covers every letter of the alphabet in all of its possible shapes using a small number of simple words. The end goal is a very large dataset of segmented letter images that can be used to build and evaluate Arabic handwriting recognition systems based on segmented letter recognition. The current version of the dataset contains 25,114 samples of 109 unique Arabic words that cover all possible shapes of all letters of the alphabet. The samples were collected from 907 writers. In its current form, the dataset can be used for closed-vocabulary word recognition. We evaluated a number of window-based descriptors and classifiers on this task and obtained an accuracy of 92.16% using a SIFT-based descriptor and an artificial neural network (ANN).
