To access this work you must either be on the Smith College campus OR have valid Smith login credentials.
Non-Smith users: You may request this item through Interlibrary Loan at your own library.
Publication Date
2014
Document Type
Honors Project
Department
Computer Science
Keywords
Optical character recognition devices, Syriac language-Writing-Data processing, Syriac, Character recognition, Machine learning, Support vector machines, Paleography
Abstract
A method for recognizing and classifying characters in handwritten Syriac text is presented, along with several measures of its accuracy and related analysis. Our ultimate aim is to provide scholars of Syriac texts and the Syriac language with an academic tool to aid in the digitization and analysis of scanned Syriac texts; this work focuses specifically on the task of identifying characters within an image. Although a variety of implementations with similar goals exist for modern languages, especially those written in Latin characters such as English, few such approaches have been extended to Syriac. The goal of this work is to use support vector machines (SVMs), a form of machine learning, to train classifiers that correctly identify detected candidate characters as letters of the Syriac alphabet with high accuracy. Each letter is represented by a feature set consisting of histograms of oriented gradients computed from binarizations of the scanned documents, and a flexible part-structured character model is used to detect possible locations of each letter. Several procedures for choosing the SVMs' training examples are tested. The results show that the most effective training set uses model letters as positive examples and non-overlapping detections as negative examples; SVM classifiers trained in this manner correctly classify up to 84% of the candidate characters, with results varying by letter and by training procedure.
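The classification stage described in the abstract (binarize the scan, describe each candidate with histograms of oriented gradients, score it with a per-letter SVM) can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation. The global mean-intensity threshold, the simplified HOG (unsigned orientations, 9 bins, 4×4-pixel cells, a single L2 normalization instead of block normalization), and the bias-free linear SVM trained by Pegasos-style subgradient descent are all illustrative choices. The helper names (`binarize`, `hog_features`, `train_letter_classifier`, `synthetic_stroke`) are hypothetical. The part-structured detector that proposes candidate locations is not shown.

```python
import numpy as np


def binarize(gray, threshold=None):
    """Map a grayscale patch to an ink mask (ink = 1.0, background = 0.0).
    Assumption: one global threshold (mean intensity by default)."""
    if threshold is None:
        threshold = gray.mean()
    return (gray < threshold).astype(float)


def hog_features(img, cell=4, bins=9):
    """Simplified histogram of oriented gradients.

    Unsigned orientations in [0, 180) degrees, weighted by gradient magnitude,
    one histogram per cell-by-cell block, concatenated and L2-normalized once
    (standard HOG additionally normalizes overlapping blocks of cells)."""
    gy, gx = np.gradient(img)              # derivatives along rows, then columns
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    h, w = img.shape
    hists = []
    for r in range(0, h - cell + 1, cell):
        for c in range(0, w - cell + 1, cell):
            hist, _ = np.histogram(ang[r:r + cell, c:c + cell],
                                   bins=bins, range=(0.0, 180.0),
                                   weights=mag[r:r + cell, c:c + cell])
            hists.append(hist)
    v = np.concatenate(hists)
    return v / (np.linalg.norm(v) + 1e-8)


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss + L2 penalty) fitted by Pegasos-style stochastic
    subgradient descent. Labels y are +1 / -1. No bias term, as in the basic
    Pegasos formulation."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)              # decreasing step size
            violates = y[i] * (X[i] @ w) < 1.0  # margin test uses current w
            w *= 1.0 - eta * lam               # shrink: gradient of the L2 penalty
            if violates:                       # hinge-loss subgradient step
                w += eta * y[i] * X[i]
    return w


def train_letter_classifier(positives, negatives, **svm_args):
    """One binary SVM per letter: model images of the letter as positives and
    detections that do not overlap that letter as negatives (our reading of
    the training set the abstract reports as most effective)."""
    X = np.array([hog_features(binarize(p)) for p in positives + negatives])
    y = np.array([1.0] * len(positives) + [-1.0] * len(negatives))
    return train_linear_svm(X, y, **svm_args)


def score(w, patch):
    """Signed SVM score of a candidate patch; > 0 means 'this letter'."""
    return float(hog_features(binarize(patch)) @ w)


def synthetic_stroke(vertical, pos, size=16, width=3):
    """Hypothetical stand-in for a scanned glyph: a dark bar on white paper."""
    img = np.full((size, size), 255.0)
    if vertical:
        img[:, pos:pos + width] = 0.0
    else:
        img[pos:pos + width, :] = 0.0
    return img


if __name__ == "__main__":
    # Toy "letter" = vertical stroke; negatives = horizontal strokes.
    pos = [synthetic_stroke(True, p) for p in range(2, 11, 2)]
    neg = [synthetic_stroke(False, p) for p in range(2, 11, 2)]
    w = train_letter_classifier(pos, neg)
    print(score(w, synthetic_stroke(True, 3)))   # unseen positive candidate
    print(score(w, synthetic_stroke(False, 3)))  # unseen negative candidate
```

For a full alphabet, one would presumably train one such classifier per letter and assign each detected candidate to the letter whose SVM scores it highest. The fraction of candidates assigned correctly, overall or per letter, then gives accuracy figures of the kind the abstract reports.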
Language
English
Recommended Citation
Henderson, Zella Harriet, "Unsegmented character recognition on handwritten Syriac documents" (2014). Honors Project, Smith College, Northampton, MA.
https://scholarworks.smith.edu/theses/57
Comments
35 pages : illustrations. Honors Project-Smith College, 2014. Includes bibliographical references (pages 34-35)