Publication Date

2014

Document Type

Honors Thesis

Department

Computer Science

Keywords

Optical character recognition devices, Syriac language-Writing-Data processing, Syriac, Character recognition, Machine learning, Support vector machines, Paleography

Abstract

A method for recognizing and classifying characters in handwritten Syriac text is presented, along with several measures of its accuracy and related analysis. Our ultimate aim is to provide scholars of Syriac texts and the Syriac language with an academic tool to aid in the digitization and analysis of scanned Syriac texts, although this work focuses specifically on the task of identifying characters within an image. A variety of implementations with similar goals exist for modern languages, especially those written in Latin characters, such as English, but few extensions of this approach have been applied to Syriac. The goal of this work is to use support vector machines (SVMs), a form of machine learning, to train a classifier capable of correctly identifying detected candidate characters as letters of the Syriac alphabet with high accuracy. The feature sets used to represent each letter are histograms of oriented gradients taken from binarizations of the scanned documents, and a flexible part-structured character model is used to detect possible locations of each letter. Several procedures for selecting training examples for the support vector machines are tested. Results show that the most effective training set uses model letters as positive training examples and non-overlapping detections as negative training examples. SVM classifiers trained in this manner correctly classify up to 84% of the candidate characters, with results varying by letter and by training procedure.
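
The pairing of histogram-of-oriented-gradients features with a linear SVM described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: it computes a single-cell orientation histogram with no block normalization, omits the part-structured detection step entirely, and trains a linear SVM with plain sub-gradient descent on the hinge loss. The function names (hog_cell, train_linear_svm, predict), the toy 6x6 images, and all parameter values are assumptions made for illustration only.

```python
import math

def hog_cell(img, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations
    for one cell of a binarized image (list of rows of 0/1 values)."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]   # central differences
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle / (180.0 / bins)) % bins] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm else hist

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Linear SVM via sub-gradient descent on the L2-regularized hinge loss.
    Labels in y must be +1 or -1; the bias term is left unregularized."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]      # regularization step
            if margin < 1:                                # hinge loss is active
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b

def predict(w, b, x):
    """Return +1 or -1 according to the sign of the decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy stand-ins for binarized glyph crops: a vertical and a horizontal stroke edge.
vertical_edge = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
horizontal_edge = [[0] * 6 for _ in range(3)] + [[1] * 6 for _ in range(3)]

features = [hog_cell(vertical_edge), hog_cell(horizontal_edge)]
labels = [1, -1]  # positive example vs. negative example
w, b = train_linear_svm(features, labels)
print(predict(w, b, features[0]), predict(w, b, features[1]))
```

In a fuller pipeline, one such classifier would presumably be trained per letter, with positive and negative examples chosen by one of the training procedures the abstract compares.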

Language

English

Comments

35 pages : illustrations. Honors Project-Smith College, 2014. Includes bibliographical references (pages 34-35)
