Theses, Dissertations, and Projects

Unsegmented character recognition on handwritten Syriac documents

Zella Harriet Henderson, Smith College

Publication Date

2014

Document Type

Honors Project

Department

Computer Science

Keywords

Optical character recognition devices, Syriac language-Writing-Data processing, Syriac, Character recognition, Machine learning, Support vector machines, Paleography

Abstract

A method for recognizing and classifying characters in handwritten Syriac text is presented, along with several measures of its accuracy and related analysis. Our ultimate aim is to provide scholars of Syriac texts and the Syriac language with an academic tool to aid in the digitization and analysis of scanned Syriac texts, although this work focuses specifically on the task of identifying characters within an image. Although a variety of implementations with similar goals exist for modern languages, especially those that use Latin characters, such as English, few extensions of this approach have been applied to Syriac. The goal of this work is to use support vector machines (SVMs), a form of machine learning, to train a classifier capable of correctly identifying detected candidate characters as letters of the Syriac alphabet with high accuracy. The feature sets used to represent each letter are histograms of oriented gradients taken from binarizations of the scanned documents, and a flexible part-structured character model is used to detect possible locations of each letter. Several procedures for selecting training examples for the support vector machines are tested. Results show that the most effective training set uses model letters as positive training examples and non-overlapping detections as negative training examples. SVM classifiers trained in this manner correctly classify up to 84% of the candidate characters, with results varying by letter and by training procedure.
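
The classification stage described in the abstract (binarized patch, histogram of oriented gradients, SVM) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the function names, the toy stroke patches standing in for model letters (positive examples) and non-letter detections (negative examples), the single-cell histogram without block normalization, and the simple hinge-loss subgradient trainer used in place of a full SVM solver. The part-structured detection step is not shown.

```python
import math


def stroke_patch(size, vertical, pos):
    """Toy binarized patch (hypothetical data): 1 = ink, 0 = background."""
    return [[1 if (x if vertical else y) == pos else 0 for x in range(size)]
            for y in range(size)]


def hog_cell(image, bins=8):
    """Single-cell histogram of unsigned gradient orientations, L2-normalized."""
    h, w = len(image), len(image[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]   # central differences
            gy = image[y + 1][x] - image[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi     # fold into [0, pi)
            hist[min(int(angle / math.pi * bins), bins - 1)] += mag
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=50):
    """Linear SVM fit by subgradient descent on the regularized hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]  # L2 regularization step
            if margin < 1:                            # hinge loss is active
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def predict(w, b, x):
    """Return +1 (letter) or -1 (not the letter) for one feature vector."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Positive examples: vertical strokes; negative examples: horizontal strokes.
train_patches = [stroke_patch(8, True, 2), stroke_patch(8, True, 4),
                 stroke_patch(8, False, 3), stroke_patch(8, False, 5)]
labels = [1, 1, -1, -1]
w, b = train_linear_svm([hog_cell(p) for p in train_patches], labels)

print(predict(w, b, hog_cell(stroke_patch(8, True, 5))))
print(predict(w, b, hog_cell(stroke_patch(8, False, 2))))
```

In a setting like the one the abstract describes, one such binary classifier would presumably be trained per letter, with candidate detections for that letter scored by its classifier.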

Language

English

Comments

35 pages : illustrations. Honors Project-Smith College, 2014. Includes bibliographical references (pages 34-35)

Recommended Citation

Henderson, Zella Harriet, "Unsegmented character recognition on handwritten Syriac documents" (2014). Honors Project, Smith College, Northampton, MA.
https://scholarworks.smith.edu/theses/57
