Publication Date

2025-5

First Advisor

Kaitlyn Cook

Second Advisor

Luce Ward

Document Type

Honors Project

Degree Name

Bachelor of Arts

Department

Statistical and Data Sciences

Keywords

uncertainty, evolution, biology, prokaryotes, data science, Bayesian, hidden Markov models, bioinformatics, computational biology, phylogenetics, prediction, long-read

Abstract

Using 16S rRNA gene sequencing data is a fast, inexpensive method of performing taxonomic classification on prokaryotes. Many statistical methods have been developed to perform such classifications. However, no existing tools for 16S analysis are geared towards long-read data. Long-read data is becoming increasingly accessible and makes obtaining full-length 16S gene data significantly more feasible. We explore statistical methods for taxonomic assignment, working towards the development of a 16S analysis pipeline focused on long-read, full-gene data. We focus on the RDP Classifier, Bayesian Lowest Common Ancestor (BLCA), and Hidden Markov Model-based Ultra-Fast OTU tools and assess their effectiveness on a testing set of Proteobacteria. We find that, under continuous assignment, BLCA performs very similarly to RDP Classifier whether assignment is based on bootstrap confidence scores or posterior probabilities. We conclude that BLCA’s novel Bayesian method shows great promise for growth and potential inclusion in a long-read, full-gene 16S analysis pipeline.

Rights

©2025 Elm Markert. Access limited to the Smith College community and other researchers while on campus. Smith College community members also may access from off-campus using a Smith College log-in. Other off-campus researchers may request a copy through Interlibrary Loan for personal use.

Language

English

Comments

x, 77 pages : illustrations (some color), charts. Includes bibliographical references (pages 73-77).
