Publication Date

2020

First Advisor

Katherine T. Halvorsen

Document Type

Honors Project

Degree Name

Bachelor of Arts

Department

Statistical and Data Sciences

Keywords

Metabolomic analysis, Machine learning, Prediction modeling, Lasso

Abstract

Metabolites are small biological molecules that are involved in the process of converting food to energy and in generating new cells. Metabolomics reveals features of cancer that genomics cannot provide. Current metabolomic research is limited by the number of metabolites that a study measures. Our goal is to predict unidentified metabolites. We used data from eight studies across six different cancer types: renal cell carcinoma, breast cancer, Hürthle cell carcinoma of the thyroid, diffuse large B-cell lymphoma, pancreatic cancer, and prostate cancer. We built prediction models using two methods, Principal Component Regression (PCR) and Least Absolute Shrinkage and Selection Operator (Lasso). We evaluated the models on existing data and achieved robust performance. Prediction models for a portion of the metabolites transfer successfully to metabolites from an unseen cancer type or study.

Rights

©2020 Ziwei Zang. Access limited to the Smith College community and other researchers while on campus. Smith College community members also may access from off-campus using a Smith College log-in. Other off-campus researchers may request a copy through Interlibrary Loan for personal use.

Language

English

Comments

58 pages : illustrations (some color). Includes bibliographical references.
