Publication Date


First Advisor

Katherine T. Halvorsen

Document Type

Honors Project

Degree Name

Bachelor of Arts


Statistical and Data Sciences


Metabolomic analysis, Machine learning, Prediction modeling, Lasso


Metabolites are small biological molecules that are involved in the process of converting food to energy and in generating new cells. Metabolomics reveals unique features in cancer studies that genomics cannot provide. Current metabolomic research is limited by the number of metabolites that a study measures. Our goal is to predict unidentified metabolites. We used data from eight studies across six different cancer types: renal cell carcinoma, breast cancer, Hürthle cell carcinoma of the thyroid, diffuse large B-cell lymphoma, pancreatic cancer, and prostate cancer. We built prediction models using two methods, Principal Component Regression (PCR) and Least Absolute Shrinkage and Selection Operator (Lasso). We evaluated the models on existing data and achieved robust performance. Prediction models for a portion of metabolites exhibit successful transfer learning on metabolites from an unseen cancer type or study.


©2020 Ziwei Zang. Access limited to the Smith College community and other researchers while on campus. Smith College community members also may access from off-campus using a Smith College log-in. Other off-campus researchers may request a copy through Interlibrary Loan for personal use.




58 pages : illustrations (some color). Includes bibliographical references.