Wencong Li

Publication Date


Document Type

Honors Project

Degree Name

Bachelor of Arts


Statistical and Data Sciences


Benjamin Baumer


Medium data, Databases, SQL, Public transportation, ETL, Reproducibility, Taxi, Uber, Lyft, R, Taxicabs-New York (State)-New York-Statistics, Transportation-New York (State)-New York-Statistics, SQL (Computer program language), R (Computer program language), MySQL (Electronic resource), Reproducible research


The yellow taxi cab is widely recognized as an important part of New York City. Each taxi trip record is like a small piece of a gigantic puzzle, and together the records tell a story of what happens in New York City every day. Because New York City taxi data is too big to be analyzed in R, we need a tool that helps users answer questions about NYC street-hail services in R. This thesis presents an efficient and easy-to-use way to retrieve trip information for both taxis and ride-sharing services, such as Uber and Lyft, in New York City. By analyzing trip records of New York City’s yellow cabs, we answer questions commonly asked by taxi drivers, passengers, and New York City Taxi & Limousine Commission (TLC) officials, to help all three parties improve their services or experiences.


2018 Wencong Li. Access limited to the Smith College community and other researchers while on campus. Smith College community members may also access from off campus using a Smith College log-in. Other off-campus researchers may request a copy through Interlibrary Loan for personal use.




101 pages : color illustrations, color maps. Includes bibliographical references (pages 97-101)