Journal of Statistics Education
We present a data set of user profile data for 59,946 San Francisco users of OkCupid, a free online dating website, from June 2012. The data set includes typical user information, lifestyle variables, and text responses to 10 essay questions. We present four example analyses suitable for undergraduate introductory probability and statistics courses and data science courses that use R. The statistical and data science concepts covered include basic data visualization, exploratory data analysis, multivariate relationships, text analysis, and logistic regression for prediction.
OkCupid, Online dating, Data science, Big data, Logistic regression, Text mining
Copyright 2015 Albert Y. Kim and Adriana Escobedo-Land
Kim, Albert Y. and Escobedo-Land, Adriana, "OkCupid Data for Introductory Statistics and Data Science Courses" (2017). Mathematics and Statistics: Faculty Publications, Smith College, Northampton, MA.