Document Type
Article
Publication Date
8-29-2017
Publication Title
Journal of Statistics Education
Abstract
We present a data set consisting of user profile data from June 2012 for 59,946 San Francisco users of OkCupid, a free online dating website. The data set includes typical user information, lifestyle variables, and text responses to 10 essay questions. We present four example analyses suitable for undergraduate introductory courses in probability and statistics and in data science that use R. The statistical and data science concepts covered include basic data visualization, exploratory data analysis, multivariate relationships, text analysis, and logistic regression for prediction.
Keywords
OkCupid, Online dating, Data science, Big data, Logistic regression, Text mining
Volume
23
Issue
2
DOI
doi.org/10.1080/10691898.2015.11889737
Rights
Copyright 2015 Albert Y. Kim and Adriana Escobedo-Land
Recommended Citation
Kim, Albert Y. and Escobedo-Land, Adriana, "OkCupid Data for Introductory Statistics and Data Science Courses" (2017). Mathematics Sciences: Faculty Publications, Smith College, Northampton, MA.
https://scholarworks.smith.edu/mth_facpubs/45
Comments
Archived as published.