Document Type

Article

Publication Date

8-29-2017

Publication Title

Journal of Statistics Education

Abstract

We present a data set consisting of user profile data for 59,946 San Francisco users of OkCupid (a free online dating website) from June 2012. The data set includes typical user information, lifestyle variables, and text responses to 10 essay questions. We present four example analyses suitable for use in undergraduate introductory probability and statistics courses and data science courses that use R. The statistical and data science concepts covered include basic data visualization, exploratory data analysis, multivariate relationships, text analysis, and logistic regression for prediction.

Keywords

OkCupid, Online dating, Data science, Big data, Logistic regression, Text mining

Volume

23

Issue

2

DOI

doi.org/10.1080/10691898.2015.11889737

Rights

Copyright 2015 Albert Y. Kim and Adriana Escobedo-Land

Comments

Archived as published.

Included in

Mathematics Commons
