In the spirit of Valentine’s Day, let’s explore a fun little Relationship App quiz that forecasts how long your relationship will last. Data from a Stanford University study, How Couples Meet and Stay Together (Rosenfeld, Michael J., Reuben J. Thomas, and Maja Falcon. 2018), was prepared and used to train DataRobot machine learning models. A short six-question quiz was then created after evaluating numerous variables and final outcomes from thousands of couples. Go ahead and give it a try.
Keep in mind that while the Relationship App was based on scientific studies, out-of-sample validation, systematically tuned machine learning models, and dozens of benchmarks against competing approaches, you shouldn’t take your results too seriously.
My husband and I both took the quiz and were pleased with our results: a 98.6% probability of staying together for two more years. We’ve already been together for more than 20 years, and it does seem to get easier for us over time. The military lifestyle was the biggest challenge of our marriage, and it tested us early on. None of the six quiz questions asked about military life.
The Relationship App factors may or may not be causal. Predictions are based on the available Stanford University study data and six questions selected by a data scientist. Six questions cannot capture every relationship strength or challenge.
Relationship App Data Source
The How Couples Meet and Stay Together study was designed to provide answers to the following research questions:
- Do traditional couples and nontraditional couples meet in the same way? What kinds of couples are more likely to have met online?
- Have the most recent marriage cohorts (especially the traditional heterosexual same-race married couples) met in the same way their parents and grandparents did?
- Does meeting online lead to greater or less couple stability?
- How do the couple dissolution rates of nontraditional couples compare to the couple dissolution rates of more traditional same-race heterosexual couples?
- How does the availability of civil union, domestic partnership or same-sex marriage rights affect couple stability for same-sex couples? This study will provide the first nationally representative data on the couple dissolution rates of same-sex couples.
The 4,002 study participants were adults in the United States of America, 3,009 of whom had a spouse or main romantic partner. Researchers oversampled self-identified gay, lesbian, and bisexual adults. Follow-up surveys conducted one and two years after the main survey measured couple dissolution rates. Data from additional follow-up surveys is also available if you are interested.
Preparing the Data
To make the research study data ready for machine learning, a data scientist reviewed what data was collected and why, then shaped it for analysis in Python.
The study involved several waves of surveys, and the questionnaires asked (mostly) the same questions each time. The survey data was laid out horizontally, with each row corresponding to an individual couple and each questionnaire appended as additional columns. For example, the question about housing type appeared as five separate columns, one per questionnaire: PPHOUSE, PP2_PPHOUSE, PP3_HOUSE, PP4_HOUSE, and PP5_HOUSE. Because naming conventions were not 100% consistent, a fair bit of data prep work was needed to parse all the columns correctly.
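As an illustration of that cleanup, here is a minimal sketch of matching up repeated-question columns. It assumes the only naming quirks are the ones visible in the housing example: wave 1 has no wave prefix, later waves start with `PP<n>_`, and some stems keep an extra `PP`. The regular expression and the helpers `parse_column` and `group_by_question` are hypothetical, not taken from the project’s code.

```python
import re

# Hypothetical pattern inferred only from the five housing columns:
# optional "PP<wave>_" prefix, optional extra "PP", then the question stem.
WAVE_COLUMN = re.compile(r"^(?:PP(?P<wave>\d+)_)?(?:PP)?(?P<stem>[A-Z0-9_]+)$")


def parse_column(name):
    """Split a raw survey column name into (wave number, question stem)."""
    match = WAVE_COLUMN.match(name)
    if match is None:
        raise ValueError(f"unrecognized column name: {name!r}")
    # Columns without a wave prefix belong to the main (first) survey.
    wave = int(match.group("wave") or 1)
    return wave, match.group("stem")


def group_by_question(columns):
    """Map each question stem to {wave: raw column name}."""
    grouped = {}
    for name in columns:
        wave, stem = parse_column(name)
        grouped.setdefault(stem, {})[wave] = name
    return grouped


housing = ["PPHOUSE", "PP2_PPHOUSE", "PP3_HOUSE", "PP4_HOUSE", "PP5_HOUSE"]
print(group_by_question(housing))
# {'HOUSE': {1: 'PPHOUSE', 2: 'PP2_PPHOUSE', 3: 'PP3_HOUSE', 4: 'PP4_HOUSE', 5: 'PP5_HOUSE'}}
```

A genuine question stem that itself begins with `PP` would be mis-parsed by this pattern, so any real rule would need checking against the full codebook.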
As in many machine learning projects, the dataset needed to be flattened to one row per couple covering the couple’s full survey history, plus a target outcome. To get the data into that format, the raw survey data had to be pivoted while systematically identifying repeated questions.
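Once repeated questions share a common stem, the pivot itself is simple. A sketch, assuming responses have already been reshaped into hypothetical `(couple_id, wave, question, answer)` tuples; the `<question>_w<wave>` column naming is illustrative only.

```python
def pivot_to_couples(responses):
    """Pivot long (couple_id, wave, question, answer) tuples to one dict per couple."""
    couples = {}
    for couple_id, wave, question, answer in responses:
        # Start a new row the first time a couple appears, then add one
        # column per (question, wave) pair.
        row = couples.setdefault(couple_id, {"couple_id": couple_id})
        row[f"{question}_w{wave}"] = answer
    return list(couples.values())


responses = [
    (7, 1, "HOUSE", "own"),
    (7, 2, "HOUSE", "rent"),
    (8, 1, "HOUSE", "own"),
]
for row in pivot_to_couples(responses):
    print(row)
# {'couple_id': 7, 'HOUSE_w1': 'own', 'HOUSE_w2': 'rent'}
# {'couple_id': 8, 'HOUSE_w1': 'own'}
```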
The main goal of the Relationship App project was to predict relationship strength. Repeated survey results for a particular couple indicated whether or not the couple was still together. The surveys were taken roughly, but not precisely, one year apart. When a couple ended their relationship, it was easy to estimate how long they had been together.
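Because the surveys were only roughly a year apart, a break-up can only be placed between two survey dates. A minimal sketch of that estimate, assuming the break-up is dated to the midpoint of the interval between the last survey where the couple was together and the first where they were not; the midpoint rule and the function name are assumptions, not the project’s documented method.

```python
from datetime import date


def estimate_duration_years(start, surveys):
    """Roughly estimate how long a couple was together, in years.

    start: date the relationship began.
    surveys: (survey date, still_together) pairs, in any order.
    Returns None if the couple was together at every survey.
    """
    last_together = start
    for when, together in sorted(surveys):
        if together:
            last_together = when
        else:
            # Assumption: date the break-up midway between the two surveys.
            midpoint = last_together + (when - last_together) / 2
            return (midpoint - start).days / 365.25
    return None


surveys = [(date(2017, 3, 1), True), (date(2018, 3, 1), False)]
print(round(estimate_duration_years(date(2010, 1, 1), surveys), 2))  # 7.66
```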
For couples that stayed together, the eventual length of the relationship was not observable, because it had not yet ended. There are methods for handling this issue, known as censoring, but the data scientist didn’t go down that path for two reasons. First, he said those approaches tend to require many assumptions. Second, he noted that a large proportion of the couples in the dataset stayed together.
To predict a relationship’s future outlook, he used a binary target indicating whether a couple stayed together for two years from a given date; the model then predicts the probability of that outcome. Observations with no survey taken two years later were dropped. If you want to see his Python code, check out the Relationship App GitHub project.
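A labeling rule along those lines can be sketched as follows. The 730-day horizon, the ±90-day window around the two-year mark, and the choice to label an earlier observed break-up as 0 are assumptions made for illustration; the project’s GitHub code is the authority on the actual rule.

```python
from datetime import date, timedelta

TWO_YEARS = timedelta(days=730)  # assumption: "two years" taken as 730 days


def two_year_target(observed_on, later_surveys, tolerance=timedelta(days=90)):
    """Label one observation for the two-year stay-together target.

    later_surveys holds (survey date, still_together) pairs taken after
    observed_on. Returns 1 if the couple is still together at a survey near
    the two-year mark, 0 if they split before or at that point, and None
    when no survey falls close enough to the mark, so the row is dropped.
    """
    horizon = observed_on + TWO_YEARS
    for when, together in sorted(later_surveys):
        if when < horizon - tolerance:
            if not together:
                return 0  # split up well before the two-year mark
        elif when <= horizon + tolerance:
            return 1 if together else 0
        else:
            return None  # next survey is too late to pin down the outcome
    return None


start = date(2009, 3, 1)
print(two_year_target(start, [(date(2010, 3, 10), True), (date(2011, 2, 20), True)]))  # 1
print(two_year_target(start, [(date(2010, 3, 10), False)]))  # 0
print(two_year_target(start, [(date(2010, 3, 10), True)]))  # None
```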
Exploring Data with Machine Learning
The modeling dataset was small, with a little over 3,100 observations and 110 features. As a result, the best machine learning models were Generalized Linear Models (GLMs). The top DataRobot model used a Ridit transform to standardize variables, median imputation for missing values, and an elastic net for regularization. DataRobot ran a grid search over the tuning parameters and ended up with several strong models (out-of-sample AUC = 0.897).
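To make those steps concrete, here is a stdlib-only sketch of median imputation, a ridit transform, and an elastic net penalty. It uses the textbook ridit definition, P(X < x) + 0.5 * P(X = x), with the column itself as the reference distribution, and a glmnet-style penalty parameterization; DataRobot’s internal implementation may differ, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from statistics import median


def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def ridit(values):
    """Ridit score of each value: P(X < x) + 0.5 * P(X = x) within the column."""
    reference = sorted(values)
    n = len(reference)
    scores = []
    for v in values:
        below = bisect_left(reference, v)          # count of values < v
        ties = bisect_right(reference, v) - below  # count of values == v
        scores.append((below + 0.5 * ties) / n)
    return scores


def elastic_net_penalty(coefs, lam, l1_ratio):
    """lam * (l1_ratio * L1 norm + (1 - l1_ratio) / 2 * squared L2 norm)."""
    l1 = sum(abs(b) for b in coefs)
    l2 = sum(b * b for b in coefs)
    return lam * (l1_ratio * l1 + 0.5 * (1 - l1_ratio) * l2)


column = impute_median([1, None, 3, 3, 5])  # [1, 3.0, 3, 3, 5]
print(ridit(column))  # [0.1, 0.5, 0.5, 0.5, 0.9]
```

In a grid search like the one described, each candidate (`lam`, `l1_ratio`) pair would be used to fit a logistic model minimizing log loss plus this penalty, and the pair with the best out-of-sample score would be kept.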
To reduce model complexity and narrow down how many features to use in a survey, the data scientist looked at the most important ones. Essentially, he wanted a small number of interesting features that would still produce a good result. Here is the relative importance of the top 40 features.
Building the Quiz
To build an enjoyable short quiz with an acceptable level of prediction accuracy, while avoiding controversial or sensitive questions, the data scientist crafted six questions designed to capture five to ten machine learning model features.
The quiz questions ask about relationship status, how long the couple has been together, the age of each partner, the highest level of education completed, the number and ages of children, and how much the couple interacts with extended family. The answers to those questions were found to be the best predictors of how long a relationship will last.
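Each quiz answer then has to be translated into the feature columns the model expects. The sketch below is purely hypothetical: the answer keys, feature names, and education scale are invented for illustration, except that the 2-to-5-year-old child age band mirrors a finding reported later in this post.

```python
# Hypothetical ordinal scale for the education question.
EDUCATION_LEVELS = ["less than high school", "high school", "some college", "bachelor's", "graduate"]


def quiz_to_features(answers):
    """Turn one set of quiz answers into a dict of model-ready features."""
    child_ages = answers["child_ages"]  # ages in whole years, one entry per child
    return {
        "married": int(answers["status"] == "married"),
        "years_together": answers["years_together"],
        "respondent_age": answers["age"],
        "partner_age": answers["partner_age"],
        "education": EDUCATION_LEVELS.index(answers["education"]),
        "children_0_1": sum(1 for a in child_ages if a <= 1),
        "children_2_5": sum(1 for a in child_ages if 2 <= a <= 5),
        "children_6_17": sum(1 for a in child_ages if 6 <= a <= 17),
        "extended_family_contact": answers["family_contact"],  # hypothetical 0-4 scale
    }


answers = {
    "status": "married", "years_together": 12, "age": 40, "partner_age": 42,
    "education": "bachelor's", "child_ages": [3, 7], "family_contact": 2,
}
features = quiz_to_features(answers)
print(features["children_2_5"], features["children_6_17"])  # 1 1
```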
Summarizing Key Findings
The final model ended up being a relatively straightforward logistic regression with an elastic net penalty, median imputation, and simple standardization. Its standardized coefficients looked like this:
The Relationship App quiz converted all break-up probabilities into stay-together probabilities. A large positive coefficient indicated a higher chance of a break-up within the next two years, i.e., a lower chance of staying together.
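Scoring a quiz taker with such a model takes only a few lines. A minimal sketch, assuming the model’s target is a break-up within two years, features are standardized with training-set means and standard deviations, and coefficients apply to the standardized values; the function name and all numbers in the example are made up.

```python
import math


def stay_together_probability(features, coefs, intercept, means, stds):
    """Score a logistic break-up model and return 1 - P(break-up)."""
    z = intercept
    for name, beta in coefs.items():
        standardized = (features[name] - means[name]) / stds[name]
        z += beta * standardized  # positive beta pushes toward break-up
    p_breakup = 1.0 / (1.0 + math.exp(-z))
    return 1.0 - p_breakup


# Made-up one-feature model, for illustration only.
coefs, intercept = {"x": 0.8}, -3.0
means, stds = {"x": 1.0}, {"x": 2.0}
print(round(stay_together_probability({"x": 1.0}, coefs, intercept, means, stds), 4))  # 0.9526
```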
What the data scientist learned:
- Non-casual relationships are robust. Of the people in the Stanford study these models are based on, 94% stayed together. The participants weren’t casually dating; they were in committed, monogamous relationships, and those tend to last.
- The more external acts of commitment a couple makes, the more likely they are to stay together. The average prediction for a married couple in the holdout data was greater than 97%, while the average score for unmarried people was quite a bit lower.
- The data seems to support traditional sensibilities. For example, married couples that lived together before they got married are slightly more likely to break up than those that didn’t. This appeared to be true across all the models, even though this feature didn’t make it into the final models.
- Big extended families with regular interaction appear to make relationships last longer.
- The data included counts of children of certain ages in the household. It turned out that having more children between 2 and 5 years old in the house was correlated with more break-ups. This was not true for children of other ages. For parents with young children, hang in there.
For More Information
To learn more about why and how the Relationship App was designed, please read the original DataRobot series by Greg Michaelson.