OKCupidDataAnalysis

This project processes OkCupid profile data with machine learning and text mining techniques to analyze users' profiles. The aim is to match users who share similar characteristics.

Primary language: Jupyter Notebook. License: MIT.

The overall objective of this study is to show how user data can help both the users and the developers of online dating platforms. User data becomes more valuable over time because it is easy to access and store. Beyond taking the data at face value, we can analyze it further by looking for relationships and outliers. Relationships in the data can help users find compatible matches. Outliers, possibly combined with a recommendation system, can help the developer filter out spam or low-quality accounts.

Finding inconsistencies or missing values can help both the user and the developer decide what should and should not be reported. Even something as simple as users skipping fields such as income or education because of privacy concerns can be addressed once the effect described in this study is known. OkCupid has already addressed one such case in the user-selected "Educational" response. Instead of requiring a conventional level of education, such as "Graduated from High School" or "Dropped Out of College," users can choose a tongue-in-cheek response, such as "Work in Space Camp." This benefits the overall system by letting users complete their profiles and by giving the developer fewer missing values to work with.

After analyzing the data, we use a short machine learning algorithm to build a model that predicts the user's gender. Predicting gender may be useful in some cases and not in others, but the approach is easy to replicate with other combinations of variables to build other models. For example, education, body type, and age could be used to estimate income level, and drinking and drug habits could be used to predict the type of work.
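To make the gender-prediction idea concrete, here is a minimal sketch of a k-nearest-neighbors classifier in plain Python. It is a stand-in for illustration only and is not necessarily the algorithm used in the notebook. The features (height and age), the label values, and every data row are invented assumptions, as is the helper name `predict_knn`.

```python
from collections import Counter
from math import dist

# Invented toy rows: (height_in_inches, age) -> sex label.
# These values are assumptions for illustration, not real OkCupid data.
train = [
    ((64.0, 25), "f"),
    ((65.0, 31), "f"),
    ((63.0, 40), "f"),
    ((70.0, 28), "m"),
    ((72.0, 35), "m"),
    ((71.0, 22), "m"),
]


def predict_knn(features, labeled_rows, k=3):
    """Return the majority label among the k labeled rows closest to `features`."""
    # Sort labeled rows by Euclidean distance to the query point, keep the k nearest.
    nearest = sorted(labeled_rows, key=lambda row: dist(row[0], features))[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical new profile with height 69 in and age 30.
prediction = predict_knn((69.0, 30), train)
print(prediction)
```

The same function could be pointed at a different target, such as an income bracket, by changing the labels and features. With real data, the features would normally be scaled first so that no single column dominates the distance.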
The predictive models we build can primarily help the developer recommend matches to users. They can also help fill in missing profile fields, which gives users a more complete profile.
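As a rough illustration of filling in a missing profile field, the sketch below guesses a missing job from the most common job among users with the same drinking and drug answers. This is a deliberately simple substitute for a trained model. The field names (`drinks`, `drugs`, `job`), the answer values, and the helper `fill_missing_job` are assumptions made for the example, and the rows are invented.

```python
from collections import Counter, defaultdict

# Invented toy profiles; None marks a field the user left blank.
profiles = [
    {"drinks": "socially", "drugs": "never", "job": "student"},
    {"drinks": "socially", "drugs": "never", "job": "student"},
    {"drinks": "socially", "drugs": "never", "job": "sales"},
    {"drinks": "rarely", "drugs": "never", "job": "medicine"},
    {"drinks": "socially", "drugs": "never", "job": None},
    {"drinks": "rarely", "drugs": "never", "job": None},
]


def fill_missing_job(rows):
    """Return copies of `rows` with a missing job replaced by the most
    common job among rows sharing the same (drinks, drugs) answers."""
    # Count known jobs for each combination of habits.
    counts = defaultdict(Counter)
    for row in rows:
        if row["job"] is not None:
            counts[(row["drinks"], row["drugs"])][row["job"]] += 1

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay unchanged
        key = (row["drinks"], row["drugs"])
        # Only fill when the job is missing and the group has known jobs.
        if new_row["job"] is None and counts[key]:
            new_row["job"] = counts[key].most_common(1)[0][0]
        filled.append(new_row)
    return filled


completed = fill_missing_job(profiles)
for row in completed:
    print(row)
```

A filled value like this is only a guess, so a real system would likely present it to the user as a suggestion to confirm instead of writing it into the profile silently.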