JT Wolohan
This is a Python implementation of Sap et al.'s gender prediction algorithm for Twitter. The algorithm should be about 90% accurate when applied to a large sample of users with a reasonable amount of text for each user.
Sap, M., Park, G., Eichstaedt, J., Kern, M., Stillwell, D., Kosinski, M., ... & Schwartz, H. A. (2014). Developing age and gender predictive lexica over social media. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1146-1151).
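For orientation, here is a rough sketch of the weighted-lexicon idea described in the paper: each word in the lexicon carries a weight, the weights are multiplied by the word's relative frequency in the user's text, and the sum (plus an intercept) is thresholded at zero. This is not the code in this repository. The names `TOY_WEIGHTS`, `TOY_INTERCEPT`, `lexicon_score`, and `predict_from_score` are hypothetical, the weights are invented for illustration, and the tokenizer is a simplification.

```python
import re
from collections import Counter

# Hypothetical toy values for illustration only; these are NOT taken
# from the published lexicon.
TOY_WEIGHTS = {"love": 2.0, "game": -1.5, "cute": 3.0, "beer": -2.5}
TOY_INTERCEPT = -0.1


def lexicon_score(text, weights=TOY_WEIGHTS, intercept=TOY_INTERCEPT):
    """Return intercept + sum(weight * relative frequency) over lexicon words."""
    # Simplified tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return intercept
    counts = Counter(tokens)
    total = len(tokens)
    return intercept + sum(w * counts[word] / total for word, w in weights.items())


def predict_from_score(score):
    """Map a score to this README's convention: 1 = female, 0 = male."""
    return 1 if score > 0 else 0
```

With the toy weights above, `predict_from_score(lexicon_score("so cute"))` gives 1, because the only lexicon word present has a positive weight that outweighs the intercept.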
1. Clone the repository.
2. Import `GndrPrdct` from `SapGenderPrediction`.
3. Instantiate a `GndrPrdct` object.
4. Call the `predict_gender` method on a single string containing the user's tweets.
Predictions are returned as integers: 0 for male and 1 for female.
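# Step 2: import the classifier class
from SapGenderPrediction import GndrPrdct
# Step 3: create a classifier instance
classifier = GndrPrdct()
tweets = ["This is a tweet.", "I'm another tweet!", "Hey, @realDonaldTrump, I'm yet another tweet!"]
# Step 4: join the tweets into one string and predict (0 = male, 1 = female)
prediction = classifier.predict_gender(" ".join(tweets))
print(prediction)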
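Since accuracy depends on having many users, it may be convenient to label several users in one pass. The sketch below assumes a hypothetical helper, `label_users`, that takes a dictionary of tweets per user and any callable that maps one string to 0 or 1. In practice that callable would be the `predict_gender` method of a `GndrPrdct` instance; a stand-in function is used here so the example runs without the repository.

```python
def label_users(tweets_by_user, predict):
    """Join each user's tweets into one string and map the prediction to a label.

    `predict` is any callable taking a string and returning 0 (male) or 1 (female).
    """
    labels = {0: "male", 1: "female"}
    results = {}
    for user, tweets in tweets_by_user.items():
        text = " ".join(tweets)  # one string per user, as in Step 4
        results[user] = labels[predict(text)]
    return results


def fake_predict(text):
    """Stand-in predictor for demonstration only; not a real classifier."""
    return 1 if "!" in text else 0


sample = {
    "user_a": ["This is a tweet.", "I'm another tweet!"],
    "user_b": ["Just one quiet tweet."],
}
print(label_users(sample, fake_predict))
```

Passing the predictor in as an argument keeps the helper independent of how the classifier is constructed.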