A simple supervised learning algorithm that classifies PATSTAT records into two categories:
- person names
- not person names
psClassify_pre.py
extracts the data and prepares it for model fitting
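The preparation step turns raw name strings into features a classifier can use. The sketch below is a minimal illustration, not the contents of psClassify_pre.py. It assumes simple hand-crafted features: token count, a comma (as in "SURNAME, FIRSTNAME"), digits, and a hypothetical list of legal-form and institution tokens (`ORG_TOKENS`). It also assumes labelled names are written to a CSV with a 0/1 `is_person` column. All of these names and columns are assumptions for illustration.

```python
import csv
import io
import re

# Hypothetical tokens that usually signal a non-person applicant.
# Illustrative assumption only, not taken from psClassify_pre.py.
ORG_TOKENS = {"LTD", "INC", "GMBH", "CORP", "CO", "AG", "SA", "LLC",
              "UNIV", "UNIVERSITY", "INSTITUTE", "COMPANY"}


def name_features(name: str) -> dict:
    """Turn a raw name string into a few numeric (0/1 or count) features."""
    tokens = re.findall(r"[A-Za-z]+", name.upper())
    return {
        "n_tokens": len(tokens),
        "has_comma": int("," in name),  # e.g. "SMITH, JOHN"
        "has_org_token": int(any(t in ORG_TOKENS for t in tokens)),
        "has_digit": int(any(c.isdigit() for c in name)),
    }


def write_feature_table(names, labels, out):
    """Write one row per name: the features plus a 0/1 label (1 = person)."""
    fields = ["name", "n_tokens", "has_comma", "has_org_token",
              "has_digit", "is_person"]
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for name, label in zip(names, labels):
        writer.writerow({"name": name, **name_features(name), "is_person": label})


if __name__ == "__main__":
    buf = io.StringIO()
    write_feature_table(["SMITH, JOHN", "SIEMENS AG"], [1, 0], buf)
    print(buf.getvalue())
```

Writing the table to CSV keeps the hand-off to the R fitting script language-neutral.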
psClassify_R.r
fits the model and saves the results to a .csv file
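The README does not say which model the R script fits. The sketch below assumes a logistic regression, the kind of binary classifier R's `glm` fits with a binomial family, and writes it in plain Python with batch gradient descent on the mean log-loss. The toy features (`has_comma`, `has_org_token`), the learning rate and epoch count, and the output columns `name` and `p_person` are all hypothetical.

```python
import csv
import io
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (bias, weights)."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def predict_proba(b, w, x):
    """Probability that feature vector x belongs to a person."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def save_probabilities(names, X, b, w, out):
    """Write hypothetical columns name, p_person to a CSV stream."""
    writer = csv.writer(out)
    writer.writerow(["name", "p_person"])
    for name, x in zip(names, X):
        writer.writerow([name, f"{predict_proba(b, w, x):.4f}"])


if __name__ == "__main__":
    # Toy features: [has_comma, has_org_token]; label 1 = person.
    names = ["SMITH, JOHN", "DOE, JANE", "SIEMENS AG", "ACME LTD"]
    X = [[1, 0], [1, 0], [0, 1], [0, 1]]
    y = [1, 1, 0, 0]
    b, w = fit_logistic(X, y)
    buf = io.StringIO()
    save_probabilities(names, X, b, w, buf)
    print(buf.getvalue())
```

A logistic model is a natural fit here because its output is directly a probability. On perfectly separable toy data like this the weights keep growing with more epochs, so real training data (or regularization) is needed for meaningful coefficients.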
Assigns a probability that a name in the PATSTAT database belongs to a person rather than to a non-person entity (e.g. a company or university).
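To recover the two categories from the probabilities, one option is to apply a cut-off. The sketch below assumes the output CSV has hypothetical columns `name` and `p_person` and uses a default threshold of 0.5. Both are assumptions, not documented behaviour of psClassify_R.r.

```python
import csv
import io


def label_names(csv_text: str, threshold: float = 0.5) -> dict:
    """Map each name to 'person' or 'not person' from its probability.

    Assumes hypothetical columns 'name' and 'p_person'; adjust these to
    the header actually written by the fitting script.
    """
    labels = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        p = float(row["p_person"])
        labels[row["name"]] = "person" if p >= threshold else "not person"
    return labels


if __name__ == "__main__":
    text = 'name,p_person\n"SMITH, JOHN",0.97\nSIEMENS AG,0.02\n'
    print(label_names(text))
```

Raising the threshold labels fewer names as persons, trading recall on person names for precision.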