This module is a Python search engine that retrieves public information about a person from a number of sources.
To use it, you need to register a Twitter API key, a LinkedIn account, and a New York Times API key (see the code for more information).
Steps:
- query Forbes and crawl the site to get info (Selenium)
- query Wikipedia API, flag if present, scrape info and get summary if present
- query LinkedIn API, get profession and past experiences
- query Twitter, number of followers, whether it's a verified account
- crawl Google search, Google news and a number of news sites (Financial Times, The Economist, bilan.ch, challenges.fr)
- query New York Times API
- build a model to predict whether a person is famous/politically exposed, mainly from their online presence
- apply model and record probability
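As an illustration of the Wikipedia step above (flag presence, get summary), here is a minimal sketch that talks to the public MediaWiki API directly instead of going through a Python package. The function names `build_summary_url` and `parse_summary` are hypothetical and not part of this module; the sketch only builds the request URL and parses a JSON response, and leaves the HTTP call itself to the caller.

```python
import json
from urllib.parse import urlencode

WIKIPEDIA_API = "https://en.wikipedia.org/w/api.php"


def build_summary_url(full_name):
    """Build a MediaWiki query URL asking for the plain-text intro of a page.

    Hypothetical helper, not part of this module.
    """
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,      # only the lead section
        "explaintext": 1,  # plain text instead of HTML
        "redirects": 1,    # follow redirects to the canonical page
        "titles": full_name,
        "format": "json",
    }
    return WIKIPEDIA_API + "?" + urlencode(params)


def parse_summary(response_text):
    """Return (wikipedia_presence, summary) from a MediaWiki JSON response."""
    pages = json.loads(response_text).get("query", {}).get("pages", {})
    for page in pages.values():
        # Pages that do not exist carry a "missing" key and no extract.
        if "missing" not in page and page.get("extract"):
            return True, page["extract"]
    return False, None
```

The URL can be fetched with any HTTP client; `parse_summary` then yields the presence flag and the summary text in one pass.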
Further improvements:
- see other APIs
- create a dictionary to convert country names to 3-letter country codes
- create a model to estimate wealth
- create an option to return wealth in different units
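Two of the improvements above could start from something like the following sketch. It assumes that "3-letter country code" means ISO 3166-1 alpha-3 and that "different units" means orders of magnitude (thousand, million, billion) rather than currencies; currency conversion would need exchange rates and is not covered. The names `COUNTRY_CODES`, `country_to_code` and `convert_wealth` are hypothetical, and the table is a small illustrative subset.

```python
# Small illustrative subset of ISO 3166-1 alpha-3 codes (hypothetical table;
# a real one would cover every country and common spelling variants).
COUNTRY_CODES = {
    "france": "FRA",
    "switzerland": "CHE",
    "united states": "USA",
    "united kingdom": "GBR",
    "germany": "DEU",
}

# Divisors relative to one base currency unit.
WEALTH_UNITS = {"unit": 1, "thousand": 1e3, "million": 1e6, "billion": 1e9}


def country_to_code(name):
    """Return the 3-letter code for a country name, or None if unknown."""
    return COUNTRY_CODES.get(name.strip().lower())


def convert_wealth(amount, unit="million"):
    """Express an amount given in base currency units in the requested unit."""
    if unit not in WEALTH_UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    return amount / WEALTH_UNITS[unit]
```

Returning `None` for unknown countries keeps the lookup from failing on unexpected spellings scraped from the web; the caller can decide whether to log or skip them.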
See notebook/search engine.ipynb for more details. The current workflow is the following:
- Launch web driver with window to control behavior:
driver = data_acquisition.launch_browser_driver(headless=False)
- create person object (info will contain only firstname and lastname):
person = data_acquisition.Person('Jeff', 'Bezos', middlename='Preston', driver=driver)
person.print_info()
- get info sequentially
person.get_info_from_Forbes()
person.get_info_from_Wikipedia()
person.get_info_from_LinkedIn()
person.get_info_from_Twitter()
person.get_info_from_Google()
person.get_info_from_nytimes()
- print results
person.print_info()
- run famous people model
reload(data_modeling)  # on Python 3, first run: from importlib import reload
data_modeling.predict_PEP(person)
- print final information
person.print_info()
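The sequential calls in the workflow above can also be driven by a loop, so that one failing source (for example a changed page layout) does not stop the others. This is only a sketch: `collect_all_info` is a hypothetical helper, not part of `data_acquisition`, and it assumes the `get_info_from_*` methods take no arguments, as in the workflow.

```python
import logging

# Method names taken from the workflow above, in the same order.
INFO_STEPS = [
    "get_info_from_Forbes",
    "get_info_from_Wikipedia",
    "get_info_from_LinkedIn",
    "get_info_from_Twitter",
    "get_info_from_Google",
    "get_info_from_nytimes",
]


def collect_all_info(person, steps=INFO_STEPS):
    """Call each step on `person`; return the names of the steps that failed.

    Hypothetical helper, not part of this module.
    """
    failed = []
    for name in steps:
        try:
            getattr(person, name)()
        except Exception as exc:  # broad on purpose: scraping fails in many ways
            logging.warning("%s failed: %s", name, exc)
            failed.append(name)
    return failed
```

With a `person` created as shown above, `failed = collect_all_info(person)` would run all six steps and report which sources could not be queried.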
Below is a list of additional sources of information:
- Forbes (use Selenium and headless Chrome to crawl the Forbes website).
- another source of wealth information
- build a correlation Company wealth/CEO wealth (get CEO info from LinkedIn)
- YouTube/Facebook/Instagram stars
- Swiss public employees
- Glassdoor salary from LinkedIn profession
- Panama papers
- Politicians: public tax declarations in France, Switzerland, US
- actors/IMDb artist fees
Features:
wikipedia_presence
Google_search_nresults
Google_news_nresults
Financial_news_nresults
nytimes_nresults
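To show how the features above could feed a probability, here is a minimal sketch of a logistic model written with the standard library only. It is not the model in `data_modeling`: the log scaling of the result counts, the function names, and any weights passed in are assumptions made for illustration.

```python
import math

# Feature names as listed above, in a fixed order.
FEATURES = [
    "wikipedia_presence",
    "Google_search_nresults",
    "Google_news_nresults",
    "Financial_news_nresults",
    "nytimes_nresults",
]


def to_feature_vector(info):
    """Map a dict of raw values to a list of floats, in FEATURES order."""
    vector = []
    for name in FEATURES:
        value = info.get(name) or 0  # treat missing or None as 0
        if name == "wikipedia_presence":
            vector.append(1.0 if value else 0.0)
        else:
            # log1p compresses result counts spanning many orders of magnitude
            # (an assumption, not necessarily what the module does).
            vector.append(math.log1p(max(float(value), 0.0)))
    return vector


def predict_probability(vector, weights, bias):
    """Logistic model: sigmoid of the weighted sum of the features."""
    score = bias + sum(w * x for w, x in zip(weights, vector))
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    # Equivalent form that avoids overflow for very negative scores.
    e = math.exp(score)
    return e / (1.0 + e)
```

The weights and bias would come from fitting the model on labelled examples; with all weights at zero the function returns 0.5, which makes a convenient sanity check.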
Sources:
- Wikipedia: Use Wikipedia API with Python package.
- Twitter: number of followers
- Startpage: number of results
- Financial news: site:bilan.ch OR site:challenges.fr OR site:forbes.com OR site:ft.com OR site:economist.com
- news website (BBC, New York Times)
- blogs
- CIA World Factbook
- LinkedIn (via Google?), profession, experience, number of followers
- Google, Wikipedia
- white pages API
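The site-restricted query listed under "Financial news" above can be assembled programmatically. The sketch below is an assumption about how such a query might be built: `build_site_query` is a hypothetical helper, and quoting the name and grouping the `site:` filters in parentheses are choices made here, not taken from the module.

```python
# Sites taken from the "Financial news" entry above.
FINANCIAL_NEWS_SITES = [
    "bilan.ch",
    "challenges.fr",
    "forbes.com",
    "ft.com",
    "economist.com",
]


def build_site_query(firstname, lastname, sites=FINANCIAL_NEWS_SITES):
    """Return a search query for the exact name, restricted to the given sites.

    Hypothetical helper, not part of this module.
    """
    site_filter = " OR ".join(f"site:{site}" for site in sites)
    return f'"{firstname} {lastname}" ({site_filter})'
```

The resulting string can be passed to the search engine as the query; the number of results it reports would then populate `Financial_news_nresults`.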