/crawlers

A collection of crawlers for extracting data from various sites (the target site is named for each crawler).

Primary language: Python. License: The Unlicense.

A Set of Crawlers

Each crawler was built as part of another project. The crawlers use several different tools:

  • Selenium
  • BeautifulSoup
  • Scrapy
  • Scholarly

Other tools that could speed up the crawling workflow (not used yet):

  • SerpApi
  • Octoparse

Data Collections

Canadian Top University Researchers Data

The dataset consists of 32,240 records of Google Scholar profiles of researchers affiliated with the top 20 universities in Canada. The columns are GUID, full name, list of research interests, university name, and total number of citations per researcher.
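
As a rough illustration of how records with these five columns might be loaded, here is a minimal sketch using only the Python standard library. The header names (`guid`, `full_name`, `research_interests`, `university`, `total_citations`), the CSV format, the `;` separator inside the interests field, and the two sample rows are all assumptions made up for this example; check the actual dataset file for its real layout.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ResearcherRecord:
    """One row of the dataset (field names are hypothetical)."""
    guid: str
    full_name: str
    research_interests: List[str]
    university: str
    total_citations: int


# Invented placeholder rows, NOT taken from the real dataset.
SAMPLE_CSV = """guid,full_name,research_interests,university,total_citations
00000000-0000-0000-0000-000000000001,Jane Doe,machine learning;computer vision,Example University,1234
00000000-0000-0000-0000-000000000002,John Roe,,Example University,0
"""


def parse_records(text: str) -> List[ResearcherRecord]:
    """Parse CSV text into ResearcherRecord objects."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        # Assumed convention: interests are separated by ';' in one cell.
        interests = [s.strip() for s in row["research_interests"].split(";") if s.strip()]
        records.append(
            ResearcherRecord(
                guid=row["guid"],
                full_name=row["full_name"],
                research_interests=interests,
                university=row["university"],
                # Treat an empty citation cell as zero.
                total_citations=int(row["total_citations"] or 0),
            )
        )
    return records


def citations_by_university(records: List[ResearcherRecord]) -> Dict[str, int]:
    """Sum total citations per university."""
    totals: Dict[str, int] = {}
    for record in records:
        totals[record.university] = totals.get(record.university, 0) + record.total_citations
    return totals


if __name__ == "__main__":
    sample = parse_records(SAMPLE_CSV)
    print(citations_by_university(sample))
```

To use this against the real file, replace `SAMPLE_CSV` with the file contents and adjust the header names and separator to match.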