Computer Science Scholarly Data Crawler

Introduction

This crawler was used to collect metadata for over 4 million computer science papers from Microsoft Academic Search (MAS); the crawl was completed in November 2012. It was carried out for research purposes and complied with the crawling protocols and policies in effect at the time.

From the crawled metadata, we also built several datasets for scientific paper recommendation experiments, using a novel method for constructing ground-truth data:
Dataset 1: https://drive.google.com/file/d/0B8gXe63FdGk5WTljdHdsSUw3UEk (.zip, 376 MB)
Dataset 2: https://drive.google.com/file/d/0B8gXe63FdGk5QlNsQmhVekx1SlU (.zip, 379 MB)
Dataset 3: https://drive.google.com/file/d/0B8gXe63FdGk5ZjlTWS1hZ0w0Tnc (.zip, 376 MB)
Dataset 4: https://drive.google.com/file/d/0B8gXe63FdGk5Q0pfNE1oNUVJZVU (.zip, 528 MB)

License

CSPublicationCrawler is free software released under the MIT License.

The datasets are provided under the Open Data Commons Attribution License (ODC-BY) 1.0.

Corresponding paper:
If you find the code or data useful, please cite the following paper:
Hung Nghiep Tran, Tin Huynh, Kiem Hoang. A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation. KSE 2015.

For more information, please visit the website: https://sites.google.com/site/tranhungnghiep