
Primary language: PHP. License: Apache License 2.0 (Apache-2.0).

Social Packets crawler

S. Felix Wu, wu@cs.ucdavis.edu
Fredrik Erlandsson, fredrik.erlandsson@bth.se

This crawler consists of two parts: the agent (agent.php), which does the actual crawling, and a controller (found in contoller/), which keeps track of the current crawling status.


Install

The agent depends on the Facebook PHP SDK, which is included as a git submodule. To install it, run a submodule update:

git submodule update --init


Configuration

Most of the time you only need to use the agent.

Create a Facebook application at https://developers.facebook.com/apps. Make sure to enable offline_access and read_stream under Permissions->Extended Permissions.

Copy config/config-dist.php to config/config.php and fill in APPID and APPSEC (from your Facebook application page) and the URL of a running controller.
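The copy step can be scripted so that it never overwrites a configuration you have already edited. This is only a sketch: the function name init_config is hypothetical and not part of the repository, and it assumes it is given the repository root (defaulting to the current directory).

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper, not shipped with the crawler): create
# config/config.php from the distributed template without overwriting
# an existing, already edited configuration.
init_config() {
    local root=${1:-.}    # repository root; assumed to be the current directory
    local template="$root/config/config-dist.php"
    local target="$root/config/config.php"

    if [ ! -f "$template" ]; then
        echo "missing template: $template" >&2
        return 1
    fi
    if [ -f "$target" ]; then
        echo "kept existing $target"
        return 0
    fi
    cp "$template" "$target"
    echo "created $target"
}
```

After calling init_config from the repository root, config/config.php still has to be edited by hand to set APPID, APPSEC and the controller URL.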

Usage

Run the agent from the command line:
php agent.php token=FACEBOOK_USER_TOKEN
or as a web application:
http://example.com/agent.php?token=FACEBOOK_USER_TOKEN

To run multiple instances of the agent in one environment (recommended), use the script bgxgrp.sh:
bash bgxgrp.sh <#-instances> php agent.php token=FACEBOOK_USER_TOKEN
where <#-instances> should be replaced with the number of parallel instances to run (between 8 and 15 is reasonable to avoid hitting Facebook's 600/600 limit).
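For readers who want to see the general idea behind running several agents side by side, here is a minimal sketch. It is not the contents of bgxgrp.sh, which are not shown here. The function name run_instances is hypothetical, and the sketch simply starts the given command N times in the background and waits for every copy to exit.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT the bundled bgxgrp.sh: start N copies of a
# command in the background and wait for all of them to finish.
run_instances() {
    local count=$1
    shift                     # the remaining arguments are the command to run
    local pids=()
    local i
    for ((i = 1; i <= count; i++)); do
        "$@" &                # launch one instance in the background
        pids+=("$!")          # remember its process ID
    done

    local failed=0
    local pid
    for pid in "${pids[@]}"; do
        # wait returns the exit status of that instance
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$count instances finished, $failed failed"
    [ "$failed" -eq 0 ]       # succeed only if every instance succeeded
}

# Example: run_instances 8 php agent.php token=FACEBOOK_USER_TOKEN
```

Each instance here is a separate process rather than a thread, so the instance count directly multiplies the request rate against Facebook.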

The FACEBOOK_USER_TOKEN is generated on the Graph Explorer page, https://developers.facebook.com/tools/explorer/, using a user account whose stated age is over 18, so that all types of pages can be crawled.

Happy crawling!!