- Debian-like OS (tested on Ubuntu 15.10)
- Python 2.7
- virtualenvwrapper 4.5.1
- FB Graph API version 2.5
- Internet connection
$ source virtualenvwrapper.sh
$ mkvirtualenv fbcomments-test
$ git clone git@github.com:twil/fbcomments-test.git
$ cd fbcomments-test
./fbcomments-test$ workon fbcomments-test
(fbcomments-test)./fbcomments-test$ pip install -r requirements.txt
- FB requests are pretty simple, so we can use the `requests` library.
- Cursor-based pagination. Time-based and offset-based pagination don't work with the `/comments` edge! (https://developers.facebook.com/tools/explorer/)
- Batch requests? Maybe we can fabricate paged URLs with `offset` and `since`?
- Error processing. If the JSON response has an `error` property, the request failed.
- Rate limiting. App-level throttling: 200 calls/person/hour (error code 4).
- Don't request unneeded fields (we only need `created_time` to calculate the frequency of comments).
- Timeouts.
- Use `multiprocessing`
- Use Google Charts
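A minimal sketch of the cursor-based pagination and the `error` check from the notes above. It assumes the Graph API response carries a `paging.next` URL while more pages exist. The names `iter_created_times`, `fetch_json` and `GraphAPIError` are hypothetical and not taken from this repo. The `fetch` argument is pluggable so the loop can be tried without a network connection.

```python
GRAPH_URL = "https://graph.facebook.com/v2.5"


class GraphAPIError(Exception):
    """Raised when the JSON response has an `error` property."""


def fetch_json(url, params=None, timeout=10):
    # Imported lazily so the pagination loop has no hard dependency.
    import requests
    return requests.get(url, params=params, timeout=timeout).json()


def iter_created_times(post_id, access_token, fetch=fetch_json, limit=5000):
    """Yield `created_time` of every comment, following `paging.next`."""
    url = "{0}/{1}/comments".format(GRAPH_URL, post_id)
    params = {
        "fields": "created_time",  # the only field we need
        "limit": limit,
        "filter": "stream",
        "access_token": access_token,
    }
    while url:
        page = fetch(url, params)
        if "error" in page:
            raise GraphAPIError(page["error"])
        for comment in page.get("data", []):
            yield comment["created_time"]
        # Assumption: `next` is a complete URL (cursor and token included),
        # so no extra params are needed after the first request.
        url = page.get("paging", {}).get("next")
        params = None
```

The generator stops as soon as a page has no `paging.next`, and any page with an `error` property aborts the whole run, which leaves retry decisions to the caller.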
`pandas` has a very neat way of calculating the needed frequencies: `Series.resample('3Min', how='sum', label='right')`. But how to parallelize the whole thing?
- The last index after `Series.resample()` of one page will be the same as the first one of the next page. In this case we need to sum these values to merge the two sequences. If by chance the first index of the second page is different, we just concatenate the two sequences and that's it!
- How to parallelize? We get pages from FB sequentially, so we can only parallelize the processing of received data :(
- FB docs are not so good: `order` on the `/comments` edge can be `chronological` or `reverse_chronological`. This means we can "eat" comments from two sides in parallel!
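The page-merging idea above can be sketched without `pandas`. This is a standard-library stand-in for `Series.resample()`, not the approach the notes settle on: each page is counted into 5-minute buckets labelled by their right edge, and summing the counters merges pages whether or not their boundary buckets coincide. The helper names are hypothetical, and the `created_time` format (`2015-12-01T10:01:00+0000`, always UTC) is an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta

BUCKET = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1)


def parse_created_time(value):
    # Assumption: the offset is always +0000, so it is simply dropped.
    return datetime.strptime(value[:19], "%Y-%m-%dT%H:%M:%S")


def bucket_counts(created_times, bucket=BUCKET):
    """Count timestamps per bucket; buckets are [start, end), keyed by end."""
    counts = Counter()
    for value in created_times:
        elapsed = parse_created_time(value) - EPOCH
        index = int(elapsed.total_seconds() // bucket.total_seconds())
        counts[EPOCH + bucket * (index + 1)] += 1
    return counts


def merge_counts(*pages):
    """Sum per-page counters; a shared boundary bucket is added up."""
    total = Counter()
    for counts in pages:
        total.update(counts)
    return total
```

Because each page is reduced independently, `bucket_counts` is the part that could be handed to `multiprocessing` workers, while `merge_counts` stays a cheap sequential step. Empty buckets never appear in the counters, so there is nothing to drop afterwards.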
So we have ~52k comments for the given post (10151775534413086), a limit of 200 requests per hour per user, and a limit of 5k comments in a single request (it might be less?).
We need 11 requests to process the data.
52k comments resampled into 5-minute buckets give 257k timestamps, most of them empty! We can drop the NA values. That'll give ~4k values.
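The numbers above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch. The constant names are made up for illustration, and reading "257k timestamps" as the number of 5-minute buckets between the first and the last comment is an assumption.

```python
import math

TOTAL_COMMENTS = 52000   # ~52k comments on the post
PAGE_LIMIT = 5000        # comments returned by a single request
CALLS_PER_HOUR = 200     # rate limit per user
BUCKETS = 257000         # ~257k buckets after resampling (assumed meaning)
BUCKET_MINUTES = 5

# 52000 / 5000 = 10.4, so the last partial page needs an 11th request.
requests_needed = int(math.ceil(TOTAL_COMMENTS / float(PAGE_LIMIT)))
fits_rate_limit = requests_needed <= CALLS_PER_HOUR

# Time span covered by the buckets, in days.
span_days = BUCKETS * BUCKET_MINUTES / (60.0 * 24)
```

With these inputs `requests_needed` is 11, well under the hourly limit, and the comments would be spread over roughly 890 days, which is why almost all buckets end up empty.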
- Get an access token somehow (out of scope at the moment)
- Get all the comment timestamps using cursor-based pagination with a 10k limit, selecting only the `created_time` field
- Calculate the frequencies for 5-minute intervals
- Create a report folder
- Save the data to `data.js`
- Copy the template `report.html` into the report folder
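The last three steps could look roughly like the sketch below. The function name `write_report`, the `DATA` variable and the `[label, count]` row layout are assumptions made for illustration. The real `data.js` format depends on what `report.html` expects.

```python
import json
import os
import shutil


def write_report(frequencies, report_dir, template_path):
    """Create report_dir, write data.js and copy the template next to it."""
    if not os.path.isdir(report_dir):
        os.makedirs(report_dir)
    # One [label, count] row per bucket, ordered by bucket label.
    rows = [[str(label), count] for label, count in sorted(frequencies.items())]
    data_path = os.path.join(report_dir, "data.js")
    with open(data_path, "w") as handle:
        # Assumption: the template reads a global variable named DATA.
        handle.write("var DATA = {0};\n".format(json.dumps(rows)))
    shutil.copy(template_path, os.path.join(report_dir, "report.html"))
    return data_path
```

Writing the data as a plain JavaScript assignment means the report can be opened straight from disk, with no web server needed to load it.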
Tests are written in `tests.py`. To run the test suite, issue:
(fbcomments-test)./fbcomments-test$ nosetests
TODO:
Codes to wait and retry:
- 1 - API Unknown. Retry and forget if not successful.
- 2 - API Service.
- 4 - API Too Many Calls. Examine your API request volume?
- 17 - API User Too Many Calls. Examine your API request volume?
- 341 - Application limit reached. Examine your API request volume?
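One possible shape for the wait-and-retry TODO, as a sketch only: the error codes come from the list above, while `call_with_retry`, the attempt count and the exponential delay are assumptions. `request` is any callable returning the parsed JSON response.

```python
import time

# Error codes worth waiting on and retrying (from the list above).
RETRYABLE_CODES = {1, 2, 4, 17, 341}


def call_with_retry(request, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call request() until it succeeds or the attempts run out."""
    for attempt in range(max_attempts):
        response = request()
        error = response.get("error")
        if error is None:
            return response
        last_attempt = attempt == max_attempts - 1
        if error.get("code") not in RETRYABLE_CODES or last_attempt:
            raise RuntimeError("Graph API error: {0!r}".format(error))
        # Wait 1x, 2x, 4x ... base_delay before the next attempt.
        sleep(base_delay * 2 ** attempt)
```

For the rate-limit codes (4, 17, 341) a much longer `base_delay` is probably needed, since the limit is counted per hour. The `sleep` argument exists so the delay can be replaced in tests.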
In Chrome Dev Tools (the app is configured for the `test.domain` domain):
// test.domain
window.fbAsyncInit = function() {
  FB.init({
    appId   : '645041415635369',
    xfbml   : true,
    version : 'v2.5'
  });
};
(function(d, s, id){
  var js, fjs = d.getElementsByTagName(s)[0];
  if (d.getElementById(id)) {return;}
  js = d.createElement(s); js.id = id;
  js.src = "//connect.facebook.net/en_US/sdk.js";
  fjs.parentNode.insertBefore(js, fjs);
}(document, 'script', 'facebook-jssdk'));
FB.login(function(){}, {scope: ''});
FB.getAuthResponse();
https://www.facebook.com/dialog/oauth?client_id=645041415635369&redirect_uri=https://www.facebook.com/connect/login_success.html&response_type=token
After confirming the permissions you'll be redirected to a new URL with `access_token` in it.
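Pulling the token out of that redirect URL can be sketched as below, assuming that with `response_type=token` the value arrives in the URL fragment (the part after `#`). The helper name `extract_access_token` is hypothetical.

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs  # Python 2.7


def extract_access_token(redirect_url):
    """Return the access_token from the URL fragment, or None if absent."""
    fragment = urlparse(redirect_url).fragment
    values = parse_qs(fragment).get("access_token")
    return values[0] if values else None
```

The URL has to be copied from the browser's address bar, because the fragment is never sent to the server.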
https://developers.facebook.com/docs/facebook-login/for-devices
Somehow with an App Secret or Client Token.
FB.api(
  '/10151775534413086/comments',
  'GET',
  {"fields":"created_time","limit":"100000","pretty":"0","summary":"1","filter":"stream"},
  function(response) {
    console.log(response.data.length);
  }
);
This will give us 5000, even though we asked for a limit of 100000.