The boardgamegeek friend finder and social graph visualization web2py app.
This is a simple app built with the web2py framework and hosted on pythonanywhere.com. The app accepts a username from the boardgamegeek.com (BGG) community and produces a d3.js visualization of that user's GeekBuddy social graph. The visualization is tailored to helping users discover more GeekBuddies: second-degree connections (potential friends to add) are color coded according to a similarity metric based on ratings of boardgames in the BGG database. Data for the similarity computation was acquired via scraping and API calls to the BGG server.

Note that the data lives in a sqlite database that is too large to be moved to the pythonanywhere servers in one piece (for us lowly "free tier" users). Therefore, a script on my PC splits the database into chunks and uploads the chunks to a separate github repo. On the pythonanywhere servers I then pull the chunks down and reassemble the database with a script. The database repo is here.
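
The chunk-and-reassemble workflow can be sketched roughly as follows. This is an illustration, not the scripts used by the app: the function names, the `.partNNNN` naming scheme, and the 50 MB chunk size are all assumptions.

```python
from pathlib import Path

# Assumed chunk size; choose one below the file-size limit of the host.
CHUNK_SIZE = 50 * 1024 * 1024


def split_file(source, out_dir, chunk_size=CHUNK_SIZE):
    """Write `source` into `out_dir` as numbered chunk files; return their paths."""
    source, out_dir = Path(source), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with source.open("rb") as src:
        index = 0
        while True:
            data = src.read(chunk_size)
            if not data:
                break
            # Zero-padded index so that sorting by name restores the order.
            chunk_path = out_dir / f"{source.name}.part{index:04d}"
            chunk_path.write_bytes(data)
            chunks.append(chunk_path)
            index += 1
    return chunks


def join_chunks(chunks, target):
    """Concatenate chunk files, in name order, back into a single file."""
    with Path(target).open("wb") as dst:
        for chunk in sorted(Path(c) for c in chunks):
            dst.write(chunk.read_bytes())
```

On the PC side one would call `split_file` and commit the chunk directory; on the server side, `join_chunks` over the pulled files rebuilds the sqlite file.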
A more detailed write-up about the data acquisition, the similarity metric algorithm, and the setup and workflow (web2py + pythonanywhere) is available at:
- data acquisition with scraping and API calls
- similarity metric algorithm from boardgame ratings
- web2py and pythonanywhere setup and workflow
###Known Bugs:
- Fails when user is not in database (in compute_correlations, tries to drop row with "user" but row doesn't exist)
- Doesn't account for people that have you in their buddy list
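
One possible way to avoid the first bug is to make the drop tolerant of a missing row. This is a minimal sketch, assuming the correlations are held in a pandas DataFrame indexed by username. The body of `compute_correlations` is not shown here, so the helper name `drop_user_row` and the frame layout are hypothetical.

```python
import pandas as pd


def drop_user_row(correlations, user):
    """Return `correlations` without the user's own row, if that row exists.

    errors="ignore" makes DataFrame.drop skip labels that are not present,
    so a user who is missing from the database no longer raises a KeyError.
    """
    return correlations.drop(index=user, errors="ignore")
```
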
###To-Do List:
- Redesign: Make it "build out" as far as needed to provide 200 (?) high quality nodes.
- Reconsider allowing multiple links to a single node (i.e. if a 2nd degree buddy is already on the graph, don't add a link to another first degree buddy. D
- Colorbar for similarity metric
- Clean up controller code (make one large dataframe of all links and operate on that and use it to generate the nodes and links text for .json)
###Future Redesign Outline
Retain the logic where, if a user has more than 10 buddies, 10 of those buddies are randomly selected.
Create a dataframe of links where the first column is the originator and the second column is their buddy. The third column is an integer indicating whether the originator is the user or a first-degree buddy. The fourth and fifth columns are the number of games in common and the Pearson correlation for those two users.
After this dataframe is constructed we should be able to do some basic filtering like restricting the number of second degree buddies to 30 per first degree buddy. Finally, the .json for the nodes and links of the graph should be able to be generated by formatting the dataframe.
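
The outline above could look roughly like this in pandas. It is a sketch under assumptions, not the planned controller code: the column names, the encoding of the third column (0 when the originator is the user, 1 when it is a first-degree buddy), and the `{"nodes": [...], "links": [...]}` shape with index-based `source`/`target` fields are all illustrative choices.

```python
import json

import pandas as pd

MAX_SECOND_DEGREE = 30  # cap per first-degree buddy, taken from the outline


def limit_second_degree(links, max_per_buddy=MAX_SECOND_DEGREE):
    """Keep all user links, but at most `max_per_buddy` links per first-degree buddy."""
    first = links[links["originator_degree"] == 0]
    second = links[links["originator_degree"] == 1]
    # head() keeps the first rows of each group in their original order.
    second = second.groupby("originator", sort=False).head(max_per_buddy)
    return pd.concat([first, second], ignore_index=True)


def links_to_graph_json(links):
    """Format the links dataframe as a nodes/links JSON string."""
    # dict.fromkeys de-duplicates while preserving first-seen order.
    names = list(dict.fromkeys(links["originator"].tolist() + links["buddy"].tolist()))
    index = {name: i for i, name in enumerate(names)}
    nodes = [{"name": name} for name in names]
    edges = [
        {
            "source": index[row.originator],
            "target": index[row.buddy],
            "games_in_common": int(row.games_in_common),
            "correlation": float(row.correlation),
        }
        for row in links.itertuples(index=False)
    ]
    return json.dumps({"nodes": nodes, "links": edges})
```

Keeping every link in one frame means the filtering step and the JSON step both read from the same table, which is the point of the redesign.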
###Future Correlation Logic
Terminology: user is the center of the graph, candidate refers to every other node.
If the user has rated fewer than 8 games, all correlations with candidates are 0. If the user has fewer than 8 games in common with a candidate, the correlation is 0. If a candidate is not in the database of ratings, they are assumed to have no games in common with the user. If the user is not in the database of ratings, then their ratings are retrieved from the BGG server (but they probably have fewer than 8 rated games).
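
These rules can be sketched as a single function. It is an illustration only: it assumes ratings are available as `{game_id: rating}` dictionaries, the name `rating_correlation` is hypothetical, and returning 0 when one side's ratings have no variance is an added assumption (the Pearson correlation is undefined in that case).

```python
import math

MIN_COMMON_GAMES = 8  # threshold from the rules above


def rating_correlation(user_ratings, candidate_ratings, min_common=MIN_COMMON_GAMES):
    """Return (games_in_common, pearson_correlation) for two rating dicts."""
    # A candidate missing from the ratings database has no games in common.
    if candidate_ratings is None:
        candidate_ratings = {}
    common = sorted(set(user_ratings) & set(candidate_ratings))
    n = len(common)
    # Also covers a user with fewer than `min_common` rated games overall.
    if n < min_common:
        return n, 0.0
    xs = [user_ratings[g] for g in common]
    ys = [candidate_ratings[g] for g in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return n, 0.0  # assumption: treat an undefined correlation as 0
    return n, cov / math.sqrt(var_x * var_y)
```

Returning the number of common games alongside the correlation matches the two per-link values that the links dataframe is meant to store.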
###Future Node and Link Logic
If the user has more than 10 buddies, 10 of those buddies are randomly selected for the graph. If