Analyze GitHub public timeline & provide insights.
The tech stack includes Python, Node.js, MongoDB & Neo4j.
- Fetch and parse public GitHub activity from GitHub Archive. Events of type PushEvent are parsed using Node.js and inserted into MongoDB
- Nodes and relations are built using the Cypher query language and inserted into Neo4j for insights and recommendations
- The application is developed in Python using the Flask framework
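The PushEvent filtering step described above can be illustrated with a small sketch. The repository does this step in Node.js; the Python below is only an illustration, and it assumes that the archive is newline-delimited JSON and that each event carries `type`, `actor.login`, `repo.name` and `created_at` fields. The function names `iter_push_events` and `to_document`, and the document shape, are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_push_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield only the PushEvent records from newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records instead of aborting the batch
        if isinstance(event, dict) and event.get("type") == "PushEvent":
            yield event


def to_document(event: dict) -> dict:
    """Reduce an event to a small document (hypothetical shape) for MongoDB."""
    return {
        "actor": (event.get("actor") or {}).get("login"),
        "repo": (event.get("repo") or {}).get("name"),
        "created_at": event.get("created_at"),
    }


# Tiny in-memory sample standing in for one hour of archive data.
sample = [
    '{"type": "PushEvent", "actor": {"login": "alice"}, '
    '"repo": {"name": "alice/demo"}, "created_at": "2015-01-01T15:00:00Z"}',
    '{"type": "WatchEvent", "actor": {"login": "bob"}, "repo": {"name": "alice/demo"}}',
    "not json",
]
documents = [to_document(e) for e in iter_push_events(sample)]
print(documents)
```

The resulting dictionaries could then be handed to a MongoDB driver for insertion; that part is left out here because it depends on the connection settings from Step 1.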
Step 1: Set environment variables
deployEnv="" #production or development
PORT=5000 #non-standard port
#Heroku specific for Python & node.js
BUILDPACK_URL="https://github.com/ayyar/heroku-buildpack-python-nodejs"
#Production specific environment variables
connectURL="" #mongo connect URL
connectURLRead="" #mongo connect URL for readonly account
database="" #database name
mycollection="" #collection name
#Development specific environment variables
connectURLdev="" #mongo connect URL
databasedev="" #database name
mycollectiondev="" #collection name
myIP="" #development server IP address
#Neo4j specific environment variable
neoURL="" #neo4j connection string
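As a rough illustration of how the variables above might be consumed, the sketch below picks the production or development MongoDB settings based on `deployEnv`. The variable names come from the list above; the function `load_settings`, the keys of the returned dictionary, and the fallback to development when `deployEnv` is empty are assumptions, not the application's actual behavior.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build a settings dict from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: anything other than "production" is treated as development.
    deploy_env = env.get("deployEnv") or "development"

    if deploy_env == "production":
        settings = {
            "connect_url": env.get("connectURL", ""),
            "connect_url_read": env.get("connectURLRead", ""),
            "database": env.get("database", ""),
            "collection": env.get("mycollection", ""),
        }
    else:
        settings = {
            "connect_url": env.get("connectURLdev", ""),
            # Assumption: development has no separate read-only account.
            "connect_url_read": env.get("connectURLdev", ""),
            "database": env.get("databasedev", ""),
            "collection": env.get("mycollectiondev", ""),
            "host": env.get("myIP", ""),
        }

    settings["deploy_env"] = deploy_env
    settings["port"] = int(env.get("PORT") or 5000)
    settings["neo_url"] = env.get("neoURL", "")
    return settings


# Example with an explicit mapping instead of the real process environment.
example = load_settings({
    "deployEnv": "production",
    "connectURL": "mongodb://example-host/gh",
    "database": "gh",
    "mycollection": "events",
    "PORT": "8080",
})
print(example["database"], example["port"])
```

Passing a plain mapping makes the helper easy to try out without touching the real environment; calling `load_settings()` with no arguments reads `os.environ`.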
Step 2: Get GitHub Archive public activity for the past hour
$> node FetchParseGitHubArchive.js #Add this script to a scheduler
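To make "the past hour" concrete, here is a minimal Python sketch that builds the name of the previous hour's archive file. The host and the `YYYY-MM-DD-H.json.gz` pattern (hour not zero-padded, UTC) are assumptions based on how GitHub Archive publishes hourly files; `FetchParseGitHubArchive.js` may build its URL differently. The names `ARCHIVE_URL_TEMPLATE` and `previous_hour_archive_url` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed URL pattern for hourly archive files.
ARCHIVE_URL_TEMPLATE = "https://data.gharchive.org/{stamp}.json.gz"


def previous_hour_archive_url(now: Optional[datetime] = None) -> str:
    """Return the assumed archive URL for the hour before `now` (in UTC)."""
    if now is None:
        now = datetime.now(timezone.utc)
    else:
        # Naive datetimes are interpreted as local time by astimezone().
        now = now.astimezone(timezone.utc)
    previous = now - timedelta(hours=1)
    # The hour is formatted without zero padding, e.g. "2015-01-01-9".
    stamp = f"{previous:%Y-%m-%d}-{previous.hour}"
    return ARCHIVE_URL_TEMPLATE.format(stamp=stamp)


print(previous_hour_archive_url(datetime(2015, 1, 1, 0, 30, tzinfo=timezone.utc)))
```

Subtracting a full hour with `timedelta` handles the day and year rollover, which is easy to get wrong when the hour is decremented by hand.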
Step 3: Start Flask
$> python RunFlask.py
# Procfile used for Heroku deployment
Step 4: Visit localhost:5000
Step 5: Integration with Neo4j (optional)
$> cd bin #bin folder inside repository
$> python GenerateCypher.py #build nodes and relations
$> cd /local/neo4j/bin #neo4j location
$> ./neo4j-shell -file /path-to-cypher #import cypher from neo4j shell
$> cd bin #bin folder inside repository
$> python MongoInsert.py #insert recommendations inside mongo
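The "build nodes and relations" step can be pictured with a short sketch that turns push activity into Cypher `MERGE` statements, one per line, suitable for a file passed to the shell. This is not the contents of `GenerateCypher.py`: the `User` and `Repository` labels, the `PUSHED_TO` relationship, and the helper names are all hypothetical.

```python
from typing import Iterable, List, Tuple


def cypher_string(value: str) -> str:
    """Quote a value as a Cypher string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def push_to_cypher(actor: str, repo: str) -> str:
    """Build one statement linking a user node to a repository node."""
    return (
        f"MERGE (u:User {{login: {cypher_string(actor)}}}) "
        f"MERGE (r:Repository {{name: {cypher_string(repo)}}}) "
        f"MERGE (u)-[:PUSHED_TO]->(r);"
    )


def generate_statements(pushes: Iterable[Tuple[str, str]]) -> List[str]:
    """Return one statement per distinct (actor, repo) pair, in input order."""
    seen = set()
    statements = []
    for actor, repo in pushes:
        if (actor, repo) in seen:
            continue  # the same pair needs only one statement
        seen.add((actor, repo))
        statements.append(push_to_cypher(actor, repo))
    return statements


statements = generate_statements([
    ("alice", "alice/demo"),
    ("alice", "alice/demo"),
    ("bob", "alice/demo"),
])
print("\n".join(statements))
```

`MERGE` is used rather than `CREATE` so that re-running the import does not duplicate nodes or relationships.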
- Ask GitHub GraphGist won the 2015 Neo4j Data Challenge in the category Creative Graph Search and Insights
- The "datachallenge" branch contains the code for GitHub's third annual data challenge