A Stack Overflow data ingest tool written for IBM Cloud Functions/OpenWhisk.
It consists of the following OpenWhisk actions (all of them in the `stackoverflow` package):

- `socron` is a sequence to be triggered periodically (we might need a few of these to cover all our tags), which contains `collector` and `invoker`.
- `collector` makes the API call to Stack Overflow. The `tags` parameter must be supplied (usually to the sequence), and the `apikey` parameter (a Stack Overflow API key) will be used if present. It returns the "items" array of the result it got.
- `invoker` simply fires a `qhandler` action for each of the elements in the `items` array it receives.
- `qhandler` is a sequence containing `storer` and `notifier`.
- `storer` requires the parameters `cloudantURL` and `dbname` to be set (usually on the package). It writes every question to the database with the `question_id` as the ID, updating the existing question record if we already have one. The question is in a field called `question` in the data object, and we also add `status` and `owner`. Status is `new` if we inserted it and `updated` if we updated it because the ID already existed.
- `notifier` will notify Slack for any data it gets that has a status of `new`.
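
The per-question flow described above (store, then notify only when the question is new) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's actual code: a plain dict stands in for the Cloudant database, the function signatures are hypothetical, the `owner` value is assumed to start empty and survive updates, and `title` is assumed to be the question field worth sending to Slack.

```python
# Hypothetical in-memory sketch of invoker -> qhandler (storer, then notifier).
# A dict stands in for Cloudant; nothing here talks to a real service.

def storer(db, question):
    """Upsert a question keyed by its question_id and return the stored record."""
    doc_id = str(question["question_id"])
    existing = db.get(doc_id)
    record = {
        "question": question,
        # "new" on first insert, "updated" when the ID already existed
        "status": "updated" if existing else "new",
        # Assumption: owner starts empty and is preserved across updates
        "owner": existing["owner"] if existing else None,
    }
    db[doc_id] = record
    return record


def notifier(record, send):
    """Call send() only for records the storer marked as new."""
    if record["status"] == "new":
        send(record["question"]["title"])
        return True
    return False


def invoker(db, items, send):
    """Run storer then notifier once per item, one qhandler run each."""
    return [notifier(storer(db, item), send) for item in items]


db = {}
sent = []
items = [{"question_id": 1, "title": "How do I deploy an action?"}]
first = invoker(db, items, sent.append)   # question is inserted and announced
second = invoker(db, items, sent.append)  # same ID again: updated, no notification
```

Running the same items through twice shows why the `status` field matters: the second pass updates the record but sends nothing, so a periodic trigger does not produce repeat notifications.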
## Quick Start

Run `./deploy.sh` and check that the `cloudantURL`, `slackURL`, and `dbname` parameters are set on the `stackoverflow` package (optional: also set the `apikey` parameter to a valid Stack Overflow API key). Then invoke `stackoverflow/socron` with your desired `tags` param. To set up the configured rules so that the actions run periodically, run `./rules.sh`.
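
For a manual test run outside the CLI, the invocation can also go through the OpenWhisk REST API. The sketch below only builds the request and sends nothing. The host name, the default `_` namespace, and the example tag value are assumptions; authentication is left out because it depends on how your namespace is set up.

```python
import json
from urllib.parse import quote


def build_invoke_request(apihost, tags, namespace="_", action="stackoverflow/socron"):
    """Return (url, body) for a blocking action invocation; nothing is sent."""
    # Package and action name are separate path segments in the REST API
    path = "/".join(quote(part, safe="") for part in action.split("/"))
    url = (
        f"https://{apihost}/api/v1/namespaces/{quote(namespace, safe='')}"
        f"/actions/{path}?blocking=true&result=true"
    )
    # Assumption: tags is passed through as a single string parameter
    body = json.dumps({"tags": tags})
    return url, body


# Hypothetical host and tag value, for illustration only
url, body = build_invoke_request("openwhisk.example.com", "openwhisk")
```

The resulting URL and JSON body can be sent as a POST with whatever HTTP client and credentials your deployment uses.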
To deploy to IBM Cloud, there are TravisCI setup instructions on the wiki.