
Primary language: Go. License: Apache License 2.0 (Apache-2.0).


hive

A platform for backing crowdsourcing websites, built in Go for Elasticsearch.

Setup

Hive requires Elasticsearch 1.3 or higher. You can install it wherever you like, because you tell Hive the Elasticsearch domain and port at startup.

On a Mac, installation is simple with Homebrew:

brew update
brew install elasticsearch

Instructions for other platforms are in the Elasticsearch guide.

There are two options for running Hive:

Binary

Download the latest release, unzip it and double-click to start. Hive runs on http://localhost:8080 by default.

Source

This method lets you hack on the Hive source code. You'll need Go installed and a working Go environment ($GOPATH, etc.).

mkdir -p $GOPATH/src/github.com/nytlabs
cd $GOPATH/src/github.com/nytlabs
git clone git@github.com:nytlabs/hive.git
cd hive
make

Finally, to start Hive with the default settings:

$ ./build/hive-server
2014/10/16 14:52:19 running hive-server on port 8080 storing data in elasticsearch under index hive

An example specifying every configuration parameter:

$ ./build/hive-server -index hive -esDomain localhost -esPort=9200 -port 8888
2014/10/16 14:51:54 running hive-server on port 8888 storing data in a local instance of elasticsearch under index hive

Forget what parameters are available? There's help:

$ ./build/hive-server -h

Usage of ./build/hive-server:
  -esDomain="localhost": elasticsearch domain
  -esPort="9200": elasticsearch port
  -index="hive": elasticsearch index name
  -port="8080": hive port
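The help text above has the format printed by Go's standard flag package. As a sketch, assuming hive-server defines its options with flag (this README doesn't confirm it), a parser with the same four options and defaults could look like this. The config type and parseConfig are hypothetical names:

```go
package main

import "flag"

// config holds the four hive-server options listed in the -h output.
// Hypothetical: these field names are illustrative, not taken from Hive's source.
type config struct {
	Index    string
	ESDomain string
	ESPort   string
	Port     string
}

// parseConfig parses args (without the program name) and falls back to the
// defaults shown in the help output when a flag is omitted.
func parseConfig(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("hive-server", flag.ContinueOnError)
	fs.StringVar(&c.Index, "index", "hive", "elasticsearch index name")
	fs.StringVar(&c.ESDomain, "esDomain", "localhost", "elasticsearch domain")
	fs.StringVar(&c.ESPort, "esPort", "9200", "elasticsearch port")
	fs.StringVar(&c.Port, "port", "8080", "hive port")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	return c, nil
}
```

For example, parseConfig(os.Args[1:]) would return Port "8888" for the earlier invocation. The flag package accepts both -esPort=9200 and -port 8888 for string flags, which is why that example can mix the two spellings.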

Importing Data

All of a project's information is defined in JSON and POSTed to Hive's admin setup endpoint, which clears out the existing database before creating the project. A full example is in samples/example.json in this repo.

$ curl -XPOST localhost:8080/admin/setup -d@samples/example.json
2014/11/21 12:29:22 Created project: crowd
2014/11/21 12:29:22 task: 2
2014/11/21 12:29:22 assets: 4

Projects

A project is a single crowdsourcing app hosted in Hive. Everything is scoped to a project, including at least assets, assignments, tasks and users.

Field        Description
Id           a unique identifier, used as the slug in URLs
Name         a plain string title for the project
Description  optional, additional information about the project
  "Project": {
    "Id": "crowd",
    "Name": "Crowd",
    "Description": "An example crowd sourcing site."
  }
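A Go client or import tool might model the Project object as a struct. In this sketch the Project type and parseProject are hypothetical, and rejecting an empty Id is an assumption based on Id being used as the URL slug:

```go
package main

import (
	"encoding/json"
	"errors"
)

// Project mirrors the "Project" object in the setup JSON.
type Project struct {
	ID          string `json:"Id"`
	Name        string
	Description string `json:",omitempty"` // optional
}

// parseProject extracts the Project from a setup file and checks it.
// Assumption: an empty Id is rejected because it is needed as the URL slug.
func parseProject(data []byte) (Project, error) {
	var file struct{ Project Project }
	if err := json.Unmarshal(data, &file); err != nil {
		return Project{}, err
	}
	if file.Project.ID == "" {
		return Project{}, errors.New("project Id is required")
	}
	return file.Project, nil
}
```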

Tasks

Tasks are individual actions to perform on an asset. A project can have one or more tasks. Each task stores the criteria for assigning and verifying its assets.

Field               Description
Name                a plain string title for the task
Description         optional, additional information
CurrentState        whether the task starts in the 'available' or 'waiting' state after import
AssignmentCriteria  the criteria used to assign assets for this task
CompletionCriteria  the criteria used to mark an asset 'completed' for this task: Total and Matching counts for submissions
  "Tasks": [
    {
      "Name": "categorize",
      "Description": "categorize images",
      "CurrentState": "available",
      "AssignmentCriteria": {
        "SubmittedData": {
          "categorize": {}
        }
      },
      "CompletionCriteria": {
        "Total": 50,
        "Matching": 50
      }
    }
  ]
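The Tasks array maps onto Go structs the same way. This sketch keeps AssignmentCriteria as generic maps, since the README documents it only by example, and checks CurrentState against the two values listed in the table. The types and parseTasks are hypothetical names, not Hive's own:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CompletionCriteria holds the Total and Matching submission counts.
type CompletionCriteria struct {
	Total    int
	Matching int
}

// AssignmentCriteria keeps SubmittedData as generic maps, e.g.
// {"categorize": {}} or {"categorize": {"ad-content": "usable"}}.
type AssignmentCriteria struct {
	SubmittedData map[string]map[string]any
}

// Task mirrors one entry of the "Tasks" array in the setup JSON.
type Task struct {
	Name               string
	Description        string `json:",omitempty"`
	CurrentState       string
	AssignmentCriteria AssignmentCriteria
	CompletionCriteria CompletionCriteria
}

// parseTasks decodes the "Tasks" array and checks that each task starts in
// one of the two states the README allows after import.
func parseTasks(data []byte) ([]Task, error) {
	var file struct{ Tasks []Task }
	if err := json.Unmarshal(data, &file); err != nil {
		return nil, err
	}
	for _, t := range file.Tasks {
		if t.CurrentState != "available" && t.CurrentState != "waiting" {
			return nil, fmt.Errorf("task %q: CurrentState must be available or waiting, got %q", t.Name, t.CurrentState)
		}
	}
	return file.Tasks, nil
}
```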

Assets

Assets are the items assigned to users, such as images or PDFs. Every asset requires a URL and is scoped to a project.

Field     Description
Url       required, where to find this asset
Name      optional, a plain string title
Metadata  optional, any additional data about this asset, as key-value pairs
  "Assets": [
    {
      "Name": "Space 1",
      "Url": "http://upload.wikimedia.org/wikipedia/commons/8/84/Wormhole.png"
    }
  ]
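Assets can be decoded and checked before import in the same way. A minimal sketch with hypothetical names; requiring Url to be an absolute URL (with scheme and host) is this example's assumption, as the README only says the field is required:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// Asset mirrors one entry of the "Assets" array in the setup JSON.
type Asset struct {
	Name     string         `json:",omitempty"` // optional
	URL      string         `json:"Url"`        // required
	Metadata map[string]any `json:",omitempty"` // optional key-value pairs
}

// parseAssets decodes the "Assets" array and rejects entries whose Url is
// missing or not absolute (the absolute-URL rule is an assumption).
func parseAssets(data []byte) ([]Asset, error) {
	var file struct{ Assets []Asset }
	if err := json.Unmarshal(data, &file); err != nil {
		return nil, err
	}
	for i, a := range file.Assets {
		u, err := url.Parse(a.URL)
		if err != nil || u.Scheme == "" || u.Host == "" {
			return nil, fmt.Errorf("asset %d (%q): Url must be an absolute URL", i, a.Name)
		}
	}
	return file.Assets, nil
}
```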

Users

Users are the members of the crowd that you source in your app. They are scoped to a project, so the same person can have multiple records, one per project. Which fields are required is up to you: Hive will create a user with only an ID, to keep the barrier to entry low.

The current user is identified by a cookie named {project_id}_user_id (for example, crowd_user_id for the crowd project). The cookie's value should be the current user's Id.

Create

POST /projects/{project_id}/user

Response

{
    "Id": "GorJ0TxVRbipE9SIJypEVQ",
    "Name": "Resourceful Person",
    "Email": "person@example.com",
    "Project": "crowd",
    "ExternalId": "",
    "Counts": {
        "Assignments": 10,
        "Favorites": 0,
        "crowd-categorize": 10,
        "crowd-vote": 0
    },
    "Favorites": {}
}

Your site should set the {project_id}_user_id cookie (for example, crowd_user_id) to the Id value returned in this response.

Get the current user

GET /projects/{project_id}/user

Cookie {project_id}_user_id

Response

{
    "Id": "GorJ0TxVRbipE9SIJypEVQ",
    "Name": "Resourceful Person",
    "Email": "person@example.com",
    "Project": "crowd",
    "ExternalId": "",
    "Counts": {
        "Assignments": 10,
        "Favorites": 0,
        "crowd-categorize": 10,
        "crowd-vote": 0
    },
    "Favorites": {}
}
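Decoding the user response in Go needs only encoding/json. In this sketch User, newCurrentUserRequest and parseUser are hypothetical names, and Favorites is left as raw JSON because its shape isn't documented here. Counts mixes totals with per-task counts keyed by task id (crowd-categorize, crowd-vote):

```go
package main

import (
	"encoding/json"
	"net/http"
	"net/url"
)

// User mirrors the user JSON returned by Hive.
type User struct {
	ID         string `json:"Id"`
	Name       string
	Email      string
	Project    string
	ExternalID string `json:"ExternalId"`
	// Counts holds totals ("Assignments", "Favorites") and per-task counts.
	Counts    map[string]int
	Favorites json.RawMessage // shape not documented in this README
}

// newCurrentUserRequest builds GET /projects/{project_id}/user with the
// {project_id}_user_id cookie that identifies the user.
func newCurrentUserRequest(baseURL, projectID, userID string) (*http.Request, error) {
	endpoint := baseURL + "/projects/" + url.PathEscape(projectID) + "/user"
	req, err := http.NewRequest(http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, err
	}
	req.AddCookie(&http.Cookie{Name: projectID + "_user_id", Value: userID})
	return req, nil
}

// parseUser decodes a user response body.
func parseUser(data []byte) (User, error) {
	var u User
	err := json.Unmarshal(data, &u)
	return u, err
}
```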

Assignments

Assignments are the work users have to do for a given task and asset. A user cannot get the same assignment twice: assignments are scoped to the current project, task, asset and user.

Create an Assignment

GET /projects/{project_id}/tasks/{task_id}/assignments

Cookie {project_id}_user_id

Response

{
    "Id": "crowdHIVEcrowd-voteHIVExpZWabTwQFS94YgZdK-O-gHIVEGorJ0TxVRbipE9SIJypEVQ",
    "User": "GorJ0TxVRbipE9SIJypEVQ",
    "Project": "crowd",
    "Task": "crowd-vote",
    "Asset": {
        "Id": "xpZWabTwQFS94YgZdK-O-g",
        "Project": "crowd",
        "Url": "",
        "Name": "Space 3",
        "Metadata": {
        },
        "SubmittedData": {
            "categorize": null,
            "vote": null
        },
        "Verified": false,
        "Counts": {
            "Assignments": 2,
            "Favorites": 0,
            "finished": 1,
            "skipped": 0,
            "unfinished": 1
        }
    },
    "State": "unfinished",
    "SubmittedData": null
}

Calling this endpoint finds or creates an unfinished assignment on this task for the current user.
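A Go sketch of requesting an assignment and reading its state. The Assignment type models only a subset of the response fields, and the helper names are hypothetical:

```go
package main

import (
	"encoding/json"
	"net/http"
	"net/url"
)

// Assignment mirrors the assignment response fields this sketch uses.
type Assignment struct {
	ID    string `json:"Id"`
	User  string
	Task  string
	Asset struct {
		ID   string `json:"Id"`
		URL  string `json:"Url"`
		Name string
	}
	State         string
	SubmittedData json.RawMessage // null until submitted
}

// newAssignmentRequest builds GET /projects/{project_id}/tasks/{task_id}/assignments
// for the user identified by the {project_id}_user_id cookie.
func newAssignmentRequest(baseURL, projectID, taskID, userID string) (*http.Request, error) {
	endpoint := baseURL + "/projects/" + url.PathEscape(projectID) +
		"/tasks/" + url.PathEscape(taskID) + "/assignments"
	req, err := http.NewRequest(http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, err
	}
	req.AddCookie(&http.Cookie{Name: projectID + "_user_id", Value: userID})
	return req, nil
}

// parseAssignment decodes an assignment response body.
func parseAssignment(data []byte) (Assignment, error) {
	var a Assignment
	err := json.Unmarshal(data, &a)
	return a, err
}
```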

Submit or Skip an Assignment

POST /projects/{project_id}/tasks/{task_id}/assignments

Cookie {project_id}_user_id

Response

{
    "Id": "crowdHIVEcrowd-voteHIVExpZWabTwQFS94YgZdK-O-gHIVEGorJ0TxVRbipE9SIJypEVQ",
    "User": "GorJ0TxVRbipE9SIJypEVQ",
    "Project": "crowd",
    "Task": "crowd-vote",
    "Asset": {
        "Id": "xpZWabTwQFS94YgZdK-O-g",
        "Project": "crowd",
        "Url": "http://blogs.scientificamerican.com/observations/files/2013/08/Black_Hole_Milkyway.jpg",
        "Name": "Space 3",
        "Metadata": {
        },
        "SubmittedData": {
            "categorize": null,
            "vote": null
        },
        "Verified": false,
        "Counts": {
            "Assignments": 2,
            "Favorites": 0,
            "finished": 1,
            "skipped": 0,
            "unfinished": 1
        }
    },
    "State": "finished",
    "SubmittedData": {
        "Category": "usable"
    }
}

To submit an assignment, POST back an updated copy of the JSON from the Create an Assignment response with State set to "finished" and your answers in SubmittedData. To skip it, set State to "skipped" instead.

Create an Assignment for a Specific Asset

GET /projects/{project_id}/tasks/{task_id}/assets/{asset_id}/assignments

Cookie {project_id}_user_id

Response

Same as the general Create an Assignment response.

Use this endpoint when your site displays assets and you want users to act on a specific one, rather than on a randomly chosen available asset. Specify an asset id and get back an assignment on it for the current user, project and task.
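The asset-specific endpoint puts the asset id in the path. Asset ids are opaque strings, so this sketch escapes each path segment with url.PathEscape; newAssetAssignmentRequest is a hypothetical helper:

```go
package main

import (
	"net/http"
	"net/url"
)

// newAssetAssignmentRequest builds
// GET /projects/{project_id}/tasks/{task_id}/assets/{asset_id}/assignments
// for the user identified by the {project_id}_user_id cookie.
func newAssetAssignmentRequest(baseURL, projectID, taskID, assetID, userID string) (*http.Request, error) {
	endpoint := baseURL +
		"/projects/" + url.PathEscape(projectID) +
		"/tasks/" + url.PathEscape(taskID) +
		"/assets/" + url.PathEscape(assetID) +
		"/assignments"
	req, err := http.NewRequest(http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, err
	}
	req.AddCookie(&http.Cookie{Name: projectID + "_user_id", Value: userID})
	return req, nil
}
```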

Lookup an Assignment by Id

GET /projects/{project_id}/assignments/{assignment_id}

Response

{
    "Id": "crowdHIVEcrowd-voteHIVExpZWabTwQFS94YgZdK-O-gHIVEGorJ0TxVRbipE9SIJypEVQ",
    "User": "GorJ0TxVRbipE9SIJypEVQ",
    "Project": "crowd",
    "Task": "crowd-vote",
    "Asset": {
        "Id": "xpZWabTwQFS94YgZdK-O-g",
        "Project": "crowd",
        "Url": "",
        "Name": "Space 3",
        "Metadata": {
        },
        "SubmittedData": {
            "categorize": null,
            "vote": null
        },
        "Verified": false,
        "Counts": {
            "Assignments": 2,
            "Favorites": 0,
            "finished": 1,
            "skipped": 0,
            "unfinished": 1
        }
    },
    "State": "unfinished",
    "SubmittedData": null
}

Returns information for a single assignment by id.

Assets

Asset actions available outside the admin endpoints.

Get an Asset

GET /projects/{project_id}/assets/{asset_id}

Response

{
    "Asset": {
        "Id": "AUnTaQpqzTmtUIq-fdvJ",
        "Project": "crowd",
        "Url": "http://blogs.scientificamerican.com/observations/files/2013/08/Black_Hole_Milkyway.jpg",
        "Name": "Space 3",
        "Metadata": null,
        "SubmittedData": {
            "categorize": null,
            "vote": null
        },
        "Verified": false,
        "Counts": {
            "Assignments": 0,
            "finished": 0,
            "skipped": 0,
            "unfinished": 0
        }
    }
}

This endpoint returns information for a single asset.
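This response wraps the asset in an "Asset" key, and Metadata can be null. A Go sketch that decodes it, with hypothetical type and function names:

```go
package main

import (
	"encoding/json"
	"net/http"
	"net/url"
)

// assetResponse matches the {"Asset": {...}} wrapper returned by
// GET /projects/{project_id}/assets/{asset_id}.
type assetResponse struct {
	Asset struct {
		ID            string `json:"Id"`
		Project       string
		URL           string `json:"Url"`
		Name          string
		Metadata      map[string]any             // may be null
		SubmittedData map[string]json.RawMessage // per-task data, null if none
		Verified      bool
		Counts        map[string]int
	}
}

// newAssetRequest builds the GET request; this endpoint lists no cookie.
func newAssetRequest(baseURL, projectID, assetID string) (*http.Request, error) {
	endpoint := baseURL + "/projects/" + url.PathEscape(projectID) + "/assets/" + url.PathEscape(assetID)
	return http.NewRequest(http.MethodGet, endpoint, nil)
}

// parseAsset decodes the wrapped asset response body.
func parseAsset(data []byte) (assetResponse, error) {
	var r assetResponse
	err := json.Unmarshal(data, &r)
	return r, err
}
```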

Favorite/Unfavorite an Asset

GET /projects/{project_id}/assets/{asset_id}/favorite

Cookie {project_id}_user_id

Response

Favoriting:

{
    "AssetId": "AUnTaQpqzTmtUIq-fdvJ",
    "Action": "favorited"
}

Unfavoriting:

{
    "AssetId": "AUnTaQpqzTmtUIq-fdvJ",
    "Action": "unfavorited"
}

This endpoint toggles the asset's favorite status for the current user: the first call favorites it, and calling it again unfavorites it.
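Because the same GET toggles the state, a client should read Action from the response rather than assume the result. A minimal sketch with hypothetical helpers:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// newFavoriteRequest builds GET /projects/{project_id}/assets/{asset_id}/favorite.
// Each call toggles the favorite status for the cookie's user.
func newFavoriteRequest(baseURL, projectID, assetID, userID string) (*http.Request, error) {
	endpoint := baseURL + "/projects/" + url.PathEscape(projectID) +
		"/assets/" + url.PathEscape(assetID) + "/favorite"
	req, err := http.NewRequest(http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, err
	}
	req.AddCookie(&http.Cookie{Name: projectID + "_user_id", Value: userID})
	return req, nil
}

// isFavorited reports whether the response says the asset is now favorited.
func isFavorited(data []byte) (bool, error) {
	var r struct{ Action string }
	if err := json.Unmarshal(data, &r); err != nil {
		return false, err
	}
	switch r.Action {
	case "favorited":
		return true, nil
	case "unfavorited":
		return false, nil
	default:
		return false, fmt.Errorf("unexpected Action %q", r.Action)
	}
}
```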

Tasks

Task actions available outside the admin endpoints.

Get Tasks

GET /projects/{project_id}/tasks

Response

{
    "Tasks": [
        {
            "Id": "crowd-categorize",
            "Project": "crowd",
            "Name": "categorize",
            "Description": "categorize images",
            "CurrentState": "available",
            "AssignmentCriteria": {
                "SubmittedData": {
                    "categorize": {}
                }
            },
            "CompletionCriteria": {
                "Total": 50,
                "Matching": 50
            }
        },
        {
            "Id": "crowd-vote",
            "Project": "crowd",
            "Name": "vote",
            "Description": "vote on image quality (from 1 to 10)",
            "CurrentState": "waiting",
            "AssignmentCriteria": {
                "SubmittedData": {
                    "categorize": {
                        "ad-content": "usable"
                    },
                    "vote": {}
                }
            },
            "CompletionCriteria": {
                "Total": 2,
                "Matching": 2
            }
        }
    ],
    "Meta": {
        "Total": 0,
        "From": 0,
        "Size": 0
    }
}

Returns a list of tasks in this project.
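For example, a client could pick out the tasks currently in the 'available' state. A sketch that decodes only the fields it needs; availableTaskIDs is a hypothetical helper:

```go
package main

import "encoding/json"

// taskList matches the fields of GET /projects/{project_id}/tasks used here.
type taskList struct {
	Tasks []struct {
		ID           string `json:"Id"`
		Name         string
		CurrentState string
	}
}

// availableTaskIDs returns the ids of tasks whose CurrentState is "available".
func availableTaskIDs(data []byte) ([]string, error) {
	var l taskList
	if err := json.Unmarshal(data, &l); err != nil {
		return nil, err
	}
	var ids []string
	for _, t := range l.Tasks {
		if t.CurrentState == "available" {
			ids = append(ids, t.ID)
		}
	}
	return ids, nil
}
```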

Get Task

GET /projects/{project_id}/tasks/{task_id}

Response

{
    "Task": {
        "Id": "crowd-categorize",
        "Project": "crowd",
        "Name": "categorize",
        "Description": "categorize images",
        "CurrentState": "available",
        "AssignmentCriteria": {
            "SubmittedData": {
                "categorize": {}
            }
        },
        "CompletionCriteria": {
            "Total": 50,
            "Matching": 50
        }
    }
}

Returns information for a single task in this project.

API Endpoints

Finally, a list of all the API actions.

  • ANY / - useful for health checks / heartbeats
  • ANY /admin/setup - clears out the database, configures Elasticsearch and creates a project
  • GET /admin/projects - returns all projects in Hive
  • GET /admin/projects/{project_id} - returns project information
  • POST /admin/projects/{project_id} - creates or updates a project
  • GET /admin/projects/{project_id}/tasks - returns tasks in this project
  • POST /admin/projects/{project_id}/tasks - imports tasks into this project
  • GET /admin/projects/{project_id}/tasks/{task_id} - returns task information
  • POST /admin/projects/{project_id}/tasks/{task_id} - create or update a task
  • enable and disable tasks
  • GET /admin/projects/{project_id}/assets - returns assets in this project
  • GET /admin/projects/{project_id}/assets?from=10&size=30 - paginates assets
  • GET /admin/projects/{project_id}/assets?task={task_id}&state={state} - returns a list of assets based on task and state
  • POST /admin/projects/{project_id}/assets - imports assets into this project
  • GET /admin/projects/{project_id}/assets/{asset_id} - get a single asset's data
  • GET /admin/projects/{project_id}/tasks/{task_id}/complete - mark any assets completed for this task
  • GET /admin/projects/{project_id}/users - returns users in this project
  • GET /admin/projects/{project_id}/users?from=0&size=10 - paginates users
  • GET /admin/projects/{project_id}/users/{user_id} - returns a single user in this project
  • GET /admin/projects/{project_id}/assignments?task={task_id}&state={state} - returns assignments filtered by task and state
  • GET /admin/projects/{project_id}/assignments?task={task_id}&state={state}&from={from}&size={size} - paginates those assignments
  • GET /projects/{project_id}/tasks/{task_id} - returns task information
  • GET /projects/{project_id}/tasks/{task_id}/assignments - finds or creates an unfinished assignment for the given task + current user
  • POST /projects/{project_id}/tasks/{task_id}/assignments - submit assignment (contribute, fill in form, etc)
  • GET /projects/{project_id} - returns project information
  • GET /projects/{project_id}/assets/{asset_id} - returns asset information
  • GET /projects/{project_id}/tasks - returns tasks in this project
  • GET /projects/{project_id}/tasks/{task_id}/assets/{asset_id}/assignments - returns a new assignment for task + asset + current user
  • GET /projects/{project_id}/user - returns user information based on project session cookie
  • POST /projects/{project_id}/user - creates a user based on json data posted
  • POST /projects/{project_id}/user/external - looks up user by external id, returns session token
  • GET /projects/{project_id}/assets/{asset_id}/favorite - toggles favoriting an asset for the current user
  • GET /projects/{project_id}/user/favorites - returns a user's favorited assets
  • GET /projects/{project_id}/assignments/{assignment_id} - returns assignment information
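Several admin listings accept from and size for pagination. The sketch below builds those URLs and computes the offsets needed to walk a listing; both helpers are hypothetical, and where the total count comes from (for example a Meta.Total field like the one in the tasks response) is an assumption:

```go
package main

import (
	"net/url"
	"strconv"
)

// adminPageURL builds a paginated admin listing URL such as
// /admin/projects/{project_id}/users?from=0&size=10.
// resource is "assets", "users" or "assignments".
func adminPageURL(baseURL, projectID, resource string, from, size int) string {
	q := url.Values{}
	q.Set("from", strconv.Itoa(from))
	q.Set("size", strconv.Itoa(size))
	return baseURL + "/admin/projects/" + url.PathEscape(projectID) + "/" + resource + "?" + q.Encode()
}

// pageOffsets returns the from values needed to walk total items, size at a time.
func pageOffsets(total, size int) []int {
	if size <= 0 {
		return nil // avoid an infinite loop on a non-positive page size
	}
	var offsets []int
	for from := 0; from < total; from += size {
		offsets = append(offsets, from)
	}
	return offsets
}
```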