Public Temporal Streaming Data Service Framework

Primary language: JavaScript. License: MIT.

River View

Join the chat at https://gitter.im/nupic-community/river-view

A View of the Mississippi River

See River View in action ⤤

River View is a Public Temporal Streaming Data Service Framework (yes, that's a mouthful!). It provides a pluggable interface for users to expose temporal data streams in a time-boxed format that is easily query-able. It was built to provide a longer-lasting historical window for public data sources that provide only real-time data snapshots, especially for sensor data from public government services like weather, traffic, and geological data.

River View fetches data from user-defined Rivers at regular intervals and stores it in a local Redis database. The data is kept in a time window: anything older than a configured age is discarded. The window should still be large enough to hold the historical data needed to train machine intelligence models on the patterns within it.
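To make the windowing idea concrete, here is a minimal in-memory sketch. It is not River View's implementation (which stores data in Redis); the function name `pruneWindow`, the point shape, and the use of epoch seconds are all assumptions made for illustration.

```javascript
// Hypothetical helper: keep only the points that are still inside the window.
// Timestamps are assumed to be Unix epoch seconds.
function pruneWindow(points, maxAgeSeconds, nowSeconds) {
  const cutoff = nowSeconds - maxAgeSeconds;
  // Points older than the cutoff fall out of the window and are dropped.
  return points.filter((point) => point.timestamp >= cutoff);
}

const points = [
  { timestamp: 1000, value: 3.2 },
  { timestamp: 4000, value: 3.5 },
  { timestamp: 7000, value: 3.9 },
];

// With "now" at 7200 and a one-hour window, the cutoff is 3600.
const kept = pruneWindow(points, 3600, 7200);
console.log(kept.map((point) => point.timestamp)); // [ 4000, 7000 ]
```

A larger `maxAgeSeconds` keeps more history at the cost of more storage, which is the trade-off described above.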

Code Docs

See online documentation at http://nupic-community.github.io/river-view/.

Dependencies

You must have a Redis instance available. The URL to the instance should be set in an environment variable called REDIS_URL, something like:

export REDIS_URL=redis://127.0.0.1:6379

You may use authentication in the Redis URL string:

export REDIS_URL=redis://username:password@hostname:port
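If you want to check what a given REDIS_URL value contains, the sketch below splits it into its parts with Node's built-in `URL` class. The helper name `parseRedisUrl` and the returned object shape are assumptions for illustration; River View may read the variable differently. The fallback port 6379 is the Redis default.

```javascript
// Hypothetical helper: break a redis:// URL into host, port, and credentials.
function parseRedisUrl(redisUrl) {
  const url = new URL(redisUrl);
  if (url.protocol !== 'redis:') {
    throw new Error('Expected a redis:// URL, got ' + url.protocol);
  }
  return {
    host: url.hostname,
    // Fall back to the default Redis port when the URL omits one.
    port: url.port ? Number(url.port) : 6379,
    // Credentials are optional; report null when they are absent.
    username: url.username || null,
    password: url.password || null,
  };
}

console.log(parseRedisUrl('redis://127.0.0.1:6379'));
console.log(parseRedisUrl('redis://username:password@example.com:6380'));
```

In a real script you would pass `process.env.REDIS_URL` instead of a literal. Note that `URL` leaves the username and password percent-encoded, so credentials with special characters may need `decodeURIComponent`.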

Rivers

A River is a pluggable public data stream gathered from one or more origins and collected in a query-able temporary temporal pool. Rivers are declared within the rivers directory, and consist of:

  • a namespace, which is assumed based upon the directory name of the data source within the rivers directory
  • a YAML configuration file, containing:
    • one or more external URLs where the data is collected, which are public and accessible without authentication
    • the interval at which the data source will be queried
    • when the data should expire
  • a JavaScript parser module that receives the body of the HTTP response from the aforementioned URL(s) and is expected to parse it and return a temporal object representation of the data.

Each River may produce data for many unique data items, but they must have unique identifiers. For example, a city traffic data source may produce data for many traffic paths within the city, each identified with a unique ID. A US state water level data source might have unique sources for each water level sensor in the state, each with a unique ID.
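As a rough illustration of what a parser does, the sketch below turns a response body into one timestamped row per uniquely identified data item. Everything here is hypothetical: the function name `parseWaterLevels`, the input JSON layout, and the returned row shape are invented for this example and are not River View's actual parser contract. See the wiki for the real interface.

```javascript
// Hypothetical parser: takes the raw HTTP response body (a string) and
// returns one row per sensor, each with a unique id and a timestamp.
function parseWaterLevels(body) {
  const payload = JSON.parse(body);
  return payload.sensors.map((sensor) => ({
    id: sensor.id, // unique identifier for this data item
    timestamp: Math.floor(Date.parse(sensor.time) / 1000), // epoch seconds
    fields: { level: Number(sensor.level) },
  }));
}

// Made-up sample body standing in for a state water level data source.
const body = JSON.stringify({
  sensors: [
    { id: 'gauge-1', time: '2015-06-01T00:00:00Z', level: '4.2' },
    { id: 'gauge-2', time: '2015-06-01T00:00:00Z', level: '5.1' },
  ],
});

const rows = parseWaterLevels(body);
console.log(rows.map((row) => row.id)); // [ 'gauge-1', 'gauge-2' ]
```

Keeping the id on every row is what lets a single River carry many independent streams.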

River Types

All river streams must have a timestamp for each row of data. Other than that, they might have different primary types of data, as described below:

  • scalar: integer or float values
  • geospatial: latitude / longitude (floats)
  • categorical: string values

The data streams will be presented differently, both in JSON and HTML, depending on the type specified in the config.yml file.

Creating a River

Please see Creating a River in our wiki.

Web Services

In addition to collecting and storing data from Rivers, a simple HTTP API for reading the data is also active on startup. It returns HTML, JSON, and (in some cases) CSV data for each River configured at startup.

URLs

URL Description
`/index.[html|json]`
`//props.[html|json]`
`//keys.[html|json]`
`///data.[html|json
`///meta.[html|json]`
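As a small sketch of how a client might address the index endpoint, the helper below builds the URL for a chosen format. The helper name `indexUrl` and the base URL `http://localhost:8080` are assumptions; substitute the host and port your own instance listens on.

```javascript
// Hypothetical helper: build the index URL for a River View instance.
function indexUrl(baseUrl, format) {
  const allowed = ['html', 'json'];
  if (!allowed.includes(format)) {
    throw new Error('Unsupported format: ' + format);
  }
  // Resolve the absolute path against the instance's base URL.
  return new URL('/index.' + format, baseUrl).toString();
}

console.log(indexUrl('http://localhost:8080', 'json'));
// http://localhost:8080/index.json
```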

Running Locally (on OS X)

OS X can be restrictive about the maximum number of open file descriptors. River View needs the system to allow around 1024 open descriptors in order to start up, so if you run into file-can't-be-opened errors, check your current limit by running ulimit -n. If the number is less than 1024, you'll need to raise it.

Updating the maximum number of open file descriptors

  • sudo launchctl limit maxfiles 1024 unlimited

This updates the maximum number of open file descriptors your Mac will allow. This setting is not persistent across reboots. To make it persistent, add limit maxfiles 1024 unlimited to /etc/launchd.conf.

  • ulimit -n 1024

This raises the limit for the current shell so that it can make use of those file descriptors.