Datasheet Server
Turn spreadsheet data into a structured, dynamic API.
Overview · Configuration · Quickstart
Datasheet server makes resources from a spreadsheet available as a structured API.
- Manage structured data without developers. Allows anyone to dynamically manage data, while simultaneously making this data available in a reliably structured format for frontend interfaces and other programmatic applications.
- Designed for a dynamic workflow. References data in a spreadsheet source as a ground truth, but adds a layer of indirection that keep API routes from breaking when changes are made.
- Customisable data transformation. Easily create new blueprints to specify the API structure that should be presented from source data.
- Extensible architecture. Currently supports Google Sheet as a source and a REST-like query language, but structured modularly with an intention to support other sources and query languages.
Overview
Datasheet server is a Node server developed at Forensic Architecture to make data that is being dynamically modified by researchers concurrently consumable for programmatic applications as an API. We use spreadsheets extensively to keep track of information during our investigations, and we often want to make this data available via structured queries to make it available in a frontend web application, a game engine, or another use case.
Querying data directly from spreadsheets is brittle, as it relies on the maintenance of a rigid structure in the sheets at all times. By putting Datasheet Server as a proxy that sits in between source sheets and their consumers, it is possible to dynamically modify sheets without breaking applications. A data admin can then use Datasheet Server to ensure that applications always receive eligible data, without foregoing the spreadsheets as sources of truth.
To see how to get a local instance of datasheet server running in practice, see this wiki explaining how to use it to feed data from a Google Sheet to a local instance of Timemap.
Design Concepts
The codebase currently only supports Google Sheets as a source, and a REST-like format as a query language. It is designed, however, with extensibility in mind.
Sources
A source represents a sheet-like collection of data, such as a Google Sheet. A source has one or more tabs, each of which contains a 2-dimensional grids of cells. Each cell contains a body of text (a string).
Resources
The data from sources are made available as resources, which are structured blocks of data that are granularly accessible through a query language. Resources are the outfacing aspect of Datasheet Server, and represent the only kind of data that can be queried by applications. Each resource is configured with one or more query languages. (Currently only a REST-like query language supported.)
Blueprints
Blueprints are a data structure that represent the way that infromation from sources are to be turned into resources. For each tab in a source, there is a corresponding Blueprint. Blueprints are created through a blueprinter function invoked on the raw data from a source tab.
Blueprints are JSON objects. There have two forms:
- desaturated -- describes the resources and query languages available on data from a source tab.
- saturated -- both describes resources available on data from a source etab, and contains that data.
A desaturated Blueprint can be saturated by retrieving its data from the server's model layer, which stores tab data from sources.
A JSON catalogue of the available blueprints (desaturated) in a server is available at /api/blueprints
.
Configuration
Copy the example environment file:
cp .env.example .env
Inside this file, you will need to modify at least the SERVICE_ACCOUNT_EMAIL
and SERVICE_ACCOUNT_PRIVATE_KEY
fields. These fields refer to the credentials of a Google service account. Google requires that developers create these when attempting to access their services programmatically, so that they can attribute requests to users. Service accounts also contain identity information, which means that asset owners can allow certain service accounts access to certain sheets, just as one might differentially grant certain users access to certain cloud assets.
Once you have created a service account, create and download an API key for that account. The JSON file for the API key that you download when you create it contains both a service account private key, and an email associated with the service account: add these respectively in the strings in .env for SERVICE_ACCOUNT_PRIVATE_KEY
and SERVICE_ACCOUNT_EMAIL
.
The last thing to do is to grant the service account access to the sheet that datasheet-server is pulling from. You can add a service account to a sheet as you would any other Google user: just enter the email address associated. (Note that this step is not necessary if you are accessing a publicly available sheet.)
Other configuration options, such as the port at which the server will expose resources, are also modifiable from the .env file.
Sources
Sources are specified in src/sheets_config.js. Datasheet server currently only supports Google Sheets as a source.
Google Sheets
| sheets | A list of objects, one for each sheet that is being used as a source. Each sheet object has a name
(String), an id
(String), and a tabs
(object) field, which are explained below. | object |
Each Google Sheet being used as a as source requires a corresponding object in sheets
. The object should be structured as follows:
Option | Description | Type |
---|---|---|
name | Used to refer to data served from this source | string |
id | The ID of the sheet in Google. (You can find it in the address bar when the Sheet is open in a browser. It is the string that follows 'spreadsheets/d/'). | string |
tabs | An object that maps each tab in the source to one or more Blueprinters. All of the Blueprinters in the blueprinters folder are available through a single import as at the top of example.config.js. To correctly associate a Blueprinter, the object key needs to be the tab name with all lowercase letters, and spaces replaced by a '-'. For example, if the tab name in Google Sheets is 'Info About SHEEP', the object key should be 'info-about-sheep'. The value should be the Blueprinter function that you want to use for the data in that tab. If you require more than one endpoint for a single tab, you can support multiple blueprinters by making the item an array. See the example of a configuration object below. TODO: no Blueprinter is used by default. |
object |
Example Configuration Object
import BP from './lib/blueprinters'
export default {
googleSheets: {
email: 'project-name@reliable-baptist-23338.iam.gserviceaccount.com',
privateKey: 'SOME_PRIVATE_KEY',
sheets: [
{
name: 'example',
id: '1s-vfBR8Uy-B-TLO_C5Ozw4z-L0E3hdP8ohMV761ouRI',
tabs: {
'objects': BP.rows,
'fruit': [BP.columns, BP.ids],
}
},
]
}
}
## Quickstart Clone the repository to your local:
git clone https://www.github.com/forensic-architecture/datasheet-server
Follow the steps in the configuration section of this document.
Run with Docker
To create a new instance of the server with Docker installed, ensure that you have followed the steps in the quickstart guide above, then and build an image locally. (Note that Docker must be installed):
docker build -t datasheet-server .
You can then run the container and make available the relevant port (4040
by default):
docker run -d -p 4040:4040 datasheet-server
If running on a cloud server, you'll probably need to zero out the host IP:
docker run -d -p 0.0.0.0:4040:4040 datasheet-server
Run locally
Install dependencies:
yarn # npm install
And run development server:
yarn dev # npm run dev
Contribute
Code of Conduct
Please read our adopted code of conduct before contributing, so that you can understand what actions will and will not be tolerated.
Contributing Guide
Read our contributing guide to learn about our development process, how to propose bugfixes and improvements.
Community
If you have any questions or just want to chat, please join our team fa_open_source on Keybase for community discussion. Keybase is a great platform for encrypted chat and file sharing.
License
Datasheet Server is distributed under the MIT License.