Data Quality Dashboard

Data Quality Dashboard provides access to, and displays statistics on, a collection of published data. This collection of data is logically related: for example, data published by a single government department, or a group of departments.

The Data Quality Dashboard was developed to display data quality information on the 25K spend data published by the UK Government on data.gov.uk. This instance, the UK Spend Publishing Dashboard, is powered by a static database generated with Data Quality CLI and hosted in the okfn/data-quality-uk-25k-spend repository on GitHub.

The Dashboard can be used for any published collection of data by following a few key steps.

Local development

# Get the code
git clone https://github.com/okfn/data-quality-dashboard.git

# Install the dependencies
npm install

# Just build the sources
npm run build

# Just run the server
npm run start

# View the app in your browser
open http://localhost:3000/

See the scripts section in package.json for more available commands.

Read on for details.

Application

The Data Quality Dashboard is a Node.js application written in ES6, largely using Express and React.

The app.backend module renders the basic views (using React on the server) and is responsible for preparing the data as JSON by parsing the CSV database. It also provides some simple routes for standard pages like FAQ and About.

The app.ui module is a React-Redux application for displaying the data to the user.

The codebase is written in Node.js-style CommonJS, using ES6 syntax. The app.ui code is bundled by [Webpack](http://webpack.github.io/), and app.backend is transformed using Babel at runtime.
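For orientation, the relevant scripts in package.json look roughly like the following sketch (illustrative only: the entry file name and Webpack flags here are assumptions, so check the actual scripts section for the real commands):

"scripts": {
  "build": "webpack --config webpack.config.js",
  "start": "node index.js",
  "postinstall": "npm run build"
}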

Remote deployment

We push to Heroku, and a postinstall script ensures that app.ui is bundled before the app is served. Make sure you set NPM_CONFIG_PRODUCTION=false to include devDependencies on Heroku.
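For example, with the Heroku CLI:

# Keep devDependencies (Babel, Webpack) available for the postinstall build
heroku config:set NPM_CONFIG_PRODUCTION=false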

Data

The Data Quality Dashboard reads data from flat-file storage, with data written to CSV and JSON. Any publicly available file storage will do, as long as the file naming and the data structure of the files are consistent.

Currently, we run the database for the UK Spend Publishing Dashboard from a public repository on GitHub. This gives easy access to the files, and enables a version history of the database.

As GitHub does not support CORS, we use a proxy that does: RawGit.

When the application loads, it reads the data from the database, parses the CSV content, and stores the resulting representation as JSON. This JSON representation is accessible via an API endpoint that the frontend app uses.
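A minimal sketch of this flow, in the CommonJS/ES6 style of the codebase (the route shape, file layout, and naive comma-splitting parser are illustrative assumptions, not the app's actual implementation):

// Sketch: load a CSV table from the flat-file database and serve it as JSON
const https = require('https');
const express = require('express');

const BASE = process.env.DATABASE_LOCATION;

// Fetch `<name>.csv` from the database and parse it into an array of
// objects keyed by the header row. A real implementation should use a
// proper CSV parser that handles quoted fields.
function loadTable(name) {
  return new Promise((resolve, reject) => {
    https.get(`${BASE}/${name}.csv`, (res) => {
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => {
        const [header, ...rows] = body.trim().split('\n');
        const keys = header.split(',');
        resolve(rows.map((row) => {
          const values = row.split(',');
          return keys.reduce((obj, key, i) => {
            obj[key] = values[i];
            return obj;
          }, {});
        }));
      });
    }).on('error', reject);
  });
}

// Expose the parsed JSON to the frontend app, e.g. GET /api/sources
const app = express();
app.get('/api/:table', (req, res) => {
  loadTable(req.params.table)
    .then((data) => res.json(data))
    .catch((err) => res.status(500).json({ error: err.message }));
});
app.listen(3000);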

To configure the database, the application needs to know the base path as a URL.

For example:

  • https://rawgit.com/okfn/data-quality-uk-25k-spend/master/data

By default, the application expects to find the following files at that base URL:

  • instance.json: Basic metadata for the instance
  • sources.csv: The list of data sources that are assessed for quality
  • publishers.csv: The list of publishers that produce these data sources
  • results.csv: The validation results produced by Data Quality CLI
  • performance.csv: The performance data produced by Data Quality CLI
  • runs.csv: A log of the validation runs performed against these sources

Each of these files must conform to a certain data structure: think of them as tables in a database. As long as you conform to the structure and the expected data within that structure, it does not matter how the database is actually produced.

For details on how to change the database, see the Configure database section.

Schema

The Data Quality Dashboard expects the following schema.

instance.json

A single object with the following fields:

  • name: The name of this dashboard
  • admin: The email address of the administrator of this dashboard
  • validator_url: The URL to a GoodTables API endpoint (eg: https://goodtables.okfnlabs.org/api/run)
  • last_modified: Time when the data was last modified. Should be updated before each database deploy.
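For example (illustrative values, with a hypothetical admin address):

{
  "name": "UK Spend Publishing Dashboard",
  "admin": "admin@example.com",
  "validator_url": "https://goodtables.okfnlabs.org/api/run",
  "last_modified": "2016-03-01T00:00:00Z"
}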

sources.csv

A CSV with the following columns:

  • id: A unique identifier for this data source.
  • publisher_id: The unique identifier of the publisher this data source belongs to.
  • title: A title for this data source.
  • data: The permalink URL for this data source.
  • format: The file format for this data source.
  • last_modified: The timestamp that indicates when this data source was last modified.
  • period_id: The publication period of the data source.
  • schema: The permalink URL for the schema that this data source should be validated against (if any).
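For example, a single entry in sources.csv might look like this (illustrative values; the publisher, data, and schema URLs are hypothetical):

id,publisher_id,title,data,format,last_modified,period_id,schema
xyz-spend-2015-01,xyz,XYZ spend January 2015,https://example.com/xyz/spend-2015-01.csv,csv,2015-02-01,2015-01,https://example.com/xyz/spend-schema.json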

publishers.csv

A CSV with the following columns:

  • id: A unique identifier for this publisher.
  • title: A proper title for this publisher.
  • type: The type classification for this publisher.
  • homepage: The homepage of this publisher as a URL.
  • contact: The contact person for this publisher.
  • email: The contact email for this publisher.
  • parent_id: The parent publisher for this publisher (nested publishers).

results.csv

A CSV with the following columns:

  • id: A unique identifier for this result.
  • source_id: The identifier for the data source in this result.
  • publisher_id: The identifier for the publisher in this result.
  • period_id: The publication period of this result's data source.
  • score: The score for this result.
  • data: The permalink URL for this result's data source.
  • schema: The permalink URL for this result's data source schema (if any).
  • summary: A summary of this result.
  • run_id: The identifier of the run in which this result was generated.
  • timestamp: The timestamp for this result.
  • report: The base URL of a more detailed report.

performance.csv

A CSV with the following columns:

  • publisher_id: The identifier for the publisher.
  • period_id: The time span for the analysis.
  • files_count: The number of files published during this period.
  • score: The score for the files published during this period.
  • valid: The number of these files that are valid.
  • files_count_to_date: The total number of files published up to and including this period.
  • score_to_date: The score for all files published up to and including this period.
  • valid_to_date: The number of valid files among all files published up to and including this period.
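To illustrate the difference between the per-period and to-date columns, here are two hypothetical rows for a single publisher, assuming score is a 0-100 aggregate (illustrative values only):

publisher_id,period_id,files_count,score,valid,files_count_to_date,score_to_date,valid_to_date
xyz,2015-01,10,80,8,10,80,8
xyz,2015-02,5,100,5,15,87,13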

runs.csv

A CSV with the following columns:

  • id: A unique identifier for this run.
  • timestamp: The timestamp for this run.
  • total_score: The overall score for this run.

Configure database

The database can be configured through the following environment variables:

  • DATABASE_LOCATION: Base URL for the files.
  • PUBLISHER_TABLE: Name of the file containing the publishers (relative to the DATABASE_LOCATION).

Following this pattern, you can also configure SOURCE_TABLE, RUN_TABLE, PERFORMANCE_TABLE and INSTANCE_TABLE.
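For example, to point the application at the UK 25K spend database:

# Base URL for the database files (RawGit proxy over the GitHub repository)
export DATABASE_LOCATION=https://rawgit.com/okfn/data-quality-uk-25k-spend/master/data

# Optional: override the default file name for the publishers table
export PUBLISHER_TABLE=publishers.csv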

Tooling

In order to generate the result set for a Data Quality Dashboard, we built a command line utility that is designed to be run by a developer at regular intervals (as relevant for the data being assessed). This tool, Data Quality CLI, can be configured to assess data quality based on the following metrics:

  • Timeliness
  • Structural Validity
  • Schema Validity

Note that, like the Data Quality Dashboard itself, the CLI has so far been tested only on the UK 25K spend data.