A collection of notebooks for analysing OCDS data stored in Kingfisher.

Primary language: Jupyter Notebook. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

OCDS Notebooks

A collection of Jupyter notebooks for working with data from:

Note: If you encounter unfamiliar errors, try the Runtime > Disconnect and delete runtime menu item. If the error still occurs, please open an issue.

Notebooks

To use a notebook:

  • Click the Open In Colab button
  • Click the File > Save a copy in Drive menu item
  • Make your changes (e.g. collection_ids, schema_name, etc.)
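As a rough illustration of the last step, the values you change in your copy might look like the following. The IDs, the schema name, the `collection_id` column and the `collection_filter` helper are hypothetical and are not taken from the notebooks; check the variables that your notebook actually defines.

```python
# Hypothetical values: replace them with the collections and schema you want to analyze.
collection_ids = [101, 102]
schema_name = "view_data_example"


def collection_filter(ids):
    """Return a SQL fragment and its parameters, restricting a query to the given collections."""
    if not ids:
        raise ValueError("at least one collection ID is required")
    # One placeholder per ID, so the values are passed as query parameters
    # instead of being interpolated into the SQL string.
    placeholders = ", ".join(["%s"] * len(ids))
    return f"collection_id IN ({placeholders})", list(ids)


clause, params = collection_filter(collection_ids)
print(clause, params)
```

Passing the IDs as parameters rather than formatting them into the SQL keeps the sketch safe to adapt if the values ever come from user input.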

If you make any improvements or fixes, please follow the Contributing guide below to merge your changes back into this repository.

You can also use a notebook without creating a copy. However, if you re-open the notebook, any changes and outputs will be lost.

Kingfisher Process

| Notebook | Open in Colab | Description |
| --- | --- | --- |
| Publisher analysis template | Open in Colab | Analyze data from a specific publisher. |
| Meta analysis template | Open in Colab | Analyze data from multiple publishers, or perform other types of analysis on the Kingfisher Process database. |
| Basic criteria feedback template | Open in Colab | Provide feedback on the OCDS basic criteria. |
| Structure and format feedback template | Open in Colab | Provide feedback on structure and format errors reported by lib-cove-ocds. |
| Data quality feedback template | Open in Colab | Provide detailed feedback on structure, format, conformance and quality issues. |
| Usability checks template | Open in Colab | Provide feedback on data usability for OCDS datasets. |

Other data sources

| Notebook | Open in Colab | Description |
| --- | --- | --- |
| Usability checks using a field list | Open in Colab | Provide feedback on data usability for prospective OCDS publishers, using a field list, such as one from a field-level mapping. |
| Usability checks using the Data Registry | Open in Colab | Provide feedback on data usability using data from the Data Registry. |
| Relevant checks using a field list | Open in Colab | Provide feedback on data relevance for prospective publishers, using a field list, such as one from a field-level mapping. |
| Relevant checks using the Data Registry | Open in Colab | Provide feedback on data relevance using data from the Data Registry. |
| Relevant checks for all the Data Registry publications | Open in Colab | Provide feedback on data relevance by downloading all the publications from the Data Registry. |

Contributing

Components

To ease maintenance, the notebooks are made up of reusable components. To see which components are used in each notebook, refer to the NOTEBOOKS variable in manage.py.
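The actual structure of `NOTEBOOKS` is defined in manage.py and is not reproduced here. As a minimal sketch, assuming it is a dictionary that maps a notebook name to an ordered list of component names (the names below are illustrative placeholders, and `notebooks_using` is a hypothetical helper), you could look up which notebooks reuse a given component like this:

```python
# Assumed shape only: check manage.py for the real NOTEBOOKS variable.
NOTEBOOKS = {
    "example_publisher_analysis": ["environment", "kingfisher_process_setup", "structure_scope"],
    "example_usability_checks": ["environment", "usability_setup", "usability_scope"],
}


def notebooks_using(component, notebooks=NOTEBOOKS):
    """Return the sorted names of the notebooks that include the given component."""
    return sorted(name for name, components in notebooks.items() if component in components)


print(notebooks_using("environment"))
print(notebooks_using("usability_setup"))
```

A lookup like this is useful before editing a component, because a change to a shared component affects every notebook built from it.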

Reminder: If you edit the Check structure and format or Check quality components and change the headings or add new sections, check whether the related Document template in this process note needs an update.

| Component name | Open in Colab | Tasks |
| --- | --- | --- |
| Environment | Open in Colab | Install requirements, import packages, load extensions and configure the notebook. |
| Cardinal setup | Open in Colab | Install Cardinal requirements, define coverage functions and calculate the field list for a given file. |
| Charts setup | Open in Colab | Install charts requirements, import charts packages and define plot functions. |
| Kingfisher Process setup | Open in Colab | Connect to the database. Choose the collection(s) and schema to work with. |
| Field list setup | Open in Colab | Load the field list. |
| Data Registry download data setup | Open in Colab | Define the functions to list publications and download JSONL files from the registry. |
| Data Registry download data | Open in Colab | Define the forms to select a publication and year and download the selected publication. |
| Kingfisher Process errors | Open in Colab | Check for data collection and processing errors. |
| Structure scope | Open in Colab | Check how many releases and records your data contains. Check the date range and stages of the contracting process covered by your data. |
| Usability setup | Open in Colab | Define the usability functions. |
| Usability scope | Open in Colab | Calculate general statistics. |
| Structure checks | Open in Colab | Check for structure and format errors reported by lib-cove-ocds. |
| Conformance checks | Open in Colab | Check against the OCDS conformance criteria. |
| Quality checks | Open in Colab | Check for conformance and quality issues that require manual review. |
| Usability checks using Kingfisher with coverage | Open in Colab | |
| Usability checks using a field list without coverage | Open in Colab | |
| Relevant checks using a field list | Open in Colab | Given a field list, check whether the list passes the "relevant" criteria. |
| Relevant checks against all the publications from the Data Registry | Open in Colab | Download all the publications from the registry and perform the "relevant" checks against the active ones. |

Use the buttons above to open the components from the main branch for editing in Google Colaboratory (Colab).

To open a component from a different branch, use Colab's GitHub browser.

To encourage reuse, limit the scope of a component. The current scopes are:

  • Environment: Set up Google Colaboratory in general.
  • Setup: Set up Google Colaboratory for a data source.
  • Errors: Review any issues in loading the data.
  • Scope: Understand the scope of the data.
  • Check: Perform a category of checks.

Add a component

  1. Create a new notebook.
  2. Set a title using H2 formatting and add your cells, following the style guide for SQL statements.
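If you want to verify the title convention on a downloaded .ipynb file, a small check along these lines may help. It is only a sketch: it assumes the title lives in the first cell of the notebook, and `has_h2_title` is a hypothetical helper rather than part of this repository.

```python
def has_h2_title(notebook):
    """Return True if the first cell is a Markdown cell whose first line is an H2 heading."""
    cells = notebook.get("cells", [])
    if not cells or cells[0].get("cell_type") != "markdown":
        return False
    source = cells[0].get("source", "")
    # In the notebook JSON format, "source" is either a string or a list of strings.
    text = "".join(source) if isinstance(source, list) else source
    lines = text.lstrip().splitlines()
    first_line = lines[0] if lines else ""
    return first_line.startswith("## ")


# Example with an in-memory notebook; a real file would be loaded with json.load().
example = {"cells": [{"cell_type": "markdown", "source": ["## Example component\n", "Some text."]}]}
print(has_h2_title(example))
```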

Edit a component

  1. Open the component in Colab.
  2. Add or edit cells, following the style guide for SQL statements.

Commit your changes

  1. Create a branch.

In Colab:

  1. Click Edit > Clear all outputs.
  2. Click File > Save a copy in GitHub.
  3. Uncheck 'Include a link to Colaboratory'.
  4. Select your branch, enter a commit message and click OK.
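The Clear all outputs menu item is the documented way to strip outputs before committing. If you are instead working on a local .ipynb file, the same effect can be approximated in a few lines. This sketch assumes the standard nbformat 4 JSON layout; `clear_outputs` is a hypothetical helper, not something this repository provides.

```python
import copy


def clear_outputs(notebook):
    """Return a copy of the notebook with outputs and execution counts removed from code cells."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Example with an in-memory notebook; a real file would be read and written with the json module.
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["## Example"]},
        {"cell_type": "code", "source": ["1 + 1"], "execution_count": 3, "outputs": [{"output_type": "execute_result"}]},
    ]
}
print(clear_outputs(example)["cells"][1]["outputs"])
```

Working on a deep copy leaves the original dictionary untouched, which makes it easy to compare the notebook before and after cleaning.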

Add new components to a notebook

  1. Add the component to the entry for the notebook in the NOTEBOOKS variable in manage.py.

Add a new notebook

  1. Add an entry for the notebook and its components to the NOTEBOOKS variable in manage.py.
  2. Update the 'Notebooks' section of README.md.

Request a review

  1. Create a pull request.
  2. Request a review from a data support manager.
  3. If the reviewer requests changes, make the changes, then request a review again.

Merge your changes

Once approved, you can merge your own changes.

Reviewing

Review changes

Review the changes.

For small changes, you can review the raw diff in the GitHub review interface.

For larger changes, you can review and comment on a visual diff by clicking the ReviewNB button. You need to authorize the app the first time you open it.

Maintenance

Format SQL cells and merge components to build notebooks:

  1. Install pg_format.

  2. Install requirements:

    pip install -r requirements.txt

  3. Install the pre-commit script:

    pre-commit install
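To give a sense of what merging components to build a notebook involves, here is a minimal sketch that concatenates the cells of several notebooks. It is not the implementation in manage.py: it assumes each component is an nbformat 4 notebook already loaded as a dictionary, and `merge_components` is a hypothetical helper.

```python
import json


def merge_components(components):
    """Concatenate the cells of nbformat 4 notebooks (as dicts), keeping the first one's metadata."""
    if not components:
        raise ValueError("at least one component is required")
    first = components[0]
    merged = {
        "nbformat": first.get("nbformat", 4),
        "nbformat_minor": first.get("nbformat_minor", 0),
        "metadata": first.get("metadata", {}),
        "cells": [],
    }
    # Cells are appended in the order the components are listed.
    for component in components:
        merged["cells"].extend(component.get("cells", []))
    return merged


# Example with two in-memory components; real components would be loaded with json.load().
environment = {"nbformat": 4, "nbformat_minor": 0, "metadata": {}, "cells": [{"cell_type": "markdown", "source": ["## Environment"]}]}
checks = {"nbformat": 4, "nbformat_minor": 0, "metadata": {}, "cells": [{"cell_type": "markdown", "source": ["## Checks"]}]}
print(json.dumps(merge_components([environment, checks]), indent=2))
```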