/data-standards-demo-content

This is demo content that we are producing to show some of the potential of introducing a common framework.

Primary LanguageHTMLMIT LicenseMIT

Cross-government data description language demo

The UK Government Digital Service (GDS) has defined an initial data description language based on international standards and use cases. This language aims to make describing data across government simpler and more consistent.

You can use the data-standards-demo-content GitHub repository to try this data description language. This repository contains code that allows you to:

  • set up a form to generate data description information and associate this information with an Excel data file
  • import data from an Excel data file using associated data description information

This currently only works with Excel data files.

This is a work in progress, and will be regularly iterated and updated.

Before you start

You must have installed the following software on your local machine:

Generate data description information and associate this information with a data file

You generate data description information for an Excel data file by completing the Generate your Data Definition File form. You then associate this data description information with the Excel data file.

  1. Download the data-standards-demo-content GitHub repository on to your local machine.

  2. Open the HTML_Content folder in your local repository.

  3. Run the ConfigDDFile.html file in your browser to open the Generate your Data Definition File form.

    If the form does not load correctly, open the ConfigDDFile.html file in a development environment, and check that the path references at the start of the file are correct.

  4. Complete the following fields in the Generate your Data Definition File form.

    Field Definition Example
    creator An entity primarily responsible for making the data file creator:"John Smith john.smith@digital.cabinet-office.gov.uk"
    contributor An entity responsible for making contributions to the data file contributor:”jane.smith@company.co.uk
    title A name given to the data file title:"GDS Employees"
    created Date of creation of the data file created:2002-10-02
    identifier An unambiguous reference to the data file within a given context identifier:"0000015_GDS_SDA_XLS"
    description An account of the data file description:"All heights at GDS"

You do not need to complete the following fields because GDS has not yet implemented them:

  • Choose file button
  • Select your file type: field
  • Set the properties of your columns: field
  • Save Data Description File button

You can then either:

  • generate a data description file
  • copy the data description information to the clipboard

Generate a data description file

  1. Select the Generate Description File button to generate a data description file.

  2. Associate the data description file with the Excel data file.

Copy the data description information

  1. Select the Copy Data Description File button. This copies the data description information to the clipboard.

  2. Associate the data description information with the Excel data file. For example, you can copy this information into a worksheet in the Excel data file.

Import data from a data file using associated data description information

If an Excel data file has associated data description information, you can use that data description information to automatically read data from that data file.

The way you read the data from the Excel data file depends on where the associated data description information is. The data description information is usually held in:

  • a separate data description file
  • one of the Excel data file worksheets

You cannot currently specify which data you want. You must read all data from an Excel data file.

Read data using a separate data description file

  1. Open the Python Code folder in your local data-standards-demo-content GitHub repository.

  2. Open the externalfile.py file.

  3. Enter the location of the data description file into the externalfile.py file:

    def gettxt():
    df = pd.read_csv('DATA_DESCRIPTION_FILE_LOCATION', header=None)
    return df

    For example:

    def gettxt():
    df = pd.read_csv('/Users/johnsmith/Projects/ExcelExport/metadata2', header=None)
    return df
  4. Enter the location of the Excel data file into the externalfile.py file:

    # This will get the file, initial we have a default file, later we will allow them to input
    def getfile():
    #wb = opxl.load_workbook('/Users/xxx/Downloads/Excel Import Text.xlsx')
    wb = opxl.load_workbook('EXCEL_DATA_FILE_LOCATION')
    #wb = opxl.load_workbook('/Users/xxx/Projects/ExcelExport/GTest.xlsx')
    return wb

    For example:

    # This will get the file, initial we have a default file, later we will allow them to input
    def getfile():
    #wb = opxl.load_workbook('/Users/xxx/Downloads/Excel Import Text.xlsx')
    wb = opxl.load_workbook('/Users/johnsmith/Projects/ExcelExport/Excel Import Text.xlsx')
    #wb = opxl.load_workbook('/Users/xxx/Projects/ExcelExport/GTest.xlsx')
    return wb
  5. Enter the name of the Excel data file worksheet that contains the data:

    ws = wb['DATA_WORKSHEET']

    For example:

    ws = wb['data']
  6. Run the externalfile.py file to:

    • extract the required data from the Excel data file
    • print the extracted data to the command line
    • create an extracted data .csv file in the same location as the externalfile.py file

You can add references to multiple data and data description files to the externalfile.py file. However, GDS has not implemented looping functionality yet.

Read data from data description information in the Excel data file

  1. Open the Python Code folder in your local data-standards-demo-content GitHub repository.

  2. Open the excelImportoriginal.py file.

  3. Enter the location of the Excel data file into the excelImportoriginal.py file:

    # This will get the file, initial we have a default file, later we will allow them to input
    def getfile():
    #wb = opxl.load_workbook('/Users/xxx/Downloads/Excel Import Text.xlsx')
    wb = opxl.load_workbook('EXCEL_DATA_FILE_LOCATION')
    return wb

    For example:

    # This will get the file, initial we have a default file, later we will allow them to input
    def getfile():
    #wb = opxl.load_workbook('/Users/xxx/Downloads/Excel Import Text.xlsx')
    wb = opxl.load_workbook('/Users/johnsmith/Downloads/ExcelImport2.xlsx')
    return wb
  4. Enter the name of the worksheet that contains the data description information into the excelImportoriginal.py file:

    wst = wb['DATA_DESCRIPTION_INFORMATION_WORKSHEET']

    For example, if the data description information worksheet is named datadesc:

    wst = wb['datadesc']
  5. Run the excelImportoriginal.py file to:

    • extract the required data from the Excel data file
    • print the extracted data to the command line

You can find an example Excel data file with included data description information in the SampleFiles folder of the repository.

You can add references to multiple data files to the excelImportoriginal.py file. However, GDS has not implemented looping functionality yet.

There are no restrictions or requirements on the:

  • name of the Excel data file
  • location of the data description information within the data description worksheet

Further information

For more information on data standards in government, refer to the:

For more information on the data description language field names and definitions, refer to the:

Contributing

We want to get as much feedback as possible from data practitioners across government. You can:

We would also like you to join us in community events and workshops to discuss options and ideas. Please contact data-standards@digital.cabinet-office.gov.uk if you’re interested in taking part.

Licence

Unless stated otherwise, the codebase is released under the MIT License. This covers both the codebase and any sample code in the documentation.

The documentation is © Crown copyright and available under the terms of the Open Government 3.0 licence.