The UK Government Digital Service (GDS) has defined an initial data description language based on international standards and use cases. This language aims to make describing data across government simpler and more consistent.
You can use the data-standards-demo-content
GitHub repository to try this data description language. This repository contains code that allows you to:
- set up a form to generate data description information and associate this information with an Excel data file
- import data from an Excel data file using associated data description information
This currently only works with Excel data files.
This is a work in progress, and will be regularly iterated and updated.
You must have installed the following software on your local machine:
You generate data description information for an Excel data file by completing the Generate your Data Definition File form. You then associate this data description information with the Excel data file.
-
Download the
data-standards-demo-content
GitHub repository on to your local machine. -
Open the
HTML_Content
folder in your local repository. -
Run the
ConfigDDFile.html
file in your browser to open the Generate your Data Definition File form.If the form does not load correctly, open the
ConfigDDFile.html
file in a development environment, and check that the path references at the start of the file are correct. -
Complete the following fields in the Generate your Data Definition File form.
Field Definition Example creator An entity primarily responsible for making the data file creator:"John Smith john.smith@digital.cabinet-office.gov.uk" contributor An entity responsible for making contributions to the data file contributor:”jane.smith@company.co.uk” title A name given to the data file title:"GDS Employees" created Date of creation of the data file created:2002-10-02 identifier An unambiguous reference to the data file within a given context identifier:"0000015_GDS_SDA_XLS" description An account of the data file description:"All heights at GDS"
You do not need to complete the following fields because GDS has not yet implemented them:
- Choose file button
- Select your file type: field
- Set the properties of your columns: field
- Save Data Description File button
You can then either:
- generate a data description file
- copy the data description information to the clipboard
-
Select the Generate Description File button to generate a data description file.
-
Associate the data description file with the Excel data file.
-
Select the Copy Data Description File button. This copies the data description information to the clipboard.
-
Associate the data description information with the Excel data file. For example, you can copy this information into a worksheet in the Excel data file.
If an Excel data file has associated data description information, you can use that data description information to automatically read data from that data file.
The way you read the data from the Excel data file depends on where the associated data description information is. The data description information is usually held in:
- a separate data description file
- one of the Excel data file worksheets
You cannot currently specify which data you want. You must read all data from an Excel data file.
-
Open the
Python Code
folder in your localdata-standards-demo-content
GitHub repository. -
Open the
externalfile.py
file. -
Enter the location of the data description file into the
externalfile.py
file:def gettxt(): df = pd.read_csv('DATA_DESCRIPTION_FILE_LOCATION', header=None) return df
For example:
def gettxt(): df = pd.read_csv('/Users/johnsmith/Projects/ExcelExport/metadata2', header=None) return df
-
Enter the location of the Excel data file into the
externalfile.py
file:# This will get the file, initial we have a default file, later we will allow them to input def getfile(): #wb = opxl.load_workbook('/Users/xxx/Downloads/Excel Import Text.xlsx') wb = opxl.load_workbook('EXCEL_DATA_FILE_LOCATION') #wb = opxl.load_workbook('/Users/xxx/Projects/ExcelExport/GTest.xlsx') return wb
For example:
# This will get the file, initial we have a default file, later we will allow them to input def getfile(): #wb = opxl.load_workbook('/Users/xxx/Downloads/Excel Import Text.xlsx') wb = opxl.load_workbook('/Users/johnsmith/Projects/ExcelExport/Excel Import Text.xlsx') #wb = opxl.load_workbook('/Users/xxx/Projects/ExcelExport/GTest.xlsx') return wb
-
Enter the name of the Excel data file worksheet that contains the data:
ws = wb['DATA_WORKSHEET']
For example:
ws = wb['data']
-
Run the
externalfile.py
file to:- extract the required data from the Excel data file
- print the extracted data to the command line
- create an extracted data
.csv
file in the same location as theexternalfile.py
file
You can add references to multiple data and data description files to the externalfile.py
file. However, GDS has not implemented looping functionality yet.
-
Open the
Python Code
folder in your localdata-standards-demo-content
GitHub repository. -
Open the
excelImportoriginal.py
file. -
Enter the location of the Excel data file into the
excelImportoriginal.py
file:# This will get the file, initial we have a default file, later we will allow them to input def getfile(): #wb = opxl.load_workbook('/Users/xxx/Downloads/Excel Import Text.xlsx') wb = opxl.load_workbook('EXCEL_DATA_FILE_LOCATION') return wb
For example:
# This will get the file, initial we have a default file, later we will allow them to input def getfile(): #wb = opxl.load_workbook('/Users/xxx/Downloads/Excel Import Text.xlsx') wb = opxl.load_workbook('/Users/johnsmith/Downloads/ExcelImport2.xlsx') return wb
-
Enter the name of the worksheet that contains the data description information into the
excelImportoriginal.py
file:wst = wb['DATA_DESCRIPTION_INFORMATION_WORKSHEET']
For example, if the data description information worksheet is named
datadesc
:wst = wb['datadesc']
-
Run the
excelImportoriginal.py
file to:- extract the required data from the Excel data file
- print the extracted data to the command line
You can find an example Excel data file with included data description information in the SampleFiles
folder of the repository.
You can add references to multiple data files to the excelImportoriginal.py
file. However, GDS has not implemented looping functionality yet.
There are no restrictions or requirements on the:
- name of the Excel data file
- location of the data description information within the data description worksheet
For more information on data standards in government, refer to the:
-
GDS technology blog on helping us improve government data standards
-
data in government blog on improving how we manage spreadsheet data
For more information on the data description language field names and definitions, refer to the:
We want to get as much feedback as possible from data practitioners across government. You can:
- comment on the data standards proposal
- raise an issue against the code in this repository
We would also like you to join us in community events and workshops to discuss options and ideas. Please contact data-standards@digital.cabinet-office.gov.uk if you’re interested in taking part.
Unless stated otherwise, the codebase is released under the MIT License. This covers both the codebase and any sample code in the documentation.
The documentation is © Crown copyright and available under the terms of the Open Government 3.0 licence.