Jisc Learning Analytics Unified Data Definitions v1.4.1

Version 1.4.1 released 14 May 2019.

Introduction

The Unified Data Definitions (UDD) of the Jisc learning analytics project is a vocabulary of the chief data entities of interest to learning analytics: students, courses, modules, and so on, as well as their characteristics. The data coded with this vocabulary is typically extracted from the student record system of a college or university.

Along with xAPI recipes, the UDD makes up the core data specification of the Jisc learning analytics architecture.

The main folder (jiscdev/analytics-udd) contains:

this ReadMe file that gives an overview of the UDD
the details of the UDD licensing arrangements
a UDD entity-relationship diagram
a link to spreadsheets listing the differences between the previous version of the UDD and this version
a consolidated list of the descriptions of each UDD entity.

In addition to the main folder, there are four sub-folders:

The udd sub-folder is the heart of the specification, with a file for each entity describing its properties in detail. Refer to these files when designing data for import into the Learning Data Hub.
The media sub-folder contains supporting files, including the E-R diagram source, the changes spreadsheet, guides to the relative importance of UDD properties for the applications, products and services that use the UDD, and copies of the JACS3 and HECoS subject classification systems.
The utilities sub-folder has code fragments and snippets to support the development and use of the UDD.
The implementation sub-folder describes matters that are closely related to the formal UDD specification but not part of it, for example the mechanism for handling unofficial extensions to properties in the UDD, and the filename conventions for adding data to the Learning Data Hub.

For release schedule and version control, see below.

Differences between v1.3 and v1.4

The development of v1.4 has involved a number of additions and changes. This overview page lists the changes in summary and provides a spreadsheet with the mapped listing of each entity and property change between v1.3.3 and v1.4.0. There is also a spreadsheet listing changes between v1.4.0 and v1.4.1.

Data format

UDD data must be UTF-8 encoded. TSV is the preferred data format, but JSON and XML data are also supported. Other formats are not supported.

When providing UDD data, supply the data for each entity in a separate file, one file per entity, using the UDD filename conventions.
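As a rough illustration of these requirements, the following Python sketch serialises the rows of a single entity as UTF-8 encoded TSV. The function name, the sample values and the presence of a header row are assumptions made for this example; the two property names are taken from the student_on_course_instance entity, and the required file name should be taken from the UDD filename conventions rather than from this sketch.

```python
import csv
import io


def entity_to_tsv_bytes(rows, fieldnames):
    """Serialise the rows of ONE entity as tab-separated, UTF-8 encoded bytes.

    Assumption: a header row of property names is wanted; check the entity
    files and the filename conventions for the actual expectations.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


# Hypothetical sample data for the student_on_course_instance entity.
rows = [
    {"STUDENT_COURSE_MEMBERSHIP_ID": "scm-001", "COURSE_INSTANCE_ID": "ci-2019-A"},
]
payload = entity_to_tsv_bytes(
    rows, ["STUDENT_COURSE_MEMBERSHIP_ID", "COURSE_INSTANCE_ID"]
)
```

The resulting bytes can then be written to one file per entity, opened in binary mode.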

Diagram

Two entity-relationship diagrams provide an overview of the specification. There is a brief E-R diagram containing just the primary keys, constraints and foreign key properties, and a full E-R diagram with all the properties.

Entities

Primary keys

Some entities have uniqueness constraints across multiple properties; for example student_on_course_instance has STUDENT_COURSE_MEMBERSHIP_ID plus COURSE_INSTANCE_ID. The MD files for these entities contain a note to this effect. These entities have a single primary key for ease of processing and to enable field-level extensibility. The data supplier may choose to provide the single primary key, or may choose to leave it blank, in which case it will be generated by the Learning Data Hub loading mechanism.
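A data supplier may want to check such a multi-property uniqueness constraint before upload. The sketch below is one possible way of doing so in Python, not part of the UDD or of the Learning Data Hub. The function name and the sample rows are hypothetical; the two property names are those given above for student_on_course_instance.

```python
from collections import Counter


def find_duplicate_keys(rows, key_properties):
    """Return each combination of key property values that occurs more than once."""
    counts = Counter(tuple(row[prop] for prop in key_properties) for row in rows)
    return [key for key, count in counts.items() if count > 1]


# Hypothetical rows; the first and third share the same combination of values.
rows = [
    {"STUDENT_COURSE_MEMBERSHIP_ID": "scm-001", "COURSE_INSTANCE_ID": "ci-A"},
    {"STUDENT_COURSE_MEMBERSHIP_ID": "scm-001", "COURSE_INSTANCE_ID": "ci-B"},
    {"STUDENT_COURSE_MEMBERSHIP_ID": "scm-001", "COURSE_INSTANCE_ID": "ci-A"},
]
duplicates = find_duplicate_keys(
    rows, ("STUDENT_COURSE_MEMBERSHIP_ID", "COURSE_INSTANCE_ID")
)
```

An empty result means the rows satisfy the constraint for the listed properties.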

assessment_instance

course

course_instance

course_subject

event

institution

module

module_instance

module_subject

module_vle_map

student

student_course_membership

student_event

student_on_assessment_instance

student_on_a_module_instance

student_on_course_instance

Additional sections

period

staff

staff_on_course_instance

staff_on_mod_instance

student_id_map

Table of entities and corresponding API endpoint names

See the filename_conventions file for information about providing UDD data.

ENTITY NAME	API ENDPOINT NAME
assessment_instance	assessmentinstance
course	course
course_instance	courseinstance
course_subject	coursesubject
event	event
institution	institution
module	module
module_instance	moduleinstance
module_subject	modulesubject
module_vle_map	modulevlemap
period	period
staff	staff
staff_on_course_instance	staffcourseinstance
staff_on_mod_instance	staffmoduleinstance
student	student
student_course_membership	studentcoursemembership
student_event	studentevent
student_id_map	studentidmap
student_on_assessment_instance	studentassessmentinstance
student_on_a_module_instance	studentmoduleinstance
student_on_course_instance	studentcourseinstance
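The endpoint names cannot always be derived mechanically from the entity names (for example, staff_on_mod_instance maps to staffmoduleinstance), so client code is probably safer using a lookup than string manipulation. The following Python sketch copies the table above into a dictionary; the names ENDPOINTS and endpoint_for are illustrative and not part of the UDD.

```python
# Entity name -> API endpoint name, copied from the table above.
ENDPOINTS = {
    "assessment_instance": "assessmentinstance",
    "course": "course",
    "course_instance": "courseinstance",
    "course_subject": "coursesubject",
    "event": "event",
    "institution": "institution",
    "module": "module",
    "module_instance": "moduleinstance",
    "module_subject": "modulesubject",
    "module_vle_map": "modulevlemap",
    "period": "period",
    "staff": "staff",
    "staff_on_course_instance": "staffcourseinstance",
    "staff_on_mod_instance": "staffmoduleinstance",
    "student": "student",
    "student_course_membership": "studentcoursemembership",
    "student_event": "studentevent",
    "student_id_map": "studentidmap",
    "student_on_assessment_instance": "studentassessmentinstance",
    "student_on_a_module_instance": "studentmoduleinstance",
    "student_on_course_instance": "studentcourseinstance",
}


def endpoint_for(entity_name):
    """Look up the endpoint name for a UDD entity; raise ValueError if unknown."""
    try:
        return ENDPOINTS[entity_name]
    except KeyError:
        raise ValueError(f"Unknown UDD entity: {entity_name!r}") from None
```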

There are also files of code lists extracted from the MD files for machine processing.

UDD code lists in Welsh

UDD code lists in English

Mandatory and optional properties

The properties of the UDD are required in compliant datasets to different degrees, dependent on the products and services that use the outputs. There is a Field Guide spreadsheet that, for each product or service, indicates how important each property is for operation or use of the product. A Field Guide is included in this version of the UDD, so that vendors and service providers can supply their own entries for institutions to refer to as an aid to data preparation. The Field Guide enables the vendor or service provider to indicate data items that are MANDATORY, IMPORTANT, PREFERRED or NOT USED.
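To show how a Field Guide entry might be used during data preparation, here is a minimal Python sketch that reports which MANDATORY properties are empty in a row. The four levels are those named above; the field guide contents, the function name and the sample row are invented for illustration and do not reflect any real product's requirements.

```python
from enum import Enum


class Importance(Enum):
    """The four levels a vendor can assign to a property in the Field Guide."""
    MANDATORY = "MANDATORY"
    IMPORTANT = "IMPORTANT"
    PREFERRED = "PREFERRED"
    NOT_USED = "NOT USED"


def missing_mandatory(field_guide, row):
    """List the MANDATORY properties that are absent or empty in a row."""
    return [
        prop
        for prop, level in field_guide.items()
        if level is Importance.MANDATORY and not row.get(prop)
    ]


# Hypothetical Field Guide entries for an imaginary product.
field_guide = {
    "STUDENT_COURSE_MEMBERSHIP_ID": Importance.MANDATORY,
    "COURSE_INSTANCE_ID": Importance.MANDATORY,
    "EXAMPLE_OPTIONAL_PROPERTY": Importance.PREFERRED,
}
row = {"STUDENT_COURSE_MEMBERSHIP_ID": "scm-001", "COURSE_INSTANCE_ID": ""}
problems = missing_mandatory(field_guide, row)
```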

Code lists

Some UDD properties consist of code lists. Some have values derived from HESA tables (for HE) or ILR tables (for FE). In general these code lists are mapped to generic UDD code lists, so that they are standardised across data from multiple institutions. To extract code lists from the UDD MD files, you may wish to use the Python utility provided here.

Some code lists will be specific to one institution or to a limited group of institutions. These lists are not included in the UDD and can be generated by the vendor. They can be loaded into the Learning Data Hub via a standard JSON format or can be handled via extensions (see below). An example of the JSON format is:

{"MOD_LEVEL": {"A": null, "C": null, "B": null, "E": null, "D": null, "1": null, "0": null, "3": null, "2": null, "5": null, "7": null, "6": null, "9": null}}

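A code list in this shape can be read with any JSON parser. The Python sketch below uses a shortened version of the MOD_LEVEL example and checks whether a value is one of the listed codes. The helper name is hypothetical, and the meaning of the null values is not specified here, so the sketch only inspects the keys.

```python
import json

# Shortened version of the MOD_LEVEL example; each key is a permitted code.
code_list_json = '{"MOD_LEVEL": {"A": null, "B": null, "C": null, "0": null, "1": null}}'
code_lists = json.loads(code_list_json)


def is_valid_code(code_lists, property_name, value):
    """Return True if value is a listed code for the given property.

    JSON object keys are always strings, so the value is compared as text.
    """
    return str(value) in code_lists.get(property_name, {})


valid = is_valid_code(code_lists, "MOD_LEVEL", "A")
invalid = is_valid_code(code_lists, "MOD_LEVEL", "Z")
```

A property with no code list is treated here as having no valid codes; a real loader might prefer to skip the check in that case.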
Extensions

There is provision for data extensions at the level of property (in other words, "field-level extensions"). Although not strictly part of the UDD, a separate entity is provided and described at extension.md.

Specification development workflow

The simplest way of contributing to the UDD is as follows:

add an issue to the issue tracker to alert everyone to what you are working on and why.
tag the issue with the version milestone of which you'd like the patch to be a part.
make an edit or add a file in this repository, and save it to your own branch. If you prefer, you can fork the whole repository and work in your own repository.
send a pull request once you're done.
the pull request will be discussed at our regular meetings and either merged, or kept in the queue, depending on whether more work is required.

You can do all this through the GitHub GUI, but you're welcome to use any other git tool you prefer.

Release schedule and version control

Particular release versions will get their own branches, but the master branch will always contain the latest agreed release. Releases will be made after the review group has come to an agreement.

Versioning broadly follows the pattern majorVersion.minorVersion.patch. Major versions indicate major data model changes. Minor versions denote changes that can break applications, such as the deletion of properties that were valid in earlier versions. Patches can include the addition of new properties.

There will usually be a new minor version with breaking changes available for use in June of each year, in time for the next academic year. For example, the move from version 1.4 to version 1.5 is planned for June 2019, ready for the 2019-20 academic year.
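Under this scheme, an application can estimate whether moving between two UDD versions may break it by comparing the major and minor components only. The following Python sketch illustrates that reading of the rules above; the function names are hypothetical, and treating a missing patch number as 0 is an assumption.

```python
def parse_version(text):
    """Parse 'v1.4.1', '1.4.1' or '1.4' into a (major, minor, patch) tuple."""
    parts = [int(part) for part in text.lstrip("v").split(".")]
    parts += [0] * (3 - len(parts))  # assumption: a missing patch counts as 0
    return tuple(parts[:3])


def may_break(old, new):
    """True if the major or minor version increases between old and new.

    Patch-only changes are treated as non-breaking, following the rule that
    patches can add properties but minor versions can remove them.
    """
    return parse_version(new)[:2] > parse_version(old)[:2]


breaking = may_break("1.3.3", "1.4.0")
patch_only = may_break("1.4.0", "1.4.1")
```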

All version changes will be announced in advance on the repository issue tracker and in this README file.

Note that some properties will be marked as 'deprecated'. This means that the property is still valid, but will be removed by the next minor version update.

Acknowledgements

Many thanks to all contributors who have raised issues, sent pull requests, commented and made suggestions. The UDD specification is the achievement of all of you.

@alanepaull
@andrewhickey
@arc12
@christoffballard
@craig-petch
@ds10
@gmoger-jisc
@gryglbrt
@ht2
@huwrobertsjisc
@jfmullaney
@michaelwebjisc
@MiroslavKratchounov
@robwynj
@ryansmith94
@sandeepmjay
@willblenkhorn
@wilmTap

License

This work is licensed under a Creative Commons Attribution 4.0 International License.
