The GSQ Data Catalogue is an index of all geoscience-related data in digital, physical, or federated form.
The primary goal of the data catalogue is to make geoscience data FAIR.
Figure 1: FAIR Data Principles
- Findable: Data and supplementary materials have sufficiently rich metadata and a unique and persistent identifier
- Accessible: Metadata and data are understandable to humans and machines. Data is deposited in a trusted repository.
- Interoperable: Metadata use a formal, accessible, shared, and broadly applicable language for knowledge representation.
- Reusable: Data and collections have clear usage licences and provide accurate information on provenance.
- Data Portal Catalogue – a CKAN-based data catalogue with a number of extensions that improve user experience and performance.
- Data Store - low cost, high volume data object storage (AWS S3 buckets).
- Data Schemas - standardised data models that represent a dataset: its metadata, elements and attributes, and relationships to other datasets and data elements. The data schemas are based on DCAT2, ISO, GGIC, PPDM, and other standards.
- Controlled Vocabularies – agreed sets of terms to enable data to be shared and reused across application, enterprise, and community boundaries.
- Persistent Identifiers (PID) – a long-lasting reference to a digital resource such as a document, file, web page, or other object.
- Linked Data – connecting related data so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.
- GSQ Open Data Portal - a freely available catalogue of geoscience data.
- GSQ Internal Data Portal - an internal catalogue of data that is not yet classified as open data.
- GSQ Knowledgebase - a data catalogue of GSQ-owned datasets for use by GSQ geoscientists and other departmental users. NOTE: The Knowledgebase and the Internal Data Portal are the same CKAN instance.
- Extend and optimise the CKAN Data Catalogue technology platform to enhance functionality, performance, and usability.
- Programmatically populate the data catalogue from existing metadata systems: MERLIN, QDEX Reports, QDEX Data, GEM, and the geochemistry Microsoft Access database.
- Programmatically populate the data catalogue with metadata for geoscience data objects that are not in existing metadata systems, e.g. data on NAS or other storage. This will include activities such as extracting metadata from file header data.
- Optimise the search functionality for the data catalogue. CKAN uses Solr as its search engine, but we are open to additional search capability, particularly capability that supports faceted searching and searching on specific data attributes across the data store.
- Configure the security of the data catalogue according to the Access Rights security schema, e.g. open access, embargoed access, restricted access, and metadata-only access.
- Assist GSQ staff to create, optimise and extend geoscience data schemas.
- Create schema validations.
- Optimise the performance and functionality of the vocabulary management tools.
- Integrate external (non-GSQ) master data sources into the vocabulary manager.
- Integrate the data catalogue, lodgement forms, and other data collections with existing PID minting services.
- Create PID minting services where no National or State minting service is available.
- Apply PIDs to legacy data as part of data migration to the data catalogue and data lake.
- Assist GSQ staff in the creation, extension, and optimisation of linked data.
- Optimise the performance and end user experience of the graph database to make data more discoverable and interactive.
- Implement visualisations of the linked data.
- Implement APIs to the graph database.
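As an illustration of the programmatic population and access rights tasks above, the following is a minimal sketch of how a metadata record might be shaped before it is sent to CKAN's `package_create` action. The function names, the stored access rights strings, the organisation name `gsq`, and the sample identifier are all hypothetical; the real field layout depends on the GSQ data schemas.

```python
import re

# Access levels named in the task list above.
# The exact strings stored in the catalogue are an assumption.
ACCESS_RIGHTS = {"open", "embargoed", "restricted", "metadata only"}


def slugify(title: str) -> str:
    """Derive a CKAN dataset name: lowercase alphanumerics, '-' and '_', max 100 chars."""
    slug = re.sub(r"[^a-z0-9_-]+", "-", title.lower()).strip("-")
    return slug[:100]


def build_dataset_payload(title: str, identifier: str,
                          access_rights: str, owner_org: str) -> dict:
    """Build a dict suitable as the JSON body of a package_create request.

    Hypothetical helper: storing the PID and access rights as extras is an
    assumption based on the `extra:` fields used elsewhere in this document.
    """
    if access_rights not in ACCESS_RIGHTS:
        raise ValueError("unknown access rights value: " + access_rights)
    return {
        "name": slugify(title),
        "title": title,
        "owner_org": owner_org,
        "extras": [
            {"key": "identifier", "value": identifier},
            {"key": "access_rights", "value": access_rights},
        ],
    }


# Sample values are made up for illustration.
payload = build_dataset_payload("GSQ Record Series 2020/01", "cr12345", "open", "gsq")
print(payload["name"])
```

Rejecting unknown access rights values before the record reaches the catalogue keeps the security configuration from having to handle unexpected strings.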
The following CKAN extensions are currently installed in the GSQ CKAN platforms:
Plugin | Purpose | Deployed in | URL |
---|---|---|---|
ckanext-cloudstorage | Enables storage of resources in AWS S3 | Open & Private | URL |
ckanext-dcat | Provides DCAT2 metadata export | Open & Private | URL |
ckanext-drupal_api | Enables page layout changes | Open | URL |
ckanext-drupal_idp | Handles session state for user authentication | Open | URL |
ckanext-fpx | CKAN adapter for FPX service | Open & Private | URL |
ckanext-googleanalytics | Enables traffic analysis | Open & Private | URL |
ckanext-gsq-internal-theme | Custom Geological Survey of Qld (GSQ) theme | Private | URL |
ckanext-gsq-theme | Custom Geological Survey of Qld (GSQ) theme | Open | URL |
ckanext-harvest | Provides a common harvesting framework for CKAN extensions | Open & Private | URL |
ckanext-or_facet | Enables logic change for applying facets | Open & Private | URL |
ckanext-pdfview | Enables in-browser preview of PDF resources | Open & Private | URL |
ckan-python3 | CKAN compatibility with Python 3 | Open & Private | URL |
ckanext-saml2auth | Enables SAML2 based SSO | Private | URL |
ckanext-scheming | Create custom metadata forms | Open | URL |
ckanext-scheming | Create custom metadata forms | Private | URL |
ckanext-spatial | Adds geospatial capabilities to CKAN | Open & Private | URL |
ckanext-syndicate | Syndicate datasets to another CKAN instance | Open & Private | URL |
ckanext-xloader | Loads CSV (and similar) data into CKAN's DataStore | Open & Private | URL |
ckanext-zippreview | Preview contents of ZIP files | Open & Private | URL |
You can see which plugins are currently installed by querying the CKAN API.
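One way to run such a query is through CKAN's `status_show` action, which reports the enabled plugins in the `extensions` field of its result. The sketch below is illustrative only: the base URL is a placeholder rather than the address of a GSQ portal, and the sample response is made up to show the shape being parsed.

```python
import json
from urllib.request import urlopen

# Placeholder base URL (assumption): substitute the address of the CKAN instance.
CKAN_BASE_URL = "https://ckan.example.org"


def status_url(base_url: str) -> str:
    """Build the URL of CKAN's status_show action for a given site."""
    return base_url.rstrip("/") + "/api/3/action/status_show"


def enabled_extensions(response: dict) -> list:
    """Return the sorted plugin names from a status_show response body."""
    if not response.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return sorted(response["result"]["extensions"])


def fetch_enabled_extensions(base_url: str = CKAN_BASE_URL) -> list:
    """Query a live CKAN site (needs network access) and list its plugins."""
    with urlopen(status_url(base_url)) as reply:
        return enabled_extensions(json.load(reply))


# Made-up response body, used here only to show the shape being parsed.
sample = {"success": True, "result": {"extensions": ["scheming_datasets", "dcat"]}}
print(enabled_extensions(sample))
```

Note that `status_show` lists plugin names as configured in CKAN, which may differ from the repository names in the table above.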
The following data migration is required:
Source | Dataset | Metadata source | Target data schema | Comments |
---|---|---|---|---|
QDEX Data | 3D Models | QDEX Data | DCAT2 Dataset | -- |
QDEX Data | Airborne Geophysics | GEM | Airborne Surveys | -- |
QDEX Data | ASTER | QDEX Data | ASTER | -- |
QDEX Data | Geological mapping data | -- | DCAT2 Dataset | -- |
QDEX Data | GIS packages | QDEX Data | DCAT2 Dataset | -- |
QDEX Data | -- | -- | -- | -- |
QDEX Data | -- | -- | -- | -- |
QDEX Data | -- | -- | -- | -- |
QDEX Data | Wireline Logs | -- | -- | -- |
Lodgement Portal | Open Exploration Reports | -- | -- | -- |
Lodgement Portal | Open Industry Consultative Reports | -- | -- | -- |
QDEX Reports | Queensland Geological Maps | -- | -- | Reports |
QDEX Reports | GSQ Record Series | -- | -- | Reports |
QDEX Reports | Soils and Land Resources Reports | -- | -- | Reports |
QDEX Reports | GSQ Exploration Reports | -- | -- | Reports |
QDEX Reports | Departmental Publications | -- | -- | -- |
QDEX Reports | GSQ-Commissioned Industry Studies/Reports | -- | -- | -- |
MERLIN | Bibliographies | MERLIN | DCAT2 Dataset | -- |
Facet | Type | AND/OR | Metadata Field |
---|---|---|---|
Spatial bounds | Bounding box | AND | spatial |
Data Types | Facet | AND | dataset_type |
Commodities | Facet/multiple | OR | commodity |
Earth Science Data Category | Facet | AND | earth_science_data_category |
Geological Features | Facet, searchable | AND | geologic_feature |
Report type | Facet | AND | georesource_report_type |
Data Formats | Facet | AND | |
Date | Multi value (with ranges) | AND, intersect of date range | dataset_start_date, dataset_completion_date |
Access Rights | Facet (Knowledgebase only) | OR | extra:access_rights |
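The AND/OR column above can be read as a rule for combining facet selections into a Solr filter query. The following is a minimal sketch of that rule using the `q`, `fq`, and `facet.field` parameters of CKAN's `package_search` action. The helper names and the sample selections are hypothetical, and the sketch does not describe how ckanext-or_facet implements the same logic internally.

```python
import json


def facet_clause(field: str, values: list, operator: str = "AND") -> str:
    """Join the selected values of one facet with AND or OR."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    # Quote each value and escape backslashes and quotes for Solr.
    quoted = ['"{}"'.format(v.replace("\\", "\\\\").replace('"', '\\"'))
              for v in values]
    return "{}:({})".format(field, " {} ".format(operator).join(quoted))


def build_search_params(query: str, selections: dict, operators: dict) -> dict:
    """Build package_search parameters from the user's facet selections.

    `selections` maps a metadata field to its selected values; `operators`
    maps a field to "OR" where the table calls for it (default is AND).
    """
    clauses = [
        facet_clause(field, values, operators.get(field, "AND"))
        for field, values in selections.items()
        if values
    ]
    return {
        "q": query,
        # Different facets are always intersected with each other.
        "fq": " AND ".join(clauses),
        "facet.field": json.dumps(list(selections)),
    }


# Sample selections (made up): one data type, two commodities combined with OR.
params = build_search_params(
    "drilling",
    {"dataset_type": ["report"], "commodity": ["gold", "copper"]},
    {"commodity": "OR"},
)
print(params["fq"])
```

Keeping the per-facet operator in a separate mapping means the table can change (for example, switching a facet from AND to OR) without touching the query-building code.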
Attribute | Placeholder | Prefix | Validation | Meta Title | Comments |
---|---|---|---|---|---|
Any attribute | Enter a search term | | A-z, 1-9, space, “-” | all | |
Title | Enter any title | none | A-z, 1-9, space, “-” | title | |
Persistent identifier | Enter a Persistent Identifier | See persistent identifiers below | | extra:identifier | |
Report PID | Enter a Report ID | cr | 1-9 | extra:identifier | Search only dataset type report |
Survey PID | Enter a Survey Number, e.g. ss12345 | | Two characters a-z plus 0-9 | extra:identifier | Search only dataset type survey |
Permit ID | Enter a Permit ID, e.g. “EPM12345” | none | A-z, 1-9 | resource_authority_permit | |
Borehole PID | Enter a Borehole PID, e.g. bh12345 | bh | 1-9 | extra:identifier | Search only dataset type borehole |
Borehole Name | Enter a Borehole Name | | A-z, 1-9, space, “-” | title | Search only dataset type borehole |
Borehole Alias | Enter any Borehole Alias | none | A-z, 1-9, space, “-” | alias | Search only dataset type borehole |
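The prefix and validation rules for the PID attributes in the table above can be sketched as regular expressions. This is an illustration under stated assumptions, not the portal's actual validation: the patterns accept lowercase prefixes only, and they treat the digit range as 0-9 rather than the literal "1-9" shown in the table.

```python
import re

# One pattern per PID attribute. Prefixes come from the table above;
# lowercase-only matching and the inclusion of 0 are assumptions.
PID_PATTERNS = {
    "report": re.compile(r"cr[0-9]+"),
    "borehole": re.compile(r"bh[0-9]+"),
    "survey": re.compile(r"[a-z]{2}[0-9]+"),
}


def is_valid_pid(dataset_type: str, term: str) -> bool:
    """Check a search term against the PID pattern for one dataset type."""
    pattern = PID_PATTERNS.get(dataset_type)
    if pattern is None:
        raise KeyError("no PID pattern for dataset type: " + dataset_type)
    return pattern.fullmatch(term.strip()) is not None


print(is_valid_pid("report", "cr12345"))
print(is_valid_pid("borehole", "cr12345"))
```

Because the survey pattern accepts any two-letter prefix, it also matches report and borehole identifiers, so the dataset type has to come from the attribute the user chose rather than from the term alone.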
This code repository's content is licensed under the Creative Commons Attribution 4.0 International licence (CC BY 4.0), the deed of which is stored in this repository: LICENSE.
Geoscience Information Team, Geological Survey of Queensland, Department of Resources, Brisbane, QLD, Australia, geological_info@resources.qld.gov.au