Django based search for the Canadian Open Government Portal
Version 1.0
This project uses a Django web application as a thin frontend to Solr for searching datasets and proactive disclosure data on the Open Government Portal. Instead of using the default CKAN Solr 6 cores, data is loaded into custom Solr cores designed specifically to support Canada's two official languages and fast data exporting.
OGC Search is a Django 2.2 application that runs on Python 3.6 or higher. It works with Solr 6.6.x or 8.5.x.
Clone the GitHub OGC Search project: https://github.com/open-data/ogc_search. Create a new Python 3.7 virtual environment for the project and install the requirements from the requirements.txt file.
pip install -r requirements.txt
OGC Search is built using Django 2.2. Familiarity with Django is a prerequisite for developing with OGC Search.
- NLTK Data: OGC Search uses the NLTK Python library and requires the Punkt tokenizer models, which are available from https://www.nltk.org/nltk_data/. These data files must be accessible to the Python virtual environment. It is not necessary to download the entire NLTK corpus.
- Gettext: OGC Search is a bilingual application that uses the standard gettext library to support localisation. On Windows, the gettext library must be installed separately.
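A minimal sketch for checking that the Punkt models are visible from the virtual environment, assuming NLTK is installed. `nltk.data.find` raises `LookupError` when a resource is missing; the `punkt_available` helper and its optional `extra_dir` parameter are hypothetical conveniences for a project-local data folder (NLTK also honours the `NLTK_DATA` environment variable). If the check fails, `python -m nltk.downloader punkt` fetches only the Punkt package rather than the whole corpus. Newer NLTK releases may look for a `punkt_tab` resource instead.

```python
import importlib.util


def punkt_available(extra_dir=None):
    """Return True if NLTK can locate the Punkt tokenizer models.

    extra_dir is a hypothetical, optional folder appended to NLTK's search path.
    Returns False (rather than raising) when NLTK itself is not installed.
    """
    if importlib.util.find_spec("nltk") is None:
        return False
    import nltk  # imported lazily so this helper loads even without NLTK

    if extra_dir and extra_dir not in nltk.data.path:
        nltk.data.path.append(extra_dir)
    try:
        # Looks the resource up on disk only; nothing is downloaded here.
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        return False
    return True
```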
OGC Search reads information that describes the CKAN datasets and proactive disclosure data types from the ckanext-scheming YAML files. The proactive disclosure files and the CKAN dataset files are both available on GitHub. OGC Search also requires two JSON files containing international currency and country codes. These files are often copied to the /ckan folder, but the location can be configured in the settings.py file.
OGC Search uses the Centrally Deployed Templates Solution (CDTS) to provide the Canada.ca theme. The location of the fallback files for CDTS can be configured in the settings.py file.
Download and install Apache Solr 8.5.x or, alternatively, the older 6.6.x release from the Apache repository. Follow the Solr installation instructions.
After installing Solr, create at least one new Solr core for the default search. Once the core has been created, customize it for OGC Search.
- As the solr user, create a new core: solr create -c <core_name>
- In the /conf folder of the new Solr core, remove the managed-schema file and copy the new schema.xml and solrconfig.xml files from the corresponding search application's Solr folder in the project. Be sure to use the schema from the appropriate version folder.
- Copy the /lang folder from the project to the new Solr core's /conf folder.
- Verify that the new core is working using the Solr admin interface.
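As an alternative to the admin interface, the core can be checked through Solr's CoreAdmin API (action=STATUS). The sketch below assumes a local install on Solr's default port 8983; the helper names are hypothetical. It relies on Solr reporting an empty object under the requested name when no such core is loaded, and `check_core` needs a running Solr instance.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def core_status_url(solr_base, core_name):
    """Build a CoreAdmin STATUS request URL for a single core."""
    query = urlencode({"action": "STATUS", "core": core_name, "wt": "json"})
    return f"{solr_base.rstrip('/')}/admin/cores?{query}"


def core_is_loaded(status_response, core_name):
    """Return True if a parsed STATUS response describes a loaded core.

    For an unknown core, Solr returns an empty object under its name.
    """
    core_info = status_response.get("status", {}).get(core_name, {})
    return bool(core_info.get("name"))


def check_core(solr_base, core_name):
    """Query a running Solr instance (requires network access)."""
    with urlopen(core_status_url(solr_base, core_name)) as resp:
        return core_is_loaded(json.load(resp), core_name)
```

For example, check_core("http://localhost:8983/solr", "<core_name>") should return True once the new core is up.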
The unique key for each Solr core matches, as closely as possible, the datastore_primary_key field from the corresponding CKAN YAML file.
Data Type | CKAN datastore_primary_key | Search Unique Key |
---|---|---|
Briefing Note | tracking_number | owner_org,tracking_number |
Contracts | reference_number | owner_org,reference_number; owner_org,reporting_period |
Experimental Inventory | reference_number | owner_org,reference_number |
Grants and Contributions | ref_number,amendment_number | owner_org,ref_number,amendment_number; owner_org,fiscal_year,quarter |
NAP | reporting_period,indicators | owner_org,reporting_period,indicators |
Question Period Briefing Notes | reference_number | owner_org,reference_number |
Service Inventory | fiscal_yr,service_id | owner_org,fiscal_yr,service_id |
Open Data | name | name (Package UUID) |
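The key composition in the table can be sketched as follows. The field lists mirror the first key listed for each proactive disclosure type; the dictionary keys (such as "briefing_note"), the make_search_id helper and the "|" separator are all hypothetical and not necessarily what the OGC Search loader scripts use.

```python
# Field lists copied from the "Search Unique Key" column of the table.
# Data type identifiers are hypothetical; only the first key variant is listed.
SEARCH_KEY_FIELDS = {
    "briefing_note": ("owner_org", "tracking_number"),
    "contracts": ("owner_org", "reference_number"),
    "experimental_inventory": ("owner_org", "reference_number"),
    "grants_and_contributions": ("owner_org", "ref_number", "amendment_number"),
    "nap": ("owner_org", "reporting_period", "indicators"),
    "qp_notes": ("owner_org", "reference_number"),
    "service_inventory": ("owner_org", "fiscal_yr", "service_id"),
}


def make_search_id(data_type, record, sep="|"):
    """Join the key fields of one record into a single unique id string.

    Surrounding whitespace is stripped from each value.
    Raises KeyError if the record lacks a required key field.
    """
    fields = SEARCH_KEY_FIELDS[data_type]
    return sep.join(str(record[f]).strip() for f in fields)
```

Prefixing every key with owner_org keeps identifiers such as reference numbers, which are only unique within a department, from colliding in a shared core.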
OGC Search has a large number of static files. As is standard in Django, these files are collected from each application and, in development mode, can be served by the Django development server. They are collected into a /static folder, which for development is often created in the root of the project folder; this location can be configured as desired.
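A minimal sketch of the relevant settings.py entries, in the os.path style of a Django 2.2 project. The choice of a folder named static in the project root is an assumption; the sample settings file shipped with OGC Search may set these values differently.

```python
# Static-file settings in the style of a Django 2.2 settings.py.
import os

# Project root: two levels up from this settings.py file.
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

# URL prefix under which static files are served.
STATIC_URL = "/static/"

# Folder that `python manage.py collectstatic` copies every app's static files into.
STATIC_ROOT = os.path.join(BASE_DIR, "static")
```

With DEBUG = True and django.contrib.staticfiles in INSTALLED_APPS, the development server serves these files directly; in production, run collectstatic and let the web server serve STATIC_ROOT.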
As is usual in Django, application settings are stored in a settings.py file in the project folder: /ogc_search/ogc_search/settings.py. An example settings file is provided: /ogc_search/ogc_search/settings.sample.py.
The Open Data Solr search core is populated by CKAN. For the proactive disclosure searches ('contracts', for example), each Solr core is instead populated by a script that reads the CKAN recombinant CSV output file for the corresponding proactive disclosure type and saves the data to the search-optimized core.
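A rough sketch of that kind of loader, assuming the recombinant CSV has a header row whose column names match the Solr field names. The helper names, the "|" id separator and the batch size are hypothetical, and the real OGC Search scripts also add bilingual label fields that are omitted here. Each payload is a JSON array of documents, a format Solr's /update handler accepts when posted with Content-Type: application/json.

```python
import csv
import json
from itertools import islice


def to_solr_doc(row, key_fields, sep="|"):
    """Map one CSV row to a Solr document with a composite 'id' field.

    Empty values are dropped so they are not indexed as empty strings.
    """
    doc = {k: v for k, v in row.items() if v not in ("", None)}
    doc["id"] = sep.join(row[f] for f in key_fields)
    return doc


def batches(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def update_payloads(lines, key_fields, batch_size=500):
    """Yield JSON bodies suitable for POSTing to <solr>/<core>/update."""
    docs = (to_solr_doc(row, key_fields) for row in csv.DictReader(lines))
    for chunk in batches(docs, batch_size):
        yield json.dumps(chunk)
```

When reading the recombinant file from disk, opening it with open(path, newline="", encoding="utf-8-sig") keeps a leading byte-order mark out of the first column name.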
Controlled list values for the proactive disclosure data are read from the corresponding YAML table definition file.
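The lookup can be sketched as a walk over the parsed YAML (for example, the dict returned by PyYAML's yaml.safe_load). The resources, fields, datastore_id and choices layout used here is an assumption about the shape of the recombinant table definitions, and fields whose choices are kept in a separate file are not handled. The choice_labels helper is hypothetical.

```python
def choice_labels(schema, datastore_id, lang="en"):
    """Return {code: label} for one field's controlled list.

    `schema` is an already-parsed YAML table definition. Returns an empty
    dict if the field is not found or has no inline choices.
    """
    for resource in schema.get("resources", []):
        for field in resource.get("fields", []):
            if field.get("datastore_id") != datastore_id:
                continue
            labels = {}
            for code, label in field.get("choices", {}).items():
                # Labels may be bilingual dicts ({"en": ..., "fr": ...}) or plain strings.
                labels[code] = label.get(lang, "") if isinstance(label, dict) else label
            return labels
    return {}
```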
OGC Search uses the binary data export feature of Solr to perform fast and efficient export of search results to a CSV file.
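For orientation, a generic sketch of building a request to Solr's /export handler, which streams a complete sorted result set instead of paging through it. This illustrates the Solr feature in general and is not necessarily how OGC Search issues its exports; the export_url helper, core name and field names are hypothetical. The handler requires both fl and sort, and every field named in them must have docValues enabled in the schema.

```python
from urllib.parse import urlencode


def export_url(solr_base, core, query, fields, sort):
    """Build a Solr /export request URL.

    `fields` and `sort` are mandatory for /export; all of them must be
    docValues fields in the core's schema.
    """
    params = {"q": query, "fl": ",".join(fields), "sort": sort}
    return f"{solr_base.rstrip('/')}/{core}/export?{urlencode(params)}"
```

The response is a stream of documents that the caller then writes out row by row, for example with csv.writer.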
OGC Search provides a link to Solr's similarity (More Like This) search for Open Data. To retrieve a simple HTML fragment listing ten similar datasets, use the URL pattern <OGC Search site>/en/od/mlt/<UUID>, where UUID is the dataset ID of the original record. For example: http://127.0.0.1:8000/en/od/mlt/59570050-dc7f-408d-9e41-6d2c4d16a768.
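A small sketch for building that URL from a dataset ID. The mlt_fragment_url helper is hypothetical, and accepting "fr" as well as "en" assumes the French side of the site exposes the same path.

```python
import uuid


def mlt_fragment_url(site, dataset_id, lang="en"):
    """Build the URL of the similar-datasets HTML fragment for one dataset.

    Raises ValueError if dataset_id is not a valid UUID or lang is unsupported.
    """
    if lang not in ("en", "fr"):
        raise ValueError("lang must be 'en' or 'fr'")
    canonical = str(uuid.UUID(dataset_id))  # validates and lower-cases the ID
    return f"{site.rstrip('/')}/{lang}/od/mlt/{canonical}"
```

Validating the ID first means a malformed value fails locally instead of producing a request the server cannot resolve.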
Unless otherwise noted, the source code of this project is covered under Crown Copyright, Government of Canada, and is distributed under the MIT License.
The Canada wordmark and related graphics associated with this distribution are protected under trademark law and copyright law. No permission is granted to use them outside the parameters of the Government of Canada's corporate identity program. For more information, see Federal identity requirements.