For scraping an HTML page for DwC archives and indexing them

register_dataset

This Python project enables you to scrape a webpage for Darwin Core (DwC) archives and index these datasets¹ into the GBIF (Global Biodiversity Information Facility) data portal.

The code is written in Python 3 and can be adapted to fit any service that exposes a RESTful API.

Prerequisites needed for the user:

  • An account on the GBIF portal (http://www.gbif.org/)
  • Editor rights for this account (please contact the GBIF Secretariat at helpdesk@gbif.org)
  • Affiliation with an existing publishing organization in the GBIF network

Prerequisites needed to run the package:

¹ The assumption is that these DwC archives are already validated and fit for GBIF consumption.

Workflow

To publish to GBIF you will need the API URL, the URL of the web page containing the zip files, the GBIF publisher (organization) UUID, the GBIF installation UUID, and your username and password for the portal.
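Collecting the archive links from the web page can be sketched with the standard library alone; the class and function names below are illustrative, not part of this package:

```python
from html.parser import HTMLParser


class ZipLinkParser(HTMLParser):
    """Collect href values that point to .zip files (the DwC archives)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.endswith(".zip"):
                    self.links.append(value)


def find_zip_links(html):
    """Return every .zip link found in an HTML document."""
    parser = ZipLinkParser()
    parser.feed(html)
    return parser.links


page = '<html><body><a href="whitefish.zip">archive</a> <a href="notes.txt">notes</a></body></html>'
print(find_zip_links(page))  # -> ['whitefish.zip']
```

Relative links like the one above would still need to be joined with the page URL (e.g. with urllib.parse.urljoin) before downloading or registering.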

To register and index the datasets that are available on a webpage, you only need the following:

api_url = 'http://api.gbif-uat.org/v1/dataset'  # GBIF registry (test environment) dataset endpoint
url = 'http://asnhc.angelo.edu/archives/'       # webpage containing the DwC archives

register_datasets(api_url, url,
                  'afafe88e-4b8e-4e62-8f38-3eaa24f71532',  # publisher (organization) UUID
                  '9c0a8aa8-4ce7-49ba-aac7-21a97234f886',  # installation UUID
                  'myuser', 'mypassword')
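Under the hood, registering one dataset against the GBIF registry API amounts to a POST of minimal dataset metadata (which returns the new dataset's UUID), followed by a POST attaching the archive URL as a DWC_ARCHIVE endpoint. A sketch of building those requests follows; the helper names and the example title are illustrative, not part of the package:

```python
import base64
import json


def build_dataset_payload(title, publisher_uuid, installation_uuid):
    # Minimal metadata for POST /v1/dataset; further fields (description,
    # language, license, ...) can be added as needed.
    return {
        "title": title,
        "type": "OCCURRENCE",
        "publishingOrganizationKey": publisher_uuid,
        "installationKey": installation_uuid,
    }


def basic_auth_header(user, password):
    # Registry writes require HTTP Basic auth with your GBIF portal account.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": "Basic " + token,
            "Content-Type": "application/json"}


payload = build_dataset_payload(
    "Example collection dataset",                 # placeholder title
    "afafe88e-4b8e-4e62-8f38-3eaa24f71532",       # publisher UUID from above
    "9c0a8aa8-4ce7-49ba-aac7-21a97234f886",       # installation UUID from above
)
body = json.dumps(payload)
```

POSTing `body` with these headers to `api_url` yields a dataset UUID; a second POST to `api_url + '/' + uuid + '/endpoint'` with `{"type": "DWC_ARCHIVE", "url": <zip url>}` then points GBIF's crawler at the archive.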