OBIS QC checks

obis-qc

Quality checks

The following checks are performed on each occurrence record in order to add quality flags, tag as absence data, or drop records from the main index:

| Check | Fields | Flags | Absence | Dropped | Vandepitte et al. flag number |
| --- | --- | --- | --- | --- | --- |
| occurrenceStatus should be present. | occurrenceStatus | | | | |
| occurrenceStatus should be absent or present. | occurrenceStatus | | x | | |
| If individualCount equals 0, the record is an absence. | individualCount | | x | | |
| basisOfRecord should be present. | basisOfRecord | | | | |
| basisOfRecord should be PreservedSpecimen, FossilSpecimen, LivingSpecimen, MaterialSample, Event, HumanObservation, MachineObservation, Taxon or Occurrence. | basisOfRecord | | | | |
| eventDate should be present. | eventDate | | | | 7, 11, 12, 13 |
| eventDate should conform to ISO 8601. | eventDate | | | | |
| eventDate should not be in the future. | eventDate | DATE_IN_FUTURE | | | |
| eventDate should be later than the set minimum date (year 0). | eventDate | DATE_BEFORE_MIN | | | |
| decimalLongitude and decimalLatitude should be present. | decimalLongitude, decimalLatitude | NO_COORD | | x | |
| decimalLatitude and decimalLongitude should not be zero. | decimalLongitude, decimalLatitude | ZERO_COORD | | x | 4 |
| decimalLatitude and decimalLongitude should be within range: decimalLongitude between -180 and 180, decimalLatitude between -90 and 90. | decimalLongitude, decimalLatitude | LON_OUT_OF_RANGE, LAT_OUT_OF_RANGE, NO_COORD | | x | 5 |
| coordinateUncertaintyInMeters should be present. | coordinateUncertaintyInMeters | | | | |
| coordinateUncertaintyInMeters should be within the range of 0 and 10000000. | coordinateUncertaintyInMeters | | | | |
| minimumDepthInMeters and maximumDepthInMeters should be present. | minimumDepthInMeters, maximumDepthInMeters | NO_DEPTH | | | |
| minimumDepthInMeters and maximumDepthInMeters should be within the range of -100000 and 11000. | minimumDepthInMeters, maximumDepthInMeters | MIN_DEPTH_EXCEEDS_MAX | | | |
| minimumDepthInMeters should not exceed maximumDepthInMeters. | minimumDepthInMeters, maximumDepthInMeters | NO_DEPTH | | | |
| minimumDepthInMeters and maximumDepthInMeters should be less than or equal to the bathymetric depth. | minimumDepthInMeters, maximumDepthInMeters | DEPTH_EXCEEDS_BATH | | | 19 |
| Is the occurrence located on land? Based on OpenStreetMap land polygons. | decimalLongitude, decimalLatitude | ON_LAND | | | 6 |
| scientificName should be present. | scientificName | | | | |
| scientificNameID should be present. | scientificNameID | | | | |
| scientificNameID should be a valid WoRMS LSID. | scientificNameID | | | | 2 |
| Taxon should unambiguously match with WoRMS. | scientificName, scientificNameID | NO_MATCH | | x | |
| An accepted name should exist in WoRMS. | scientificName, scientificNameID | NO_ACCEPTED_NAME | | | |
| Taxon should not be exclusively freshwater or terrestrial according to WoRMS. | scientificName, scientificNameID | NOT_MARINE, MARINE_UNSURE | | in case of NOT_MARINE | |
| measurementType and measurementTypeID should be present if the extended Measurement or Fact extension is being used. | measurementType, measurementTypeID | | | | |
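To illustrate how the coordinate checks listed above might translate into code, here is a minimal sketch. It is not the obisqc implementation: the function name `check_coordinates`, the helper `_to_float`, and the returned dictionary layout are hypothetical. It also assumes that "should not be zero" means both coordinates are exactly zero, and that an out-of-range coordinate additionally receives the NO_COORD flag, as the flag list for that check suggests.

```python
def _to_float(value):
    """Convert a raw field value to float, or return None if that fails."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def check_coordinates(record):
    """Return quality flags and a dropped marker for one occurrence record.

    Hypothetical helper for illustration only; not part of the obisqc API.
    """
    flags = []
    dropped = False

    lon = _to_float(record.get("decimalLongitude"))
    lat = _to_float(record.get("decimalLatitude"))

    if lon is None or lat is None:
        # Missing or non-numeric coordinates.
        flags.append("NO_COORD")
        return {"flags": flags, "dropped": True}

    if lon == 0 and lat == 0:
        # Assumption: the zero check fires only when both values are zero.
        flags.append("ZERO_COORD")
        dropped = True

    out_of_range = False
    if not -180 <= lon <= 180:
        flags.append("LON_OUT_OF_RANGE")
        out_of_range = True
    if not -90 <= lat <= 90:
        flags.append("LAT_OUT_OF_RANGE")
        out_of_range = True
    if out_of_range:
        # Assumption: unusable coordinates are also flagged as missing.
        flags.append("NO_COORD")
        dropped = True

    return {"flags": flags, "dropped": dropped}
```

Calling `check_coordinates({"decimalLongitude": 200, "decimalLatitude": 10})` in this sketch yields the flags LON_OUT_OF_RANGE and NO_COORD and marks the record as dropped.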

Vandepitte et al. flags not implemented: 3, 9, 10, 14, 15, 16, 17, 21-30.

Installation

pip install -r requirements.txt
python setup.py install

Usage

See the tests for usage examples.

Using a cache to speed up taxonomy checks

By default, the taxonomy component fetches information from the WoRMS API by AphiaID. If you have a local cache of WoRMS information, you can use that instead of the API connection by providing an object that implements the fetch() and store() methods. The WoRMS information objects provided to the cache are constructed like this:

import pyworms

# aphiaid is the AphiaID of the taxon being looked up.
record = pyworms.aphiaRecordByAphiaID(aphiaid)
classification = pyworms.aphiaClassificationByAphiaID(aphiaid)
bold_id = pyworms.aphiaExternalIDByAphiaID(aphiaid, "bold")
ncbi_id = pyworms.aphiaExternalIDByAphiaID(aphiaid, "ncbi")
distribution = pyworms.aphiaDistributionsByAphiaID(aphiaid)

# When an external ID lookup returns a non-empty list, keep its first entry; otherwise use None.
bold_id = bold_id[0] if isinstance(bold_id, list) and len(bold_id) > 0 else None
ncbi_id = ncbi_id[0] if isinstance(ncbi_id, list) and len(ncbi_id) > 0 else None

# Information object handed to the cache.
aphia_info = {
    "record": record,
    "classification": classification,
    "bold_id": bold_id,
    "ncbi_id": ncbi_id,
    "distribution": distribution
}
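A cache object of the kind described above could be as simple as the in-memory sketch below. The class name `DictCache` is hypothetical, and the method signatures `fetch(aphiaid)` and `store(aphiaid, aphia_info)`, as well as returning None on a cache miss, are assumptions rather than a documented contract; check the taxonomy module for what it actually expects.

```python
class DictCache:
    """In-memory cache of WoRMS information objects, keyed by AphiaID.

    Hypothetical example class; the fetch/store signatures are assumed.
    """

    def __init__(self):
        self._items = {}

    def fetch(self, aphiaid):
        # Return the cached information object, or None on a cache miss.
        return self._items.get(str(aphiaid))

    def store(self, aphiaid, aphia_info):
        # Keys are normalized to strings so 141433 and "141433" match.
        self._items[str(aphiaid)] = aphia_info


# Usage with a placeholder information object.
cache = DictCache()
cache.store(141433, {"record": {"AphiaID": 141433}, "classification": None,
                     "bold_id": None, "ncbi_id": None, "distribution": None})
cached = cache.fetch(141433)
```

A persistent variant could back the same two methods with a database or key-value store, so repeated runs do not need to query the WoRMS API again.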

Run tests

nosetests --with-coverage --cover-package=obisqc --cover-html