django-daiquiri/daiquiri

IMPROVEMENT: add detached-header datalink semantic to the oai adapter

Closed this issue · 9 comments

Add the #detached-header semantic to the oai adapter so it can be digested by the oai metadata. Ultimately, it's the same implementation as for #documentation, but it allows datalink to be used with a wider range of semantics without being confined by the capabilities of the oai adapter.
https://www.ivoa.net/rdf/datalink/core/2022-01-27/datalink.html#detached-header

        for access_url, description, semantics, content_type, content_length in rows:
            if semantics == '#doi':
                datalink['doi'] = get_doi(access_url)
                datalink['title'] = description
            elif semantics == '#this':
                datalink['formats'].append(content_type)
                datalink['related_identifiers'].append({
                    'related_identifier': access_url,
                    'related_identifier_type': 'URL',
                    'relation_type': 'IsDescribedBy'
                })
            elif semantics == '#documentation':
                datalink['alternate_identifiers'].append({
                    'alternate_identifier': access_url,
                    'alternate_identifier_type': 'URL'
                })
            elif semantics == '#preview':
                datalink['related_identifiers'].append({
                    'related_identifier': access_url,
                    'related_identifier_type': 'URL',
                    'relation_type': 'IsSupplementedBy'
                })
            elif semantics == '#auxiliary':
                datalink['related_identifiers'].append({
                    'related_identifier': access_url,
                    'related_identifier_type': 'URL',
                    'relation_type': 'References'
                })
        return datalink

An example of the code to be added:

            elif semantics == '#detached-header':
                datalink['alternate_identifiers'].append({
                    'alternate_identifier': access_url,
                    'alternate_identifier_type': 'URL'
                })

The alternative solution would be to put #detached-header into the relatedIdentifiers instead:

            elif semantics == '#detached-header':
                datalink['formats'].append(content_type)
                datalink['related_identifiers'].append({
                    'related_identifier': access_url,
                    'related_identifier_type': 'URL',
                    'relation_type': 'IsSupplementedBy'
                })

@kimakan That is a good question, I am not quite sure what the best option should be. You have a better understanding of datacite than me, what do you think would be more relevant?

After looking into the issue in more detail, I think that an alternateIdentifier is more appropriate since it essentially points to the same resource. AFAIK, the relatedIdentifier should point to a different, related resource.
However, I would like to put the content_type into the formats to keep track of the alternative formats (currently, only the format of #this is tracked).

            elif semantics == '#detached-header':
                datalink['formats'].append(content_type)
                datalink['alternate_identifiers'].append({
                    'alternate_identifier': access_url,
                    'alternate_identifier_type': 'URL'
                })

Sounds sensible, please make a PR. I like the idea of keeping track of the format. And I agree with the arguments on alternate vs. related.
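
The agreed handling can be sketched in isolation as follows. This is only a minimal illustration, not the adapter code itself: the helper name `apply_detached_header` and the example URL are hypothetical, and skipping empty or already listed content types is an assumption that the snippets above do not make.

```python
def apply_detached_header(datalink, access_url, content_type):
    """Record a #detached-header link as an alternate identifier and track its format."""
    # Assumption: ignore empty content types and avoid duplicate format entries.
    if content_type and content_type not in datalink['formats']:
        datalink['formats'].append(content_type)
    # The header describes the same resource, so it goes to alternate_identifiers.
    datalink['alternate_identifiers'].append({
        'alternate_identifier': access_url,
        'alternate_identifier_type': 'URL'
    })
    return datalink


# Hypothetical datalink dict that already holds the format of the #this entry.
datalink = {
    'formats': ['application/fits'],
    'alternate_identifiers': [],
    'related_identifiers': []
}
apply_detached_header(datalink, 'https://example.org/files/header.txt', 'text/plain')
```

After the call, `formats` holds both content types and `related_identifiers` is left untouched.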

Alternate identifier is supposed to be an ID.
Suggestion: declare datalinkID there like

<alternateIdentifier alternateIdentifierType="datalink">datalinkID</alternateIdentifier>

Related identifier links to related resources like:
#preview (viewer): Describes
#preview-image (related image): IsSupplementedBy
#documentation (url to docs): IsDocumentedBy
#auxiliary (url to related resources): leave out of OAI
#detached-header (url to header file): IsSupplementedBy
#this (url of the resource): IsDescribedBy

#progenitor (url(datalink) of resources used): IsDerivedFrom

potential extra semantics:
#auxiliary-table (table with further data): References
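
The mapping proposed in this comment could be expressed as a lookup table instead of an if/elif chain. The sketch below only restates the suggestion; the names `RELATION_TYPES` and `to_related_identifier` are hypothetical and the table is not what the adapter currently implements.

```python
# Proposed datalink semantics -> DataCite relationType, as listed above.
# None means the semantic is left out of the OAI record.
RELATION_TYPES = {
    '#this': 'IsDescribedBy',
    '#preview': 'Describes',
    '#preview-image': 'IsSupplementedBy',
    '#documentation': 'IsDocumentedBy',
    '#detached-header': 'IsSupplementedBy',
    '#progenitor': 'IsDerivedFrom',
    '#auxiliary': None,
}


def to_related_identifier(semantics, access_url):
    """Return a related_identifier dict, or None if the semantic is not mapped."""
    relation_type = RELATION_TYPES.get(semantics)
    if relation_type is None:
        return None
    return {
        'related_identifier': access_url,
        'related_identifier_type': 'URL',
        'relation_type': relation_type
    }


header = to_related_identifier('#detached-header', 'https://example.org/header.txt')
skipped = to_related_identifier('#auxiliary', 'https://example.org/extra')
```

With a table like this, adding a further semantic only requires a new dictionary entry.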

Additional note:
Currently, the title of the oai record generated from the datalink tables is rendered from the description of the datalink entry with #doi. It's sensible, but it should be ensured that the description is related to the object and not to the DOI itself.
Incorrect description: Digital object identifier (DOI) for the Table 1 from the Data Release 1
Correct description: Table 1 from the Data Release 1

I found a bug in the creation routine of tap_schema.datalink. The content_length adopted from the custom datalink tables, e.g., datalink_doi, is set to 0 if the value is None, which is incorrect. The value is allowed to be None. In some cases it must be None, namely when the content_length attribute doesn't make any sense.

    def get_datalink_links(self, row):
        return [
            {
                'ID': self.get_datalink_identifier(row),
                'access_url': row[2] or '',
                'service_def': row[3] or '',
                'error_message': row[4] or '',
                'description': row[5] or '',
                'semantics': row[6],
                'content_type': row[7] or '',
                'content_length': row[8] or 0

By contrast, the content_length of the datalinks created automatically for all schemas and tables is correctly set to None.

        if schema.doi:
            schema_links.append({
                'ID': identifier,
                'access_url': get_doi_url(schema.doi),
                'service_def': '',
                'error_message': '',
                'description': 'Digital object identifier (DOI) for the {} schema'.format(schema),
                'semantics': '#doi',
                'content_type': 'application/html',
                'content_length': None
            })
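
The difference between the two behaviours can be shown with a small standalone sketch. The function `build_link` and the sample rows are hypothetical; only the column positions (`row[2]`, `row[7]`, `row[8]`) follow the snippet from get_datalink_links, and passing the value through unchanged is one possible fix, not necessarily the one that was merged.

```python
def build_link(row):
    """Build a reduced link dict that keeps a missing content_length as None."""
    return {
        'access_url': row[2] or '',
        'content_type': row[7] or '',
        # `row[8] or 0` would turn None into 0; pass the value through instead.
        'content_length': row[8]
    }


# Hypothetical rows with nine columns, mirroring the indices used above.
row_without_length = (1, 'id', 'https://example.org/doc', None, None, 'Docs',
                      '#documentation', 'text/html', None)
row_with_length = (2, 'id', 'https://example.org/file.fits', None, None, 'File',
                   '#this', 'application/fits', 2880)

link_a = build_link(row_without_length)
link_b = build_link(row_with_length)
```

Here `link_a['content_length']` stays None while `link_b['content_length']` keeps its numeric value.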