ga4gh-discovery/ga4gh-service-info

Explain how to extend the specification by other specifications

mcupak opened this issue · 13 comments

E.g. Beacon wants to have an image URL for organizations.

The consensus seems to be to encourage a single top-level extension object for everything, incl. organizationImageUrl.

Just tried this in refget as I've never done inheritance using OpenAPI before. Works really well!

    RefgetServiceInfo:
      allOf:
      - '$ref': https://raw.githubusercontent.com/ga4gh-discovery/ga4gh-service-info/develop/service-info.yaml#/components/schemas/ServiceInfo
      - type: object
        description: Extended capabilities of refget service info
        properties:
          extension:
            type: object
            required:
              - circular_supported
              - algorithms
            properties:
              circular_supported:
                type: boolean
                description: Indicates if the service supports circular location queries
              subsequence_limit:
                type: integer
                format: int64
                nullable: true
                description: Maximum length of sequence which may be requested using start and/or end query parameters
              algorithms:
                type: array
                items:
                  type: string
                  description: An array of strings listing the digest algorithms that are supported
                  example:
                  - md5
                  - trunc512

However the example response looks like

{
  "id": "org.ga4gh.service",
  "name": "1000 Genomes Project",
  "type": "org.ga4gh:beacon:1.0.1",
  "description": "The 1000 Genomes Project provides a large public catalogue of\nhuman genome variation data which serves as standard reference\nin many genomic workflows and analysis projects.\n",
  "documentationUrl": "https://docs.example.com",
  "organization": "EMBL-EBI",
  "contactUrl": "mailto:support@example.com",
  "version": "1.0.0",
  "extension": {
    "circular_supported": true,
    "subsequence_limit": 0,
    "algorithms": [
      [
        "md5",
        "trunc512"
      ]
    ]
  },
  "createdAt": "2019-06-04T12:58:19Z",
  "updatedAt": "2019-06-04T12:58:19Z"
}

We should make the OpenAPI more generic in the examples so it doesn't look this odd when we integrate it into 3rd party applications

@rishidev I think we need an entry for service info's OpenAPI spec at https://github.com/ga4gh/w3id.org/blob/master/ga4gh/.htaccess to help this 3rd party integration

This is good, and works well in the service-registry too.

The problem we have is that we need to override the whole extension field rather than just add fields into it. This isn't just a theoretical issue. For example, we know for a fact that the Beacon specification needs to add custom fields, and Beacon implementers often add custom fields on top of the Beacon spec. As per #10, the Beacon spec needs to put everything in extension. The downstream spec for the implementation (or other dependent spec, such as the Beacon Network) can't just add more things, but needs to overrride the whole extension.

If we allowed arbitrary extending, rather than requiring to put everything in particular field, this would not be an issue. As a non-trivial bonus, this would also allow extension of nested fields, such as organization.

I think this justifies changing how we extend things, but I'd like to discuss on the call today.

There were 3 reasons we decided to enforce a particular field in the first place:

  1. Consistent versioning with respect to backward compatibility. While we can guarantee backward compatibility with both options at the level of the standard (if we're only bumping a minor version, we'd only add fields, not change existing fields), a minor version bump in the spec might force a major version update in an implementation (adding a field someone is already using in a different way).
  2. Easier to distinguish at first side what's a standard field vs. a proprietary extension.
  3. We know the tooling works well with this option.

We've previously explored 3 and found out this is a non-issue - the tooling works well with the alternative approach as well.

2 is subjective, and I'd argue not too important. From consumer's perspective, I don't really care if the field is standard, and if I want to know, I can always check the spec. The service API tells me the version of the spec it's implementing.

As for 1, the service now tells us its version, as well as the version of the spec. There's no reason why these should be versioned in sync, in fact, it's very likely you'll want to update the API of the service a lot more frequently than the spec.

I've looked for some similar situations elsewhere and found a few that support not going with a separate field. For example, HTTP headers have similar standardization and extension requirements. Originally, a decision was made to encourage headers prefixed with X-, but this is now an antipattern for very similar reasons we've debated here. JWT and Auth0 uses the same approach for claims (although they do have a registry for custom claims and namespace custom claims to avoid collisions).

+1 from Andy

Need to retire the extension field from the specification

#50 removes the extension field as agreed upon here

@andrewyatz Feel free to make a PR on w3id.org if needed

@rishidev this has been created. @mcupak take a look at ga4gh/w3id.org#2. The url will be https://w3id.org/ga4gh/discovery/service-info-v1.yaml and currently points to what is on the develop branch (we can switch this later once through PRC).

@andrewyatz A few questions:

  • What is this for/why is it needed?
  • Do we need this for registry as well?
  • Should we set up a release branch and link to that instead?

Quick responses

  • The for/why?
    • The idea is to provide a stable URL through which the service-info YAML file would be made available
    • That URL is intended to live for quite a bit of time (years)
    • It allows us to extend consumer services with service-info without needing to re-write any specs once we go live
    • If repos get modified down the line (moved around etc) then this would protect consumers from those issues
    • If we ever go for v2 then this allows v1 to remain accessible (though this could be provided by using a release branch/tag so it's not a compelling reason)
  • Is it needed for the registry?
    • Could be. The registry is likely to be extended by another implementor right? So the same arguments would suffice
  • Release branch?
    • I'd say yes or at least we should tag at release time and link to that
    • We can link to any branch we like, so this gives us maximum flexibility

Thanks @andrewyatz, makes sense. I filed ga4gh-discovery/ga4gh-service-registry#88 for the registry.