RESOStandards/uli-service

Create ULI Pilot Ingest Service

Due Date: Demo on Jan 7 2022.

In order to onboard ULI Pilot participants and test the matching algorithm, we need to be able to onboard each org one at a time and return the possible duplicates and their confidence scores in a payload.

The matching algorithm that this service uses will also be queried from an interactive search form down the road, and should share the same search functions.

  • ULI Service

Create an endpoint that can ingest new data from a given Provider UOI.

A request to this endpoint would look something like the following:

POST /uli-service/v1/<providerUoi>
{
  "value": [
    {
      "MemberFullName": "",
      "MemberLastName": "",
      "MemberFirstName": "",
      "MemberMiddleInitial": "",
      "MemberNickname": "",
      "MemberType": "",
      "MemberNationalAssociationId": "",
      "MemberStateLicense": "",
      "MemberLicenseType": "",
      "MemberStateLicenseState": "",
      "MemberAddress1": "",
      "MemberCity": "",
      "MemberStateOrProvince": "",
      "MemberPostalCode": "",
      "OfficeName": "",
      "OfficeMlsId": "",
      "OfficeAddress1": "",
      "OfficeCity": "",
      "OfficeStateOrProvince": "",
      "OfficePostalCode": "",
      "OrganizationUniqueId": "",
      "OrganizationName": ""
    }
  ]
}

Response:

  HTTP/2 200 OK
  {
    "status": "Import Queued",
    "numRecords": 134,
    "eventQueueUrl": "/uli-service/v1/<providerUoi>/processing",
    "method": "GET"
  }

Each item in the request's value array is a ULI data structure matching the request schema above.

Schema validation should therefore ensure that the records are passed in the value array and that each record contains some subset of the fields above. If validation passes, respond with a 200; if not, respond with a 400 and report which field(s) caused the error. Use ajv or Yup for schema validation.
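
The validation rule above could be sketched as follows. This is a dependency-free illustration, not the ajv or Yup schema the issue calls for. The helper name validateIngestPayload and the error message format are hypothetical, and treating empty records, unknown fields, and non-string values as errors is an assumption.

```typescript
// Field names copied from the request example in this issue.
const ULI_FIELDS = [
  "MemberFullName", "MemberLastName", "MemberFirstName", "MemberMiddleInitial",
  "MemberNickname", "MemberType", "MemberNationalAssociationId",
  "MemberStateLicense", "MemberLicenseType", "MemberStateLicenseState",
  "MemberAddress1", "MemberCity", "MemberStateOrProvince", "MemberPostalCode",
  "OfficeName", "OfficeMlsId", "OfficeAddress1", "OfficeCity",
  "OfficeStateOrProvince", "OfficePostalCode",
  "OrganizationUniqueId", "OrganizationName",
];

type ValidationResult =
  | { status: 200 }
  | { status: 400; errors: string[] };

// Hypothetical helper: checks the envelope, then each record's field names and types.
function validateIngestPayload(payload: unknown): ValidationResult {
  const known = new Set<string>(ULI_FIELDS);
  const value = (payload as { value?: unknown } | null)?.value;
  if (!Array.isArray(value)) {
    return { status: 400, errors: ["value: expected an array of records"] };
  }
  const errors: string[] = [];
  value.forEach((record: unknown, i: number) => {
    if (typeof record !== "object" || record === null || Array.isArray(record)) {
      errors.push(`value[${i}]: expected an object`);
      return;
    }
    const entries = Object.entries(record as Record<string, unknown>);
    // Assumption: a record with no fields at all is rejected.
    if (entries.length === 0) errors.push(`value[${i}]: no fields present`);
    for (const [field, fieldValue] of entries) {
      if (!known.has(field)) {
        errors.push(`value[${i}].${field}: unknown field`);
      } else if (typeof fieldValue !== "string") {
        errors.push(`value[${i}].${field}: expected a string`);
      }
    }
  });
  return errors.length === 0 ? { status: 200 } : { status: 400, errors };
}
```

The errors array is what would go into the 400 response body so the caller can see which field(s) failed.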

  • Sync Orgs with UOI Production Sheet

We should also validate UOIs against the reference sheet. This will mean we'll have a cache of orgs running locally on the API server.

Later on, we'll need the ability to refresh from the UOI sheet. We could reuse the Sync service and UI from the Cert API for this, but we don't need it for the Jan 7 demo. For now, have a singleton service that populates the cache the first time it's used and then returns what's there until restarted. If we reuse what's in Cert, refreshing can be handled that way.
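
A minimal sketch of the populate-on-first-use cache described above. The names createUoiCache and loadUois are hypothetical. The loader is synchronous here for brevity, while a real reader of the UOI sheet would likely be asynchronous.

```typescript
// Hypothetical factory: wraps a loader so the UOI list is fetched at most once
// per process and then served from memory until the server restarts.
function createUoiCache(loadUois: () => string[]) {
  let cache: Set<string> | null = null;
  return {
    // True if the given provider UOI appears in the cached reference data.
    isKnownUoi(uoi: string): boolean {
      if (cache === null) {
        cache = new Set(loadUois()); // first use: populate the cache
      }
      return cache.has(uoi);
    },
  };
}
```

Exporting a single instance from one module (for example `export const uoiCache = createUoiCache(readUoiSheet)`, where readUoiSheet is a placeholder) would make it a singleton for the API server.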

  • Event Types: ULI Assigned, ULI Suggested, Processing

Once the ingest job has started, a request to the queue won't return records right away. It will take some time before there are results. We'll need some kind of notification in the future.

As each record is pushed, we'll run a scavenging job on it against what's already in the system.

To do this, we'll dynamically build a query from the information each record contains, according to the matching formula. The query should support a variable set of weights stored in a separate index, which would eventually have its own UI in production.
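
One way the dynamic query could look, as a sketch only. It builds an Elasticsearch-style bool/should query with one match clause per populated field and uses the field's weight as the boost. The function name buildMatchQuery, the weight values, and the choice to skip unweighted fields are all assumptions; the actual matching formula is not specified in this issue.

```typescript
type UliRecord = Record<string, string>;
type FieldWeights = Record<string, number>;

// Hypothetical builder: only non-empty fields that have a positive weight
// contribute a clause, so the query shape varies with each inbound record.
function buildMatchQuery(record: UliRecord, weights: FieldWeights) {
  const should = Object.entries(record)
    .filter(([field, value]) => value.trim() !== "" && (weights[field] ?? 0) > 0)
    .map(([field, value]) => ({
      match: { [field]: { query: value, boost: weights[field] } },
    }));
  return { query: { bool: { should, minimum_should_match: 1 } } };
}
```

Because the weights are passed in as data, they could be loaded from the separate weights index and changed without touching the query code.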

When a new ULI is created or an existing match is suggested, we'll use a format similar to the following:

   urn:reso:uli:f81d4fae-7dec-11d0-a765-00a0c91e6bf6

This is a UUID that uses the RESO URN namespace (3.4.3).
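
A small sketch of formatting and parsing identifiers in this shape. The helper names formatUli and parseUli are hypothetical, and the strict lowercase prefix check is an assumption, since the final format is only described as "similar to" the example.

```typescript
const ULI_PREFIX = "urn:reso:uli:";
// Generic 8-4-4-4-12 hexadecimal UUID layout; does not check the UUID version.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Wraps a UUID in the URN prefix, normalizing it to lowercase.
function formatUli(uuid: string): string {
  if (!UUID_PATTERN.test(uuid)) throw new Error(`Not a UUID: ${uuid}`);
  return ULI_PREFIX + uuid.toLowerCase();
}

// Returns the UUID part of a ULI, or null if the string is not in the expected shape.
function parseUli(uli: string): string | null {
  if (!uli.startsWith(ULI_PREFIX)) return null;
  const uuid = uli.slice(ULI_PREFIX.length);
  return UUID_PATTERN.test(uuid) ? uuid.toLowerCase() : null;
}
```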

The API will classify the matching events into event types such as ULI Assigned and ULI Suggested.

  • ULI Assigned - in this case, a ULI was assigned since there was no matching record within the configurable confidence threshold. A confidence score will be shown for the item including the fields that were matched on. A ULI can also be assigned through the review process.
  • ULI Suggested - for each inbound record there may be one or more suggested ULIs pertaining to that record. They will also be shown with their confidence scores and which fields they match on.
  • Processing - we will likely also want access to the records currently being processed, so there should be a third event type of Processing, but it shouldn't be returned unless asked for.

There may be other events added in the future, but this is a good start.
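
The split between ULI Assigned and ULI Suggested could be sketched like this. It is one reading of the rules above: candidates at or above the threshold become suggestions, and a new ULI is assigned only when none qualify. The names classifyMatches, MatchCandidate, and MatchEvent are hypothetical, and the example payload in this issue labels suggestions "Possible Match" rather than "ULI Suggested".

```typescript
type MatchCandidate = { uli: string; score: number };

type MatchEvent =
  | { eventType: "ULI Assigned"; uli: string }
  | { eventType: "ULI Suggested"; uli: string; score: number };

// Hypothetical classifier for one inbound record.
// `threshold` is the configurable confidence threshold; `newUli` is the
// identifier to assign when no existing record is a close enough match.
function classifyMatches(
  candidates: MatchCandidate[],
  threshold: number,
  newUli: string
): MatchEvent[] {
  const suggested = candidates
    .filter((c) => c.score >= threshold)
    .sort((a, b) => b.score - a.score); // best match first
  if (suggested.length === 0) {
    return [{ eventType: "ULI Assigned", uli: newUli }];
  }
  return suggested.map(
    (c): MatchEvent => ({ eventType: "ULI Suggested", uli: c.uli, score: c.score })
  );
}
```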

As such, we'll want the root path to also take an optional eventType parameter:

  GET /uli-service/v1/<providerUoi>[/<uli-assigned|possible-match|processing>]

  {
    "value": [
      {
        "EventKey": "uniqueKey1",
        "EventType": "ULI Assigned",
        "ULI": "urn:reso:uli:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
        "OriginatingSystemUsi": "<providerUsi>",
        "PathToOriginalRecord": "/uli-service/v1/<providerUoi>/ingest/<key>",
        "Score": 0.89,
        "ScoringFactors": {
           "MemberFullName": 0.55,
           "MemberNationalAssociationId": 0.34
        },
        "ModificationTimestamp": "2022-01-04T00:39:56Z"
      }, {
        "EventKey": "uniqueKey2",
        "EventType": "Possible Match",
        "ULI": "urn:reso:uli:f81d4fae-7dec-11d0-a765-00a0c91e6b00",
        "OriginatingSystemUsi": "<providerUsi>",
        "PathToOriginalRecord": "/uli-service/v1/<providerUoi>/ingest/<key>",
        "Score": 0.99,
        "ScoringFactors": {
           "MemberFullName": 0.85,
           "MemberNationalAssociationId": 0.14
        },
        "ModificationTimestamp": "2022-01-03T00:39:56Z"
      }, {
        "EventKey": "uniqueKey3",
        "EventType": "Possible Match",
        "ULI": "urn:reso:uli:f81d4fae-7dec-11d0-a765-00a0c91e6b00",
        "OriginatingSystemUsi": "<providerUsi>",
        "PathToOriginalRecord": "/uli-service/v1/<providerUoi>/ingest/<key>",
        "Score": 0.82,
        "ScoringFactors": {
           "MemberStateLicenseState": 0.40,
           "MemberStateLicense": 0.33,
           "MemberFullName": 0.09
        },
        "ModificationTimestamp": "2022-01-03T00:39:56Z"
      }
    ]
  }

If no event type is specified, all items except Processing are returned.
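
The optional event type filter could be sketched as follows. The path segments come from the route above and the labels from the example payload. The function name filterEvents and the decision to throw on an unknown segment (which a handler might turn into a 400 or 404) are assumptions.

```typescript
type EventRecord = { EventKey: string; EventType: string };

// Path segment -> EventType label, as they appear in this issue.
const SEGMENT_TO_EVENT_TYPE = new Map<string, string>([
  ["uli-assigned", "ULI Assigned"],
  ["possible-match", "Possible Match"],
  ["processing", "Processing"],
]);

// Hypothetical filter: with no segment, everything except Processing is returned;
// with a segment, only events of that type are returned.
function filterEvents(events: EventRecord[], segment?: string): EventRecord[] {
  if (segment === undefined) {
    return events.filter((e) => e.EventType !== "Processing");
  }
  const eventType = SEGMENT_TO_EVENT_TYPE.get(segment);
  if (eventType === undefined) {
    throw new Error(`Unknown event type: ${segment}`);
  }
  return events.filter((e) => e.EventType === eventType);
}
```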

The ScoringFactors data should be accessible by having Elastic explain each query. It's not required for the MVP, though; it's just a nice-to-have.