Data Rights Protocol v.0.5
DRAFT FOR COMMENT: Visit the Data Rights Protocol home page for details on our Data Rights Roundtable on October 19th, 2021. To provide feedback on this draft protocol, make a new issue or pull request in this repository or you may provide feedback through this form: https://forms.gle/YC7nKRs3ZQMWLvw27.
Protocol Changes from 0.4 to 0.5:
- new request state "denied/too_many_requests"
- openapi.yaml specification for PIP server interface
- encode time extensions into the request status, along with a processing_details field
- draft minimal implementation guide
- draft PIP certification/conformance suite
- respecification of identity tokens to match OIDC Core 1.0
1.0 Introduction
This specification defines a web protocol encoding a set of standardized request/response data flows such that Users can exercise Personal Data Rights provided under regulations like the California Consumer Privacy Act, General Data Protection Regulation, and other regulatory or voluntary bases, and receive affirmative responses in standardized formats.
We aim to make the data rights protocol integrable with an ecosystem of data rights middlewares, agent services, automation tool kits, and privacy-respecting businesses which empower and build trust with consumers while driving the cost of compliance towards zero.
1.01 Motivation
Data Rights are increasingly becoming universal, but the method of request and protocol for communicating those requests varies and there is no universal interchange format. Companies operating under these regulatory regimes face not only technical challenges in collecting and delivering responses to users’ data rights requests but also face significant process burdens as consumers increasingly make use of these rights. At the same time, consumers find it tough to execute their data rights under new privacy laws, partially due to a lack of standardization among companies.
By providing a shared protocol and vocabulary for expressing these data rights, we aim to minimize the administrative burdens on consumers and businesses while providing a basis of trust for verifiable identity attestation which can be used by (individual) consumers (or by an agent intermediating the relationship on behalf of consumers) and businesses.
1.02 Scope
In this initial phase of the Data Rights Protocol, we want to enable a group of peers to form a voluntary trust network while expanding the protocol to support wider trust models and additional data flows.
Version 0.5 encodes the rights as specified in the California Consumer Privacy Act of 2018, referred to herein as the “CCPA”. These rights are enumerated in the Supported Rights Actions section below.
1.03 Terminology
The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
- User is the individual who is exercising their rights. This User may or may not have a direct business relationship or login credentials with the Covered Business.
- User Agent (UA) is the application, software, or browser which is used by the User to mediate their interaction with the Data Rights Protocol.
- Authorized Agent (AA) is a natural person or business entity that a User has authorized to act on their behalf to exercise the rights encoded in this protocol.
- Authorized Agent Interface (AAi) is the software component managed by an Authorized Agent to accept "Data Rights Status Callback" endpoint calls.
- Privacy Infrastructure Provider (PIP) is a business entity which provides software and process automation to enable Covered Businesses to receive and process Data Rights Requests.
- PIP Interface (PIPi) is the software component managed by a PIP which is responsible for providing the endpoints specified in sections 2.02, 2.03, and 2.05. In cases where the Covered Business is operating without a PIP, these components will be operated by the Covered Business.
- Covered Business (CB) is the business entity which the User is exercising their rights with.
- Covered Business Interface (CBi) is the software component managed by a Covered Business to provide the Data Rights Discovery endpoint and MAY also provide services for user identity verification.
2.0 HTTP Endpoint Specification
[note about including schemas by-reference from below.]
DRP 0.5 implementors MUST support application/json request and response bodies.
[expand endpoints with their failure states]
GET /.well-known/data-rights.json
("Data Rights Discovery" endpoint)
2.01 This is the Data Rights Discovery endpoint, responding at a well-known endpoint on the Covered Business’s primary User focused domain. This [RFC8615] URI will return a JSON document conforming to this schema. This endpoint exists for Users and Authorized Agents to be able to take Data Rights Actions.
For instance, a User looking to exercise their data rights for Example, Inc. whose homepage is https://example.com/ MUST be able to GET https://example.com/.well-known/data-rights.json without knowledge of the Covered Business’s relationship to any Privacy Infrastructure Provider.
{
"version": "0.5",
"api_base": "https://example.com/data-rights",
"actions": ["sale:opt-out", "sale:opt-in", "access", "deletion"],
"user_relationships": [ ... ]
}
- version field is a string carrying the version of the protocol implemented. Currently this MUST read "0.5".
- api_base field is a URI under which the rest of the Data Rights Protocol is accessible. This endpoint MAY be run by a Privacy Infrastructure Provider but SHOULD be accessible under the Covered Business's domains for legibility's sake.
- actions is a list of strings enumerating the rights which may be exercised, as outlined in Supported Rights Actions.
- user_relationships is a list of strings enumerating the contexts by which a User may have a relationship with the Covered Business. The enumeration of possible relationships is left unspecified and future versions of the protocol may have more to say about them.
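As a rough illustration of the discovery step, the sketch below derives the well-known URL from a homepage URL and runs a few structural checks on a parsed discovery document. The function names are hypothetical, and treating all four keys as required is an assumption; the text above only fixes the value of version.

```python
from urllib.parse import urlsplit, urlunsplit

WELL_KNOWN_PATH = "/.well-known/data-rights.json"


def discovery_url(homepage: str) -> str:
    """Build the Data Rights Discovery URL from a Covered Business homepage URL."""
    parts = urlsplit(homepage)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN_PATH, "", ""))


def check_discovery_document(doc: dict) -> list:
    """Return a list of problems found in a parsed discovery document."""
    problems = []
    if doc.get("version") != "0.5":
        problems.append('version MUST read "0.5"')
    if not isinstance(doc.get("api_base"), str):
        problems.append("api_base must be a URI string")
    # Assumption: both lists are treated as required lists of strings.
    for key in ("actions", "user_relationships"):
        value = doc.get(key)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{key} must be a list of strings")
    return problems
```

An Authorized Agent might call discovery_url("https://example.com/"), fetch the result with any HTTP client, and pass the parsed JSON to check_discovery_document before using api_base.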
POST /exercise
("Data Rights Exercise" endpoint)
2.02 This is the Data Rights Exercise endpoint which Users and Authorized Agents can use to exercise enumerated data rights.
{
"meta": {
"version": "0.5"
},
"regime": "ccpa",
"exercise": [
"sale:opt-out"
],
"relationships": ["customer", "marketing"],
"identity": <jwt>... ,
"status_callback": "https://dsr-agent.example.com/update_status"
}
- meta MUST contain only a single key version which contains a string referencing the current protocol version “0.5”.
- regime MAY contain a string specifying the legal regime under which the Data Request is being taken. Requests which do not supply a regime MAY be considered for voluntary processing.
  - The legal regime is a system of applicable rules, whether enforceable by statute, regulations, voluntary contract, or other legal frameworks which prescribe data rights to the User. See 3.01 Supported Rights Actions for more discussion.
- exercise MUST contain a list of rights to exercise.
- relationships MAY contain a list of string 'hints' for the Covered Business signaling that the Covered Business may have data of the User's outside of the expected Customer/Business relationship, and which the User would like to be considered as part of this Data Rights Exercise.
- identity MUST contain an RFC7515 JWT conforming to one of the following specifications:
  - a string containing a JWT serialized in the Compact Serialization format [RFC7515 Section 3.1]
  - a document object containing a JWT serialized in the JSON Serialization format [RFC7515 Section 3.2]
  - See section 3.04 regarding identity encapsulation.
- status_callback MAY be specified with a URL that the Status Callback can be sent to. See "Data Rights Status Callback" endpoint.
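The field rules above can be sketched as a small builder that includes the required keys and adds the optional ones only when supplied. The function name and argument names are hypothetical; the identity argument is assumed to be an already-serialized JWT (a compact string or a JSON Serialization object).

```python
import json


def build_exercise_request(exercise, identity, regime=None,
                           relationships=None, status_callback=None):
    """Assemble a Data Rights Exercise request body as a dict."""
    if not exercise:
        raise ValueError("exercise MUST contain a list of rights to exercise")
    body = {
        "meta": {"version": "0.5"},   # meta carries only the version key
        "exercise": list(exercise),
        "identity": identity,
    }
    # Optional fields are omitted entirely when not supplied.
    if regime is not None:
        body["regime"] = regime
    if relationships:
        body["relationships"] = list(relationships)
    if status_callback is not None:
        body["status_callback"] = status_callback
    return body


if __name__ == "__main__":
    request_body = build_exercise_request(
        ["sale:opt-out"], "header.payload.signature", regime="ccpa",
        status_callback="https://dsr-agent.example.com/update_status")
    print(json.dumps(request_body, indent=2))
```

Leaving regime unset corresponds to the voluntary-processing case described above.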
[XXX] is exercise a list? is making multiple "requests" in a single request valid?
POST /exercise
Response
2.02.1 Responses to this request MUST adhere to the Exercise Status Schema.
GET /status
("Data Rights Status" endpoint)
2.03 This is the Data Rights Status endpoint which Users and Authorized Agents can use to query for the status of an existing data rights request. Requests to this endpoint MUST provide a single URL parameter request_id which is the Request ID for the Data Rights Request.
GET /status?request_id=c789ff35-7644-4ceb-9981-4b35c264aac3
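A minimal sketch of building that status URL, assuming the /status path is resolved under the api_base advertised by the Data Rights Discovery endpoint; the helper name is hypothetical.

```python
from urllib.parse import urlencode


def status_url(api_base: str, request_id: str) -> str:
    """Build the Data Rights Status URL for a given Request ID."""
    query = urlencode({"request_id": request_id})
    return f"{api_base.rstrip('/')}/status?{query}"
```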
GET /status
Response
2.03.1 Responses to this request MUST adhere to the Exercise Status Schema.
POST $status_callback
("Data Rights Status Callback" endpoint)
2.04 The Status Callback endpoint SHOULD be implemented by Authorized Agents which will be exercising data rights for multiple Users. This endpoint exists to remove the need for Authorized Agents to query the Data Rights Status endpoint and instead allow a “push model” where AAs are notified when a request's status changes. The destination for a Status Callback URL is specified in the initial Data Rights Exercise request.
The request body MUST adhere to the Exercise Status Schema.
POST $status_callback
Response
2.04.1 The Covered Business SHOULD make a best effort to ensure that a 200 status is recorded for the most recent status update. The body of the callback's response SHOULD be discarded and not parsed by the Covered Business.
POST /revoke
("Data Rights Revoke" endpoint)
2.05 An Authorized Agent SHALL provide Users with a mechanism to request cancellation of an open or in progress request by sending a Data Rights Revoke request with the following body parameters:
{
"request_id": "c789ff35-7644-4ceb-9981-4b35c264aac3",
"reason": "i don't want my account deleted"
}
Requests to this endpoint contain the following fields:
- request_id MUST contain the ID of the request to revoke.
- reason MAY contain a user-provided reason for the request not to be processed.
POST /revoke
Response
2.05.1 Responses to this request MUST adhere to the Exercise Status Schema. Responses MUST contain the new state.
3.0 Data Schemas
These Schemas are referenced in Section 2 outlining the HTTP endpoints and their semantics.
3.01 Supported Rights Actions
These are the CCPA rights which are encoded in v0.5 of the protocol:
Regime | Right | Details |
---|---|---|
ccpa | sale:opt_out | RIGHT TO OPT-OUT OF SALE |
ccpa | sale:opt_in | RECONSENT OR OPT-IN TO DATA SALE |
ccpa | deletion | RIGHT TO DELETE |
ccpa | access | RIGHT TO KNOW |
ccpa | access:categories | RIGHT TO KNOW[☆] |
ccpa | access:specific | RIGHT TO KNOW[☆] |
Covered Businesses specify which rights they support in the Data Rights Discovery endpoint while consumers and their agents can specify the rights they are making use of in the Data Rights Exercise endpoint.
Requests to exercise these rights SHALL be made under either a processing regime of "ccpa", or on a voluntary basis by leaving the regime unspecified. The encoding of CCPA rights in this section is not to be interpreted to exclude requests made under GDPR statutes or other regional privacy or accessibility legislation; other legal regimes shall be encoded into the protocol in future iterations.
[☆] The schema and semantics of the access:categories and access:specific rights shall be declared at a later date. Discussion in GitHub issue #9.
3.02 Request Statuses
This table shows valid states for Data Rights Requests, along with the criteria for transition into each state. Further, this table shows at which states certain fields are allowed to be added to a data rights request.
"Final" states are marked in the final field of the table. Requests which enter a final state MAY be disregarded after the lesser of the expires_at flag or 60 days, but no less than 7 days from when expiration was first specified.
state | reason | entrance criteria | new fields | Final? |
---|---|---|---|---|
| | | user has created request, but not submitted it | base fields | |
open | | user has submitted request to Data Rights Endpoint[1] | request_id | |
in_progress | | CB has acknowledged receipt of request OR User solves verification | received_at, expected_by, processing_details | |
in_progress | need_user_verification | CB doesn't have sufficient ID verification | user_verification_url, expires_at | |
fulfilled | | CB has finished data rights request process | results_url, expires_at | x |
revoked | | user has explicitly actioned to revoke the request | | x |
denied | suspected_fraud | CB or PIP believes this request was made fraudulently | processing_details | x |
denied | insuf_verification | the [in_progress, need_user_verification] stage failed or timed out | processing_details | x |
denied | no_match | CB could not match user identity to data subject | processing_details | x |
denied | claim_not_covered | user requesting data not covered under legal bases[XXX] | processing_details | x |
denied | outside_jurisdiction | user requesting data under bases they are not covered by[XXX] | processing_details | x |
denied | too_many_requests | user has submitted more requests than the CB is legally obliged to process | details? | |
denied | other | some other unspecified failure state reached | processing_details | x |
expired | | the time is currently after the expires_at in the request. | | x |
[XXX] In the case of claim_not_covered, this may be about asking for categories of data which Covered Businesses are not required to present to the User. In the case of outside_jurisdiction, this may be because the business is not honoring CCPA requests for non-California residents and there is no other basis on which to honor the request. See #28 for discussion on too_many_requests.
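One way to make the table above machine-checkable is a lookup keyed by (status, reason). This is only a sketch with hypothetical names: it mirrors the table as drafted, including the fact that denied/too_many_requests is not currently marked final, and it leaves out the unnamed pre-submission row.

```python
# (status, reason) -> whether the state is marked "Final" in the table.
STATES = {
    ("open", None): False,
    ("in_progress", None): False,
    ("in_progress", "need_user_verification"): False,
    ("fulfilled", None): True,
    ("revoked", None): True,
    ("denied", "suspected_fraud"): True,
    ("denied", "insuf_verification"): True,
    ("denied", "no_match"): True,
    ("denied", "claim_not_covered"): True,
    ("denied", "outside_jurisdiction"): True,
    ("denied", "too_many_requests"): False,  # not marked final in the draft table
    ("denied", "other"): True,
    ("expired", None): True,
}


def is_valid_state(status, reason=None) -> bool:
    """True when the status/reason pair appears in the table."""
    return (status, reason) in STATES


def is_final(status, reason=None) -> bool:
    """True when the pair is a final state; raises KeyError for unknown pairs."""
    return STATES[(status, reason)]
```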
need_user_verification
State Flow Semantics
3.02.1: When a Data Rights Request enters the in_progress/need_user_verification state, the PIPi SHALL inform the Agent through either the Data Rights Status Callback or the Data Rights Status endpoint. A Data Rights Request can enter this state if the identity tokens are not already sufficiently verifiable by the Covered Business, or if the Covered Business could not unambiguously match the User to an account based on those tokens.
These request statuses MUST contain a user_verification_url string which is an HTTPS or otherwise secure URL; the user's identity token will be included in requests to that URL. The Authorized Agent is responsible for presenting the URL in the Status's user_verification_url with some URL parameters attached to it:
- identity MUST contain the same identity token presented in the original Data Rights Request, or a JWT containing the same claims.
- redirect_to MUST contain a URL-safe encoded URL which the PIPi will redirect to upon successful identity verification.
- request_id MUST contain the request_id of the Data Rights Request under consideration.
The PIPi SHOULD provide a user_verification_url which refers to a unique Data Rights Request and then SHALL verify that the request_id specified by the Authorized Agent refers to the same Data Rights Request before presenting a verification.
The PIPi SHOULD NOT redirect the user back to the Authorized Agent's redirect_to URL when the user verification fails or is canceled, but the Authorized Agent SHOULD NOT treat the loading of that URL as proof that the verification is complete and the request is ready to proceed; they should query the Data Rights Status endpoint or wait for a Status callback.
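The URL parameters described in this section could be attached along the following lines. This is a sketch under assumptions: the helper name is hypothetical, the identity token is taken to be a compact-serialized JWT string, and a user_verification_url containing a fragment is not handled.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_verification_url(user_verification_url: str, identity: str,
                           redirect_to: str, request_id: str) -> str:
    """Attach identity, redirect_to and request_id to the verification URL."""
    params = urlencode({
        "identity": identity,
        "redirect_to": redirect_to,   # percent-encoded by urlencode
        "request_id": request_id,
    })
    # Append to an existing query string if the PIPi already supplied one.
    separator = "&" if urlsplit(user_verification_url).query else "?"
    return f"{user_verification_url}{separator}{params}"


if __name__ == "__main__":
    url = build_verification_url(
        "https://example.com/data-rights/verify/c789ff35-7644-4ceb-9981-4b35c264aac3",
        "header.payload.signature",
        "https://dsr-agent.example.com/verified",
        "c789ff35-7644-4ceb-9981-4b35c264aac3")
    print(parse_qs(urlsplit(url).query)["redirect_to"])
```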
3.03 Schema: Status of a Data Subject Exercise Request
A single JSON object is used to describe any existing Data Exercise Request and is referred to as the Exercise Status object:
{
"request_id": "c789ff35-7644-4ceb-9981-4b35c264aac3",
"received_at": "2021-09-02T15:27:25.403-07:00",
"expected_by": "2021-10-15T15:27:25.403-07:00",
"processing_details": "this user has many records",
"status": "in_progress",
"reason": "need_user_verification",
"user_verification_url": "https://example.com/data-rights/verify/c789ff35-7644-4ceb-9981-4b35c264aac3"
}
- request_id MUST contain a string that is the globally unique ID returned in the initial Data Rights Exercise request.[1]
- status MUST contain a string which is one of the request states as defined in Request Statuses.
- reason MAY contain a string containing additional information about the current state of the request according to the Request Statuses.
- received_at SHOULD contain a string which is the RFC 3339-encoded time at which the initial request was registered by the Covered Business.
  - When a Covered Business receives a request, this field MUST be present.
- expected_by SHOULD contain a date before which the Authorized Agent can expect to see an update on the status of the Data Rights Request. This field should conform to the legal regime's deadline guidances, and may be amended by the PIP or Covered Business according to those same regulations. If the date is amended, processing_details MUST be updated to reflect the reason for this extension.
- processing_details MAY contain a string reflecting the state of the Data Rights Request so that the Agent may communicate this state to the End User.
- user_verification_url MAY contain a URI which can be presented in a User Agent for identity verification.
- expires_at MAY contain an [RFC 3339]-encoded time after which the Covered Business will no longer oblige themselves to record-keep the request under consideration.
[1]: request_id SHOULD be a UUID generated by the Covered Business or Privacy Infrastructure Provider immediately. This request_id SHOULD NOT be taken to mean that the Covered Business has received and is acting on the request, only that the "middle layer" in between has. If the Data Rights endpoints are operated directly by the Covered Business, requests SHOULD pass immediately from open to in_progress.
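A receiving Agent might sanity-check an Exercise Status object roughly as follows. The function name is hypothetical, and two choices here are assumptions stricter than the text: the verification URL is required to be HTTPS, and timestamps are parsed with datetime.fromisoformat, which accepts only a subset of RFC 3339 on older Python versions (for example, a trailing "Z" is rejected before Python 3.11).

```python
from datetime import datetime


def check_exercise_status(status: dict) -> list:
    """Return a list of problems found in a parsed Exercise Status object."""
    problems = []
    for key in ("request_id", "status"):
        if not isinstance(status.get(key), str):
            problems.append(f"{key} MUST contain a string")
    for key in ("received_at", "expected_by", "expires_at"):
        if key not in status:
            continue
        try:
            parsed = datetime.fromisoformat(status[key])
        except (TypeError, ValueError):
            problems.append(f"{key} is not a parseable timestamp")
            continue
        if parsed.tzinfo is None:
            problems.append(f"{key} is missing a UTC offset")
    if status.get("reason") == "need_user_verification":
        # Assumption: only https:// counts as a secure URL in this sketch.
        url = str(status.get("user_verification_url", ""))
        if not url.startswith("https://"):
            problems.append("user_verification_url should be an HTTPS URL")
    return problems
```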
3.04 Schema: Identity Encapsulation
In development of this protocol a simple question with complex answers is raised repeatedly: how do we securely encode a user's identity in a way that is trustworthy to all implementing parties? This has traditionally been done in an ad-hoc fashion. In the scope of a Data Rights Protocol, this can be seen as a barrier to exercise: if a consumer has 50 companies they would like to access their data from, they should not need to complete 50 identity verification processes to exercise those rights. To that end, the parties implementing the protocol have spent some time researching the state of the art and the wider identity ecosystem and come to the following set of conclusions:
- OpenID Connect solves many of the questions around federated identity and trust, and the biggest barriers to uptake are implementation complexity and relatively early rollout of the technology. This, however, is the broad direction that federated identity on the web appears to be headed and we intend to follow that current.
- For Version 1 of the protocol the focus of development is on developing endpoints, defining the data structure of requests, defining end-to-end state transitions of the requests, and development of non-technical processes around this protocol.
- Implementers will work with minimal technical trust mechanisms and instead rely on an operating agreement between implementers deploying this protocol in production. See System Rules documentation. [XXX: this is still being drafted]
- The Working Group intends to explore best practices in federated identity and emerging technologies like OpenID Identity Assurance (eKYC).
Given the long-term goal of supporting OpenID Connect, protocol implementers SHALL encapsulate identity using RFC7515-encoded JSON Web Tokens. Tokens can either be represented in their Compact Serialization or JSON Serialization representations; for complex JWTs containing sub-tokens (consider an Authorized Agent with a set of self-attested claims alongside a Covered Business-provided identity token), the JSON Serialization should be considered preferred, but for simple tokens compact serialization should be accepted. (these are purposefully NOT RFC2119 SHOULDs...)
Subject to further refinement of trust mechanisms and authorization workflow, JWTs MAY contain custom claims, and contain the following OIDC Standard Claims:
name | type | description |
---|---|---|
aud | str | audience claim MUST contain reference to the legal entity which is responsible for processing rights action |
sub | str | if known, subject claim SHALL contain the Covered Business's preferred public identifier for the user, for example a user-name or account email address. |
name | str | if known, claim SHALL contain the user's full name most likely known by the Covered Business |
email | str | if known, claim SHALL contain the user's email address. |
email_verified | bool | if the user's email address has been affirmatively verified according to the level of assurance specified in the System Rules, this field SHALL be specified as true |
phone_number | str | if known, claim SHALL contain the user's phone number in E.164 encoding. |
phone_number_verified | bool | if the user's phone number has been affirmatively verified according to the level of assurance specified in the System Rules, this field SHALL be specified as true |
address | address | if known, claim SHALL contain the user's preferred address. This claim is specified fully in OpenID Connect Core 1.0 section 5.1.1 |
address_verified | bool | if the user's address has been affirmatively verified according to the level of assurance specified in the System Rules, this field SHALL be specified as true |
power_of_attorney | str | this custom claim MAY contain a reference to a User-signed document delegating power of attorney to the submitting AA. Implementation details of this claim will be defined later. |
Covered Businesses SHALL determine for themselves the level of reliance they will place on a given token. Authorized Agents SHALL make reasonable efforts to provide trustworthy tokens, by verifying user-attested claims according to the practices agreed under the System Rules, by attaching user-attested claims as available, and by ensuring their JWTs are signed by a key which the Covered Businesses and PIPs can verify against.
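To make the Compact Serialization concrete, here is a standard-library sketch that signs a claims object as a JWS and verifies it again. The specification does not name a signature algorithm or key arrangement; HS256 with a shared key is an assumption chosen only so the example needs no third-party library, and the function names and claim values are hypothetical. The verifier does not inspect the header's alg field, which a real implementation would need to do.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used by JWS."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_compact(claims: dict, key: bytes) -> str:
    """Serialize claims as header.payload.signature (HS256 assumed)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims))
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def verify_compact(token: str, key: bytes) -> dict:
    """Check the signature and return the claims, or raise ValueError."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature)):
        raise ValueError("signature mismatch")
    return json.loads(b64url_decode(signing_input.split(".")[1]))


if __name__ == "__main__":
    claims = {"aud": "Example, Inc.", "sub": "user@example.com",
              "email": "user@example.com", "email_verified": True}
    print(verify_compact(sign_compact(claims, b"demo-key"), b"demo-key"))
```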
[XXX] JWTs should probably not be encrypted? Managing the encryption key exchange here is very strange and necessarily happening out-of-scope of the protocol. But we already have shared-secret API authentication in section 3.07; I am queasy about having user identity tokens floating in the open here....
[XXX] discussion about what happens in need_user_verification stage; it would be nice if the UA could specify a redirect URL to return back to the app.
3.06 Error States
Servers SHALL respond with HTTP 200 response codes when requests are processed successfully. In exceptional cases, servers SHALL respond with non-200 response codes and an application/json body with the following keys:
- code MUST contain a string encoding of the HTTP response code for clients which cannot process the headers.
- message MUST contain a string explaining the nature of the error.
- fatal MAY contain a Boolean value of true if the request will move to a denied/other state. Requests which are not fatal shall be assumed to be retryable.
{
"code": "400",
"message": "Unsupported rights actions submitted."
}
PIPi servers MAY signal that an existing request will no longer be processed due to this error. PIPi SHOULD move the request to a denied/other state and call the Status Callback endpoint accordingly.
Error codes are purposefully under-specified at the moment -- servers SHALL make a best effort to map to known 4XX and 5XX codes.
Note that these error states only represent request errors; workflow errors SHOULD be specified in the request status fields.
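A client could interpret such an error body along these lines. The function name and the shape of the returned dict are hypothetical; the only rule taken from the text is that an error is retryable unless fatal is true.

```python
import json


def interpret_error(body: str) -> dict:
    """Parse an error response body and decide whether a retry is reasonable."""
    error = json.loads(body)
    for key in ("code", "message"):
        if not isinstance(error.get(key), str):
            raise ValueError(f"{key} MUST contain a string")
    # Errors are assumed retryable unless the server marks them fatal.
    fatal = error.get("fatal") is True
    return {"code": error["code"], "message": error["message"],
            "retryable": not fatal}
```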
3.07 API Authentication
In short:
- For v.0.5 we specify that client shared secrets will be used for authentication to all endpoints except the Data Rights Discovery endpoint.
- Participating parties will need to exchange shared secrets out of band for now.
- The intention is to eventually leverage OAuth2 to secure these resources, either in concert with OIDC or out of band.
- Each party MUST include an HTTP Authorization header in each request containing the SHA-512 hash of their secret.
- Requests which do not have an Authorization header MUST receive a 401 HTTP response.
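The header rule above can be sketched as follows. The text does not say how the SHA-512 hash is encoded or whether an authentication scheme prefix is used, so the lowercase hex digest with no prefix is an assumption, as are the function names. Answering a mismatched hash with 401 is also an assumption; the text only requires 401 when the header is missing.

```python
import hashlib
import hmac


def authorization_header(shared_secret: str) -> dict:
    """Build the Authorization header from a shared secret (hex digest assumed)."""
    digest = hashlib.sha512(shared_secret.encode("utf-8")).hexdigest()
    return {"Authorization": digest}


def response_code_for(headers: dict, expected_secret: str) -> int:
    """Return the HTTP status a server might give for the presented headers."""
    presented = headers.get("Authorization")
    if presented is None:
        return 401  # required by the text above
    expected = authorization_header(expected_secret)["Authorization"]
    # Constant-time comparison to avoid leaking how much of the hash matched.
    if not hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8")):
        return 401  # assumption: mismatches are treated like a missing header
    return 200
```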
3.08 Processing Extensions & "Expected By" dates
When a Covered Business acknowledges receipt of a Data Rights Request and moves it into the in_progress state, the request's expected_by field SHOULD be populated based on either an estimate provided by the Covered Business or the deadline prescribed by the legal regime the request is submitted under. Consider, for example, that California's legal regime prescribes up to 90 days extension so long as the extension is made within 45 days. If a request is extended, it must also be updated with a processing_details field detailing a reason for the extension, so that the consumer can be notified of the delay. The intent of the processing_details field is to add additional color to already-defined state/reason combinations. States which cannot be encoded without reaching for free-form text should be integrated into the state transition table.
When applying changes to Data Rights Requests in this fashion, the Privacy Infrastructure Provider SHALL attempt to notify the Authorized Agent using the Data Rights Status Callback.
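A PIP-side helper for recording an extension might look like the sketch below. The function name is hypothetical and the deadline arithmetic is deliberately left to the caller, since it depends on the legal regime; the helper only enforces that an extension always carries a reason in processing_details.

```python
def extend_request(status: dict, new_expected_by: str, reason: str) -> dict:
    """Return a copy of an Exercise Status object with an extended deadline."""
    if not reason:
        raise ValueError("an extension requires a processing_details reason")
    updated = dict(status)  # leave the caller's object untouched
    updated["expected_by"] = new_expected_by
    updated["processing_details"] = reason
    return updated
```

After calling this, the PIP would send the updated object to the Authorized Agent's Status Callback URL, if one was supplied.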
4.0 Protocol Roadmap
In its current implementation, DRP should not be used to process data of Users who are not involved in the implementers group. There are known shortcomings in security, privacy, and identity verification that will need to be solved before a "1.0" protocol version suitable for production systems can be released. The following items are on the roadmap:
- Governance and operating model
- Protocol Compliance suite
- OIDC identity provider flows
- Secure OAuth2 client authentication (eg FAPI baseline security profile)
- Successful DRP pilot between multiple Agents and Covered Businesses
Specification Change Log
In general, please put major change log items at the top of the file. When a new protocol version is "cut", move the previous versions' change log down here.
Protocol Changes from 0.3 to 0.4:
- relationship hints allow users and agents to provide "hints" for the type of customer relationship, or a set of subsidiary brands to query.
- shift in language from regulatory framework to broader legal bases
- medium-term protocol development road-map
Changes in v0.2 to v0.3:
- donotsell -> sale:opt-in opt-out
- terminology changes
- Request status chart
- identity tiger team recommendations
- API Authentication details
- Moved non-essential sections out of protocol spec