AI: EU AI Act: data original purpose, model intended task, and system intended purpose

Question

Opened this issue 10 months ago · 0 comments

Background

The EU AI Act [1] will require AI providers to state the intended purpose or intended task of the system or model they place on the market. If the training, validation, or testing data sets contain personal data, the original purpose of the data should also be provided.

High-risk AI system provider information obligations (per Article 16(a)):
- Data original purpose - Article 10(2)(aa)
- System intended purpose - Article 11(1), detailed in Annex IV(1)(a)

General-purpose AI (GPAI) model provider information obligations:
- Model intended task - Article 52c(1)(b)(ii), detailed in Annex IXb(1)(a)

Example:

The intended task of GPAI model A is facial recognition.
The intended purpose of high-risk AI system B is user authentication.
System B can use model A to perform a facial recognition task in order to fulfill its authentication purpose.
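
The example can also be written down as plain data to keep the two notions apart. This is only an illustrative sketch: the class and field names (`Model`, `System`, `intended_task`, `intended_purpose`) are made up for this comment and are not SPDX classes or properties.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """A GPAI model; `intended_task` is what the model itself does."""
    name: str
    intended_task: str


@dataclass
class System:
    """A high-risk AI system; `intended_purpose` is why the system exists."""
    name: str
    intended_purpose: str
    models: list[Model] = field(default_factory=list)


model_a = Model(name="A", intended_task="facial recognition")
system_b = System(name="B", intended_purpose="user authentication", models=[model_a])

# The system's purpose and the task of the model it uses are separate facts.
for m in system_b.models:
    print(f"System {system_b.name} ({system_b.intended_purpose}) "
          f"uses model {m.name} ({m.intended_task})")
```

The point of the sketch is that both values have to be recorded: neither can be derived from the other.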

Relevant fields in 3.0

The primaryPurpose and additionalPurpose properties of the SoftwareArtifact class in the Software Profile describe the purposes of the software artifact. Values are entries from SoftwarePurpose (for example: configuration, data, executable, library, model).
The domain property of the AIPackage class in the AI Profile describes "the domain in which the AI model contained in the AI software can be expected to operate successfully. Examples include computer vision, natural language etc."
The intendedUse property of the Dataset class in the Dataset Profile describes "what the given dataset should be used for." For example, "if a dataset is collected for building a facial recognition model, the intendedUse field would specify that."

Possible gaps and proposal

System intended purpose

SoftwareArtifact primaryPurpose and additionalPurpose describe the purpose of an element within the system, not the purpose of the system itself.
A property for the system intended purpose is needed. "System" in this case could be a Package (a distribution of software).

Model intended task

AIPackage domain looks too broad compared to what we're looking for.
Take one of the examples given for domain, "computer vision": a computer vision domain covers many tasks, such as pose estimation, facial recognition, and optical character recognition. So domain alone may not be sufficient.

Data original purpose

Dataset intendedUse may be sufficient for this.

Proposal

It may be possible to use the intendedUse property for all three information items mentioned above (system intended purpose, model intended task, and data original purpose).
We could move the intendedUse property from the Dataset class in the Dataset Profile to the Package class in the Software Profile.
Then add that property to the AIPackage class in the AI Profile and the Dataset class in the Dataset Profile.
If the package intendedUse or model intendedUse differs from the data intendedUse, the data intendedUse is considered the data original purpose.
Open question: how can the information inside intendedUse be made easier for a machine to understand and parse?
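
A minimal sketch of the comparison rule in this proposal, under stated assumptions. The classes below are simplified stand-ins written for this comment, not the SPDX model or any SPDX library. Inheritance is used only to avoid repeating the field, and `data_original_purpose` and `_normalize` are hypothetical helper names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Package:
    """Simplified stand-in for the Software Profile Package class."""
    name: str
    intended_use: Optional[str] = None  # proposed: the property lives here


@dataclass
class AIPackage(Package):
    """Simplified stand-in for the AI Profile AIPackage class."""


@dataclass
class Dataset(Package):
    """Simplified stand-in for the Dataset Profile Dataset class."""


def _normalize(text: Optional[str]) -> Optional[str]:
    # Ignore case and repeated whitespace; empty or missing text becomes None.
    return " ".join(text.casefold().split()) if text else None


def data_original_purpose(consumer: Package, dataset: Dataset) -> Optional[str]:
    """Return the dataset's intended_use when it differs from the consumer's.

    Returns None when the dataset has no intended_use or when both match,
    i.e. when there is no separate original purpose to report.
    """
    data_use = _normalize(dataset.intended_use)
    if data_use is None or data_use == _normalize(consumer.intended_use):
        return None
    return dataset.intended_use


faces = Dataset("faces", intended_use="Photo tagging")
auth = Package("system-b", intended_use="user authentication")
print(data_original_purpose(auth, faces))  # Photo tagging
```

Comparing free-text strings like this is brittle: "photo tagging" and "tagging photos" would count as different purposes. That is the machine-readability concern raised in the last point above.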

Note on "model"

SPDX AIPackage's current description is "Metadata information that can be added to a package to describe an AI application or trained AI model."

So a "model" can be either:

a SoftwareArtifact with primaryPurpose=model (a model file alone); or
an AIPackage (a model with inference code or similar, as a package?)

For example:

(1) ggml-model-gpt-2-117M.bin is a model file
(2) The code at https://github.com/ggerganov/ggml/tree/master/examples/gpt-2 is inference code for the GPT-2 model

(1)+(2) together can be an AIPackage (a "trained AI model" according to the AIPackage description).

The proposal above may not work for a SoftwareArtifact with primaryPurpose=model, as SoftwareArtifact has neither an intendedUse nor a domain property.
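
The gap can be illustrated with a toy lookup. Again, this is a sketch under assumptions: the two classes are simplified stand-ins that omit the real class hierarchy, `intended_use_of` is a hypothetical helper, and the package name "gpt-2" and the "text generation" value are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SoftwareArtifact:
    """Simplified stand-in: has purposes but no intendedUse or domain."""
    name: str
    primary_purpose: str


@dataclass
class AIPackage:
    """Simplified stand-in: carries domain and, under the proposal, intendedUse."""
    name: str
    domain: Optional[str] = None
    intended_use: Optional[str] = None


def intended_use_of(element: object) -> Optional[str]:
    """Return intended_use if the element type has such a field, else None."""
    return getattr(element, "intended_use", None)


model_file = SoftwareArtifact("ggml-model-gpt-2-117M.bin", primary_purpose="model")
model_pkg = AIPackage("gpt-2", domain="natural language", intended_use="text generation")

print(intended_use_of(model_file))  # None
print(intended_use_of(model_pkg))   # text generation
```

A model shipped as a bare file therefore has nowhere to record its intended task unless it is wrapped in an AIPackage or the property is made available at a more general level.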

References

[1] Latest draft, 2 Feb 2024: https://www.europarl.europa.eu/meetdocs/2014_2019/plmrep/COMMITTEES/CJ40/AG/2024/02-13/1296003EN.pdf