AI: EU AI Act: data original purpose, model intended task, and system intended purpose
Background
The EU AI Act [1] will require AI providers to provide information about the intended purpose or intended task of the system or model they place on the market. If personal data is used in the training, validation, or testing data sets, the original purpose of the data should also be provided.
- High-risk AI system provider information obligations (per Article 16(a)):
  - Data original purpose - Article 10(2)(aa)
  - System intended purpose - Article 11(1), detailed in Annex IV(1)(a)
- General purpose AI (GPAI) model provider information obligations:
  - Model intended task - Article 52c(1)(b)(ii), detailed in Annex IXb(1)(a)
Example:
- A GPAI model A's intended task is facial recognition.
- A high-risk AI system B's intended purpose is user authentication.
- The system B can use the model A to perform a facial recognition task to fulfill its authentication purpose.
Relevant fields in 3.0
- The primaryPurpose and additionalPurpose properties in the SoftwareArtifact class of Software Profile provide information about the purposes of the software artifact. The purpose can be entries from SoftwarePurpose (for example, "configuration, data, executable, library, model").
- The domain property in the AIPackage class of AI Profile describes "the domain in which the AI model contained in the AI software can be expected to operate successfully. Examples include computer vision, natural language etc."
- The intendedUse property in the Dataset class of Dataset Profile describes "what the given dataset should be used for." For instance, "if a dataset is collected for building a facial recognition model, the intendedUse field would specify that."
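To make the current placement of these fields concrete, here is a minimal sketch in plain Python dataclasses. It is not the SPDX reference model or any SPDX library API: the class names mirror the SPDX 3.0 classes named above, while the snake_case field names, the list type used for domain, and the sample values are illustrative assumptions. Inheritance between the classes is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SoftwareArtifact:
    """Software Profile: purposes are entries from SoftwarePurpose."""
    name: str
    primary_purpose: Optional[str] = None  # e.g. "model", "library"
    additional_purpose: List[str] = field(default_factory=list)


@dataclass
class AIPackage:
    """AI Profile: only a broad operating domain is recorded."""
    name: str
    domain: List[str] = field(default_factory=list)  # e.g. ["computer vision"]


@dataclass
class Dataset:
    """Dataset Profile: the only class here with an intendedUse-style field."""
    name: str
    intended_use: Optional[str] = None


# Hypothetical sample values based on the facial recognition example.
model_file = SoftwareArtifact("model-a.bin", primary_purpose="model")
model_a = AIPackage("model-a", domain=["computer vision"])
faces = Dataset("faces", intended_use="building a facial recognition model")
```

In this layout, nothing on model_file or model_a can say "facial recognition" or "user authentication"; only the dataset carries a purpose-like statement.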
Possible gaps and proposal
System intended purpose
- SoftwareArtifact primaryPurpose and additionalPurpose are for purposes of the element within the system, not purposes of the system.
- Need a property for system intended purpose. "System" in this case could be a Package (distribution of software).
Model intended task
- AIPackage domain looks a bit too broad compared to what we're looking for.
- Take one of the examples given for domain, "computer vision": a computer vision domain has many tasks, such as pose estimation, facial recognition, and optical character recognition. So domain alone may not be sufficient.
Data original purpose
- Dataset intendedUse may be sufficient for this.
Proposal
- It may be possible to use the intendedUse property for all three information items mentioned above (system intended purpose, model intended task, and data original purpose).
- We could move the intendedUse property from the Dataset class in Dataset Profile to the Package class in Software Profile.
- Then add that property to the AIPackage class in AI Profile and the Dataset class in Dataset Profile.
- If the package intendedUse or model intendedUse is different from the data intendedUse, the data intendedUse is considered a data original purpose.
- How can we make it more convenient for machines to understand/parse the information inside intendedUse?
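The proposal can be sketched as follows. This is a minimal illustration in plain Python, not SPDX tooling. It assumes AIPackage and Dataset would pick up intendedUse by inheriting from Package; the helper name data_original_purposes, the snake_case field names, and the sample values are all hypothetical. Comparing free-text values by string inequality is a simplification, which is also why the machine-readability question in the last bullet matters.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Package:
    """Software Profile Package, carrying the relocated intendedUse."""
    name: str
    intended_use: Optional[str] = None


@dataclass
class AIPackage(Package):
    """AI Profile: inherits intended_use (assumed inheritance)."""


@dataclass
class Dataset(Package):
    """Dataset Profile: inherits intended_use (assumed inheritance)."""


def data_original_purposes(consumer: Package, datasets: List[Dataset]) -> List[str]:
    """Return dataset intended uses that differ from the consumer's.

    Per the proposal, a differing dataset intendedUse is treated as the
    data original purpose. Datasets without intended_use are skipped.
    """
    return [
        d.intended_use
        for d in datasets
        if d.intended_use is not None and d.intended_use != consumer.intended_use
    ]


# Hypothetical values: system B (authentication) built on model A.
system_b = Package("system-b", intended_use="user authentication")
model_a = AIPackage("model-a", intended_use="facial recognition")
photos = Dataset("photos", intended_use="photo album tagging")
faces = Dataset("faces", intended_use="facial recognition")

print(data_original_purposes(model_a, [photos, faces]))  # ['photo album tagging']
```

Here the faces dataset matches the model's intended use, so it is not reported, while photos was collected for something else and its intendedUse is surfaced as a data original purpose.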
Note on "model"
SPDX AIPackage's current description is "Metadata information that can be added to a package to describe an AI application or trained AI model."
So a "model" can be either:
- a SoftwareArtifact with primaryPurpose=model (a model file alone); or
- an AIPackage (a model with inference code or similar, as a package?)
For example:
1. ggml-model-gpt-2-117M.bin is a model file
2. The code at https://github.com/ggerganov/ggml/tree/master/examples/gpt-2 is inference code for the GPT-2 model

(1)+(2) together can be an AIPackage (a "trained AI model" according to the AIPackage description).
The proposal above may not work with a SoftwareArtifact with primaryPurpose=model, as SoftwareArtifact has neither an intendedUse nor a domain property.
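A short sketch of this gap, in plain Python with hypothetical names. It assumes Package is a subclass of SoftwareArtifact and that intendedUse lands on Package as proposed; under those assumptions a bare model file has nowhere to record its intended use.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SoftwareArtifact:
    name: str
    primary_purpose: Optional[str] = None


@dataclass
class Package(SoftwareArtifact):
    # Assumed location of intendedUse after the proposed move.
    intended_use: Optional[str] = None


def has_intended_use(element: SoftwareArtifact) -> bool:
    """Report whether the element's class declares an intended_use field."""
    return any(f.name == "intended_use" for f in fields(element))


# The package name below is made up for illustration.
model_file = SoftwareArtifact("ggml-model-gpt-2-117M.bin", primary_purpose="model")
model_package = Package("gpt-2-with-inference-code", primary_purpose="model")

print(has_intended_use(model_file))     # False
print(has_intended_use(model_package))  # True
```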
References
[1] Latest draft 2 Feb 2024 https://www.europarl.europa.eu/meetdocs/2014_2019/plmrep/COMMITTEES/CJ40/AG/2024/02-13/1296003EN.pdf