Obligations for high-risk AI system providers

Criteria for high-risk AI (Article 6, Annex I, Annex III)
Quality management system (Article 17)
Risk management system (Article 9)
Quality and governance of datasets (Article 10)
Performance, robustness and AI security (Article 15)
Logging and automated monitoring (Article 12)
Human oversight (Article 14)
Post-market monitoring (Article 72)
Reporting of serious incidents (Article 73)
Transparency to deployers and instructions for use (Article 13)
Technical documentation for conformity (Article 11, Annex IV)
Conformity assessment procedure (Article 43, Annex VI, Annex VII)
Key points across articles

^ Criteria for high-risk AI

Coming soon!

^ Quality management system

Coming soon!

^ Risk management system

Role and scope of the risk management system: Article 8(1), 9(1), (3) and (9)

Coming soon!

Risk management process: Article 9(2) and (4)

Coming soon!

Risk acceptability: Article 9(5)

Coming soon!

Testing: Article 9(6), (7) and (8)

Coming soon!

Interplay with other EU legislation: Article 9(10)

Coming soon!

^ Quality and governance of datasets

Scope

Article 10’s requirements always apply to testing data sets, whatever the AI technology used.

If the development of the AI system involves training, the requirements also apply to any training or validation data set used.

Preparation and handling of datasets: Article 10(2)

Coming soon!

Dataset quality requirements: Article 10(3) and (4)

Coming soon!

Interplay with GDPR and related legislation: Article 10(5)

Coming soon!

^ Performance, robustness and AI security

Concept of performance and interplay with robustness and AI security

Article 15, “Accuracy, robustness and cybersecurity”, uses terminology that does not map directly onto common usage among AI developers. ‘Accuracy’ in the AI Act means what AI developers would refer to as ‘performance’. ‘Performance’ in the AI Act is broader, encompassing any aspect of the ability of an AI system to achieve its intended purpose (including robustness, cybersecurity and possibly other criteria depending on the type of intended purpose). ‘Robustness’ is the resilience against errors, faults and inconsistencies, or more generally unexpected situations, while ‘cybersecurity’ is the resilience against attempts by unauthorised third parties to alter the use, outputs or performance of the AI system by exploiting system vulnerabilities. In the AI Act, cybersecurity is not limited to common software vulnerabilities but extends to AI-specific vulnerabilities (e.g. adversarial attacks or data poisoning), even though AI developers do not usually perceive these as cybersecurity and more often refer to them as ‘AI security’.

These characteristics are fully complementary and build on each other:

Accuracy pertains to proper functioning overall within the intended purpose of the AI system.
Robustness captures how these accuracy levels are preserved when considering specific conditions, either particular cases within the intended purpose, or cases of reasonably foreseeable use. High robustness does not imply high accuracy, but consistent accuracy.
Cybersecurity further extends these properties by adding the dimension of malicious intent, whereas accuracy and robustness focus on naturally occurring conditions. It is distinct from robustness: what is sometimes referred to as ‘robustness to adversarial attacks’ is in fact cybersecurity and falls outside the concept of robustness in AI Act terms.

All three are required by the AI Act, both initially and throughout the use of the AI system (which typically implies reassessment) after being placed on the market or put into service.

Accuracy: Article 15(1) and (3)

High-risk AI systems are required to be accurate. This does not mean they are forbidden to make any mistake, but that their accuracy needs to be quantified and that this level of accuracy has to meet a threshold that is appropriate given the intended purpose, the state of the art and identified risks (see Recital 74).

In addition, the accuracy metrics used and associated results have to be documented in the instructions for use. This is meant to inform deployers so that they better understand the capabilities of the AI system, and therefore specific care is expected for making this information accessible to that audience, avoiding any misleading communication.

This obligation is mentioned again in the requirements on the instructions for use, in Article 13(3)(b)(ii), where it is complemented with an obligation to disclose known and foreseeable circumstances that are associated with lower accuracy. Article 13(3)(b)(v) additionally requires the specific disclosure of accuracy for certain groups of persons if applicable. Both imply that accuracy needs not only to be measured, but also to be detailed and analysed at a more granular level (per group or circumstance).

The technical documentation (meant for demonstrating compliance) also expects information about accuracy metrics and resulting accuracy levels, both overall and per group. In addition, it includes a justification of the choice of the metrics and a description of the validation and testing procedures used.
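The granular analysis described above can be illustrated with a minimal sketch. It assumes a classification setting where plain accuracy (share of correct predictions) is the chosen metric; the function name `accuracy_by_group` and the group labels are hypothetical, and the AI Act does not prescribe this metric or this breakdown.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return (overall_accuracy, {group: accuracy}).

    Hypothetical helper: plain accuracy is only one possible metric, and the
    choice of metric has to be justified for the intended purpose.
    """
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("inputs must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")

    correct = defaultdict(int)  # number of correct predictions per group
    total = defaultdict(int)    # number of samples per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)

    overall = sum(correct.values()) / sum(total.values())
    per_group = {group: correct[group] / total[group] for group in total}
    return overall, per_group
```

Reporting both values side by side makes it visible when a satisfactory overall figure hides a markedly lower accuracy for one group, which is the kind of circumstance the instructions for use are expected to disclose.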

Because accuracy must be preserved throughout the operation of the AI system, demonstrating compliance can imply reassessing it, periodically or in a more targeted manner, either by the provider itself or by delegation to the deployer. This raises questions about which test data to use for the reassessment, how to organise data preparation and what threshold to target in operation. Accuracy assessed in operation can typically feed the deployer’s monitoring of operation (for instance, degraded accuracy can trigger a reconsideration of the suitability of the AI system) or post-market monitoring (where low accuracy in operation can inform the provider about the actual compliance of its AI system if it was initially assessed on unsuitable data, or about degradation effects throughout the lifetime of the AI system).
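One way to picture reassessment in operation is a rolling check of accuracy against a target threshold. The sketch below is illustrative only: the class `AccuracyMonitor`, the window size and the threshold are assumptions, and it presupposes that ground truth becomes available for at least some inputs in operation, which is not always the case.

```python
from collections import deque


class AccuracyMonitor:
    """Hypothetical rolling accuracy check over the last `window_size` outcomes."""

    def __init__(self, threshold, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.threshold = threshold
        # deque with maxlen drops the oldest outcome automatically
        self._outcomes = deque(maxlen=window_size)

    def record(self, was_correct):
        """Store whether one verified output turned out to be correct."""
        self._outcomes.append(bool(was_correct))

    def accuracy(self):
        """Accuracy over the current window, or None if nothing was recorded."""
        if not self._outcomes:
            return None
        return sum(self._outcomes) / len(self._outcomes)

    def needs_review(self):
        """True once the window is full and accuracy is below the threshold."""
        window_full = len(self._outcomes) == self._outcomes.maxlen
        return window_full and self.accuracy() < self.threshold
```

A `needs_review()` signal of this kind could feed either the deployer’s monitoring of operation or the provider’s post-market monitoring; who runs the check and on which data remains an organisational choice.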

Robustness: Article 15(1) and (4)

Similarly to accuracy, high-risk AI systems are required to be robust, meaning that they must achieve a robustness level that is appropriate given the intended purpose, the state of the art and identified risks. Depending on the case, and in particular on the risks, robustness can be a targeted property (checking robustness towards a certain set of pre-defined conditions) or an open-ended one (ability to offer guarantees so that the accuracy level is preserved within a certain margin, regardless of the conditions). Meeting that requirement implies an assessment of robustness, but this does not need to involve a comparative assessment of accuracy in alternate conditions: there can be other ways to examine the behaviour of the AI system in response to certain conditions. It can imply an identification of specific targeted conditions, which can pertain either to the AI system or to its environment (including interactions with humans or with systems).
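For the targeted variant, one possible (not mandatory) approach is the comparative one: measure accuracy under each pre-defined condition and compare it with a baseline. The sketch below assumes such measurements already exist; the function `robustness_summary`, the condition names and the `max_drop` margin are hypothetical.

```python
def robustness_summary(baseline_accuracy, accuracy_by_condition, max_drop):
    """Compare accuracy under pre-defined conditions with a baseline.

    Hypothetical helper: `max_drop` is an assumed acceptable margin, to be
    set according to the intended purpose and the identified risks.
    """
    if not accuracy_by_condition:
        raise ValueError("at least one condition is required")

    # Accuracy lost under each condition relative to the baseline
    drops = {
        name: baseline_accuracy - accuracy
        for name, accuracy in accuracy_by_condition.items()
    }
    worst_condition = max(drops, key=drops.get)
    failing_conditions = sorted(
        name for name, drop in drops.items() if drop > max_drop
    )
    return {
        "drops": drops,
        "worst_condition": worst_condition,
        "failing_conditions": failing_conditions,
    }
```

The conditions that exceed the margin are natural candidates for the known and foreseeable circumstances to be reported to deployers.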

The level of robustness also has to be documented in the instructions for use (Article 13), together with information on known and foreseeable circumstances that can lead to low robustness. This is different from, and complementary to, the identification of circumstances leading to low accuracy: low robustness corresponds to conditions in which accuracy is more unstable across inputs, not necessarily low. Technical documentation (Annex IV) expects information on the metrics used to measure robustness, as well as the corresponding testing procedure, but information on the results is more generally captured through test logs and test reports rather than through specific required information on robustness.

Article 15(4) further requires implementing measures to ensure robustness, which can be either technical or organisational. Technical measures can be implemented in the AI system itself or be based on redundancy, and they can involve, for instance, an automatic shutdown of the AI system when it detects certain anomalies or operation beyond certain predefined boundaries. Organisational measures to ensure robustness can interact closely with the human oversight measures adopted separately.
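The automatic shutdown mentioned above can be sketched as a guard that wraps the model and halts once an input falls outside predefined boundaries. This is a minimal illustration under assumptions: `OperatingEnvelopeGuard`, `SystemHalted` and the feature bounds are hypothetical names, and a real system would also need a controlled restart procedure and a record of the event.

```python
class SystemHalted(RuntimeError):
    """Raised when the guard has stopped the system (hypothetical)."""


class OperatingEnvelopeGuard:
    """Wrap a model and halt it when inputs leave predefined boundaries."""

    def __init__(self, model, bounds):
        self._model = model          # any callable taking a dict of features
        self._bounds = dict(bounds)  # {feature_name: (low, high)}
        self.halted = False
        self.halt_reason = None

    def predict(self, features):
        # Once halted, refuse every request until an operator intervenes
        if self.halted:
            raise SystemHalted(self.halt_reason)
        for name, (low, high) in self._bounds.items():
            value = features.get(name)
            if value is None or not (low <= value <= high):
                self.halted = True
                self.halt_reason = f"{name}={value!r} outside [{low}, {high}]"
                raise SystemHalted(self.halt_reason)
        return self._model(features)
```

Keeping the guard outside the model itself is a design choice: the boundaries can then be reviewed and adjusted without retraining, and the halt can be tied to the organisational and human oversight measures.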

Feedback loops: Article 15(4)

Article 15(4) also contains an additional requirement, which does not directly pertain to the concept of robustness but relates to it. It requires minimising, and if possible eliminating, the presence of feedback loops in the design of the AI system, and mitigating the effect of any remaining feedback loop.

This is required specifically in the case of AI systems that continue to learn after being placed on the market or put into service, which is broader than continuous learning only (where that learning happens on an ongoing basis) and encompasses any case where the learned model changes, possibly with human involvement. In these cases, a feedback loop is the presence of any mechanism whereby the AI system’s outputs can influence the future state of the AI system, either directly (e.g. reuse as new training data) or indirectly (by causing decisions and actions that affect real-world data which is in turn sampled for new training data). Such feedback loops are a particular concern when outputs are biased, as the feedback loop can falsely confirm or even amplify the biased behaviour.
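One possible mitigation for the direct kind of feedback loop is to track the provenance of each candidate training sample and keep system-derived labels out of retraining. The sketch below is an assumption-laden illustration: the `Sample` type, the `label_source` values and `select_for_retraining` are hypothetical, and exclusion is only one option among others (such as reweighting or independent re-labelling).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical training candidate with provenance metadata."""
    features: tuple
    label: int
    # Assumed values: "human", "system_output", "system_influenced"
    label_source: str


def select_for_retraining(samples, allowed_sources=frozenset({"human"})):
    """Keep only samples whose label does not originate from the AI system.

    Returns the kept samples and the share of candidates that was excluded,
    which can itself be monitored as an indicator of feedback-loop exposure.
    """
    kept = [s for s in samples if s.label_source in allowed_sources]
    excluded_share = 1 - len(kept) / len(samples) if samples else 0.0
    return kept, excluded_share
```

Indirect feedback loops are harder to capture this way, since the influence passes through real-world decisions before reaching the data; the provenance tag can only flag them if that influence is known when the data is collected.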

Cybersecurity: Article 15(1) and (5)

Coming soon!

Support from metrology: Article 15(2)

Coming soon!

^ Logging and automated monitoring

Role and scope of logging capabilities: Article 12(1)

Coming soon!

Events to log: Article 12(2)

Coming soon!

Additional logging for remote biometric identification: Article 12(3)

In the specific case of remote biometric identification systems, Article 12(3) adds four requirements:

The period of use of the system needs to be recorded each time the system is used, by logging the date and time of the beginning and end of that period.
Information on the reference database used for comparison needs to be retained. This can be recorded only once if the database never changes. But if the database changes over time, there needs to be a way to retrieve the specific version used when processing each input. If the reference database depends on the input, it needs to be recorded for each input received by the system.
All inputs for which a match has been found by the system have to be recorded.
Sufficient information needs to be retained to identify the two persons who have verified the results before any action or decision is taken on the basis of that identification (except for certain uses in law enforcement, migration, border control or asylum where this verification is not required).
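The four requirements above can be pictured as fields of a log record. The following is a minimal sketch, not a prescribed format: `RbiSessionLog`, `MatchRecord` and all field names are hypothetical, and it simplifies by recording the verifiers together with the match, whereas in practice the verification may be logged later.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class MatchRecord:
    """One input for which the system found a match (hypothetical)."""
    input_id: str
    reference_db_version: str  # version of the database used for this input
    verifier_ids: tuple        # persons who verified the result


@dataclass
class RbiSessionLog:
    """Log of one period of use of the system (hypothetical)."""
    started_at: datetime
    ended_at: Optional[datetime] = None
    matches: list = field(default_factory=list)

    def record_match(self, input_id, reference_db_version,
                     verifier_ids=(), verification_exempt=False):
        verifiers = tuple(verifier_ids)
        # Two distinct verifiers are expected unless the use is exempt
        if not verification_exempt and len(set(verifiers)) < 2:
            raise ValueError("two distinct verifiers are expected")
        self.matches.append(
            MatchRecord(input_id, reference_db_version, verifiers)
        )

    def close(self, ended_at):
        """Record the end of the period of use."""
        if ended_at < self.started_at:
            raise ValueError("end of use cannot precede start of use")
        self.ended_at = ended_at
```

Storing the database version on each match covers the case where the reference database changes over time; if it never changes, a single session-level field would be enough.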

^ Human oversight

Purpose and frame of human oversight obligations: Article 14(1) and (2)

Roles of the provider and the deployer: Article 14(3)

Expectations from human oversight: Article 14(4)

Additional oversight for remote biometric identification: Article 14(5)

^ Post-market monitoring

Coming soon!

^ Reporting of serious incidents

Coming soon!

^ Transparency to deployers and instructions for use

Coming soon!

^ Technical documentation for conformity

Coming soon!

^ Conformity assessment procedure

Coming soon!

Key points across articles

Events as a cornerstone

Monitoring at various scales

Substantial modifications

Explainability and interpretability

Bias in datasets and in systems

Domain as an implicit foundation