Article 12 checklist: ten questions to close the automated logging gap before 2 August 2026
Article 12 of Regulation (EU) 2024/1689 is two sentences and a sub-list. The implementation is a system-design problem that most high-risk providers underestimate on first reading. This page is the working ten-item checklist we use in audits, with the six event categories that must be logged, the ten-year retention floor under Article 18, and a printable single-page version for compliance reviews.
What Article 12 actually says
The core of Article 12 is paragraph 1 and the opening of paragraph 2:
"1. High-risk AI systems shall technically allow for the automatic recording of events (logs) over the lifetime of the system. 2. In order to ensure a level of traceability of the functioning of a high-risk AI system that is appropriate to the intended purpose of the system, logging capabilities shall enable the recording of events relevant for: [points (a) to (c)]"
Three commitments compressed into two sentences: automatic recording (no human in the loop for the log itself), over the lifetime of the system (not a development-time facility), and traceability appropriate to intended purpose (a scope test, not a uniform standard).
Article 12(2) then names the events the logs must capture: those relevant for identifying situations that may result in the system presenting a risk under Article 79(1) or in a substantial modification, for facilitating post-market monitoring under Article 72, and for monitoring the operation of the system under Article 26(5). Article 12(3) sets a minimum of four specific log items, points (a) to (d), for remote biometric identification systems under Annex III(1)(a), but the principle generalises to any high-risk system.
The six log event categories
For Annex III(1)(a) biometric identification, Article 12(3) enumerates the first four categories below. The last two, system decisions and risk events, are not enumerated; they are derived from Article 12(2)'s traceability test, as are all categories for other high-risk systems. The six below are the categories notified bodies consistently expect across system classes:
| Category | What to log |
|---|---|
| Use periods | Start time, end time, and duration of each session in which the system was operated. The minimum traceable unit of system activity. |
| Reference database | For systems matching against a reference set: the identifier of the reference database used at the time of each use, with version or content hash. Changes to the reference database between uses must be reconstructible. |
| Input data | The input data on which the search or inference resulted in a match or output. For biometric systems this is explicit; for other systems this maps to the inference inputs sufficient to reproduce the output, subject to data-minimisation principles. |
| Verification persons | Identification of the natural persons involved in verifying the results, where Article 14 human oversight is implemented through a verification step. |
| System decisions | The output of the system per session, including confidence scores or other uncertainty quantifications surfaced to the deployer. |
| Risk events | Any incident, near-miss, or deviation from expected behaviour that may trigger Article 73 reporting or Article 72 post-market monitoring. This is the category most providers under-instrument. |
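The six categories in the table can be sketched as one record per session, serialised as a single JSON Lines entry. This is a minimal illustration, not a schema prescribed by the regulation: the class name `UseRecord`, every field name, and the sample values are assumptions made for this example.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime
from typing import Optional


@dataclass
class UseRecord:
    """One session of system use. Field names are illustrative only."""
    schema_version: str
    session_id: str
    started_at: str                     # use period: ISO 8601 start, with UTC offset
    ended_at: str                       # use period: ISO 8601 end, with UTC offset
    reference_db_id: Optional[str]      # reference database identifier, if any
    reference_db_hash: Optional[str]    # version or content hash of that database
    input_ref: str                      # pointer or content hash for the input data
    verifier_ids: list[str] = field(default_factory=list)  # verification persons
    decision: Optional[str] = None      # system decision
    confidence: Optional[float] = None  # uncertainty surfaced to the deployer
    risk_event: Optional[str] = None    # incident, near-miss, or deviation


def to_jsonl(record: UseRecord) -> str:
    """Serialise one record as a single line with a stable key order."""
    return json.dumps(asdict(record), sort_keys=True, separators=(",", ":"))


def duration_seconds(record: UseRecord) -> float:
    """Derive the session duration from the logged start and end times."""
    start = datetime.fromisoformat(record.started_at)
    end = datetime.fromisoformat(record.ended_at)
    return (end - start).total_seconds()


record = UseRecord(
    schema_version="1",
    session_id="s-0001",
    started_at="2026-08-02T09:00:00+00:00",
    ended_at="2026-08-02T09:00:30+00:00",
    reference_db_id="example-reference-db",
    reference_db_hash="example-content-hash",
    input_ref="example-input-hash",
    verifier_ids=["reviewer-17"],
    decision="match",
    confidence=0.93,
)
line = to_jsonl(record)
```

Keeping `schema_version` inside every record means a reader in later years can tell which field layout a given line follows without consulting external tooling.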
The ten-item Article 12 checklist
Each item below is one closeable question. If you can answer "yes, documented, tested" to all ten, your Article 12 posture is defensible at a notified body assessment. If any answer is "we are working on it," that becomes the work order between now and 2 August 2026.
Retention: the ten-year floor under Article 18
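Article 18 binds providers to keep the technical documentation, the quality management system documentation, the EU declaration of conformity, and any decisions of notified bodies for ten years after the system is placed on the market or put into service. The automatically generated logs are governed separately by Article 19, which requires them to be kept for a period appropriate to the intended purpose and for at least six months, unless other Union or national law provides otherwise. We plan log retention against the ten-year documentation period, because the logs are the evidence that documentation will be tested against. For systems with a long operational life (medical-device-adjacent AI, infrastructure control systems, biometric identification deployed by public authorities), this is a multi-decade obligation when production lifetime is added.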
The retention infrastructure has three operational requirements that are easy to miss:
- Storage durability across staff turnover. The person who set up the log storage in 2026 will not be on the team in 2036. Documentation of the storage architecture is part of the technical documentation.
- Format survivability. A log written in a proprietary binary format will become unreadable when the tooling that produced it is deprecated. Open, documented formats (JSON Lines, Parquet, Arrow) are the safer choice. Cleartext is acceptable; obscure binary is risky.
- Storage-cost stability. Object storage at scale over ten years has a non-trivial cost trajectory. Cold storage tiers and lifecycle policies are the standard answer, but a restore from them must be tested end to end before they are relied on as a source of evidence.
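A recovery test for archived logs can be as small as a digest comparison: record a SHA-256 of each log file when it is archived, then recompute it after a trial restore. The sketch below assumes local files and uses hypothetical helper names (`sha256_file`, `restore_is_intact`); a real test would restore from the actual cold tier first.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_is_intact(restored: Path, recorded_digest: str) -> bool:
    """Compare a restored file against the digest recorded at archive time."""
    return sha256_file(restored) == recorded_digest


# Demonstration with a temporary file standing in for an archived log.
with tempfile.TemporaryDirectory() as tmp:
    archived = Path(tmp) / "logs-2026-08.jsonl"
    archived.write_bytes(b'{"session_id":"s-0001"}\n')
    recorded = sha256_file(archived)          # stored alongside the archive
    intact = restore_is_intact(archived, recorded)

    archived.write_bytes(b'{"session_id":"s-0002"}\n')  # simulate a bad restore
    altered = restore_is_intact(archived, recorded)
```

The recorded digests belong with the storage-architecture documentation, so that whoever runs the test years later knows what a successful restore looks like.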
Where tamper-evident logging matters
The regulation does not explicitly mandate cryptographic guarantees on logs. It mandates a level of traceability "appropriate to the intended purpose." That phrase does most of the work in practice.
For a system whose intended purpose is high-stakes (credit scoring, recruitment screening, biometric identification, public-sector eligibility), the level of traceability that an audit will treat as appropriate is one where any edit to the log after the event would be detectable. For low-stakes systems an append-only file is sufficient. The boundary is fuzzy and notified bodies will draw it case by case.
The cheapest defensible posture is to make logs tamper-evident at the design level and stop arguing about whether it was required:
- Append-only storage with documented controls preventing modification of existing entries.
- Periodic Merkle anchoring of log batches to an external timestamping service (RFC 3161 TSA, OpenTimestamps, or a comparable public timestamping authority).
- Signed evaluation manifests at the per-run level, where each evaluation claim is committed to a SHA-256 hash before the run produces an output. This is the PRML pattern, an open spec we maintain. See the v0.1 specification.
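The Merkle anchoring step in the list above reduces a batch of log entries to one root hash, and only that root needs to be sent to the timestamping service. The following is a minimal sketch: the function name `merkle_root` is hypothetical, the leaf and node prefixes follow the style of RFC 6962, and an unpaired node is simply promoted to the next level, which is a simplification rather than RFC 6962's exact tree shape. Submitting the root to a TSA is out of scope here.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(entries: list[bytes]) -> str:
    """Return the hex Merkle root of a batch of serialised log entries."""
    if not entries:
        raise ValueError("cannot anchor an empty batch")
    # Prefix leaves with 0x00 and interior nodes with 0x01 so that a leaf
    # can never be confused with an interior node.
    level = [_h(b"\x00" + entry) for entry in entries]
    while len(level) > 1:
        paired = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2 == 1:
            paired.append(level[-1])  # promote the unpaired node unchanged
        level = paired
    return level[0].hex()


batch = [b'{"session_id":"s-0001"}', b'{"session_id":"s-0002"}', b'{"session_id":"s-0003"}']
root = merkle_root(batch)
```

Changing any entry in the batch changes the root, so a timestamp over the root commits to every entry without disclosing the entries themselves.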
The PRML hash anchor pattern, in three lines
For the evaluation-claim subset of Article 12 logging (the part that binds metrics, thresholds, datasets, and seeds), the load-bearing artefact is a manifest hash committed before the run starts. The pattern is small enough to fit in a YAML file and one line of tooling:
    version: prml/0.1
    metric: accuracy
    comparator: '>='
    threshold: 0.85
    dataset:
      id: imagenet-val-2012
      hash: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    seed: 42
    producer:
      id: studio-11.co
The hash of the canonical bytes of this manifest becomes the run's identity. It is computed before the experiment runs. If a deployer later sees an accuracy claim and wants to verify it was not retroactively softened, they recompute the hash and compare. The threshold becomes a structural commitment rather than a footnote.
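The commit-then-compare step can be sketched in a few lines. One loud assumption: the canonical byte form is defined by the PRML specification, and this example does not reproduce it. It substitutes compact JSON with sorted keys as a stand-in canonicalisation, so the digests it produces are illustrative and will not match real PRML tooling. The function name `manifest_hash` is likewise hypothetical.

```python
import hashlib
import json

manifest = {
    "version": "prml/0.1",
    "metric": "accuracy",
    "comparator": ">=",
    "threshold": 0.85,
    "dataset": {
        "id": "imagenet-val-2012",
        "hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    },
    "seed": 42,
    "producer": {"id": "studio-11.co"},
}


def manifest_hash(m: dict) -> str:
    """SHA-256 over a stand-in canonical form (sorted keys, compact JSON, UTF-8)."""
    canonical = json.dumps(
        m, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Committed before the experiment runs.
committed = manifest_hash(manifest)

# A later, softened claim: same manifest with a lower threshold.
softened = dict(manifest, threshold=0.80)
matches_commitment = manifest_hash(softened) == committed
```

Because the keys are sorted before hashing, reordering the fields does not change the digest, while changing the threshold, dataset hash, or seed does.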
This is one specific pattern for one specific subset of Article 12 logging. It does not cover use-period logging, verification-person logging, or risk-event logging. It does cover the most contested part of an Article 12 audit: "did your team change the accuracy threshold between the report and the audit?"
What to do next
FAQ
Does Article 12 apply to GPAI models?
Not directly. Article 12 is in Chapter III Section 2, which binds high-risk systems. GPAI obligations live in Chapter V (Articles 51 to 56). Article 53(1)(a) imposes a technical documentation obligation on GPAI providers that resembles Article 12 in spirit but is structured differently. If your GPAI model is integrated into a high-risk system, the provider of that high-risk system carries the Article 12 obligations.
Is JSON Lines acceptable as a log format?
Yes. The regulation does not specify a format. JSON Lines is widely documented, parseable without proprietary tooling, and supports the schema-versioning requirement (item 04 on the checklist). Parquet and Arrow are also acceptable. Proprietary binary formats are not recommended for the ten-year retention requirement.
Do we need a separate log retention system, or can we use our existing observability stack?
The Article 12 logs are a defined regulatory artefact and benefit from clean separation from operational observability. Many teams take their existing observability stack as the source-of-truth, with a periodic export to a regulatory log store that satisfies the retention and integrity requirements. The export approach is simpler than rebuilding logging on top of a new system.
What is "tamper-evident" in regulatory practice?
Tamper-evident means modifications after the fact are detectable, not that they are impossible. Append-only file systems, cryptographic signatures over batches, and external timestamping are the standard mechanisms. Tamper-proof (impossibility of modification) is a stronger property and is not required by Article 12.
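The distinction can be shown with a hash chain, one common way to make after-the-fact edits detectable. In this sketch each entry stores the hash of its predecessor, so editing any entry breaks verification from that point on. The function names and entry layout are assumptions made for the example, not a format taken from the regulation or any standard.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # placeholder predecessor hash for the first entry


def _entry_hash(prev_hash: str, payload: dict) -> str:
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: list[dict], payload: dict) -> None:
    """Append a payload, linking it to the hash of the previous entry."""
    prev = chain[-1]["entry_hash"] if chain else GENESIS
    chain.append(
        {"prev_hash": prev, "payload": payload, "entry_hash": _entry_hash(prev, payload)}
    )


def first_broken_index(chain: list[dict]) -> Optional[int]:
    """Return the index of the first entry that fails verification, or None."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        expected = _entry_hash(prev, entry["payload"])
        if entry["prev_hash"] != prev or entry["entry_hash"] != expected:
            return i
        prev = entry["entry_hash"]
    return None


chain: list[dict] = []
for n, decision in enumerate(["match", "no_match", "match"], start=1):
    append_entry(chain, {"session_id": f"s-{n:04d}", "decision": decision})

before_edit = first_broken_index(chain)

chain[1]["payload"]["decision"] = "match"  # simulate a retroactive edit
after_edit = first_broken_index(chain)
```

The edit itself is not prevented, only exposed. Someone able to rewrite every later hash could still hide it, which is why the latest chain hash is usually anchored with an external timestamp.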
How does Article 12 interact with the GDPR data-minimisation principle?
The regulation requires logging of input data that triggered system decisions. GDPR requires data minimisation. The standard reconciliation is to log identifiers and content hashes rather than the raw input data, with the raw data accessible from a separate store under access controls. Recital 69 of the AI Act confirms that data minimisation and data protection by design and by default continue to apply when personal data are processed.
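One way to log a content hash instead of the raw input is a keyed hash (HMAC-SHA-256). A plain unkeyed hash of low-entropy personal data can sometimes be reversed by guessing candidate inputs, so this sketch assumes a secret key held apart from the log store. The function name `input_reference`, the key handling, and the `raw_store` field are all hypothetical.

```python
import hashlib
import hmac


def input_reference(raw_input: bytes, key: bytes) -> str:
    """Return a keyed digest that identifies the input without containing it."""
    return hmac.new(key, raw_input, hashlib.sha256).hexdigest()


def matches(raw_input: bytes, key: bytes, logged_ref: str) -> bool:
    """Check a retrieved raw input against the reference stored in the log."""
    return hmac.compare_digest(input_reference(raw_input, key), logged_ref)


# Demo key only; a real key would come from a secrets manager, not source code.
key = b"demo-key-not-for-production"
raw = b"applicant-record-0001"

log_entry = {
    "session_id": "s-0001",
    "input_ref": input_reference(raw, key),
    "raw_store": "example-restricted-store",  # where the raw data lives
}

same_input = matches(raw, key, log_entry["input_ref"])
other_input = matches(b"applicant-record-0002", key, log_entry["input_ref"])
```

The log then shows which input produced which output, while access to the raw data remains a separate, access-controlled step.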