2026-05-29 PATTERN ~5 min

Model cards vs pre-registration: what counts as evidence under the EU AI Act.

The Act tells you to declare your accuracy numbers and keep records that let them be checked. It does not tell you which artifact satisfies that. A short, practical comparison of the two artifacts a reviewer is most likely to see, and the one property that separates them.

When the EU AI Act's high-risk obligations bind on 2 August 2026, a provider of a high-risk system has to do two unglamorous things about its accuracy numbers: declare them (Article 15) and keep records that let them be checked (Article 12, with the technical documentation in Annex IV). The Act tells you the obligation. It does not tell you which artifact satisfies it.

So teams reach for the artifact they already know: the model card, or an internal evaluation report. That instinct is right that documentation is required. It is wrong about what the documentation has to prove.

This is a short, practical comparison of the two artifacts a reviewer is most likely to see, and the one property that separates them.

This is an engineering pattern, not legal advice. Article and Annex references are to Regulation (EU) 2024/1689. Confirm scope and sufficiency with qualified counsel and your notified body.

What the Act actually asks for

Strip away the framing and Article 15 wants a stated level of accuracy that is appropriate and, crucially, verifiable. Article 12 wants logs that record the system's functioning across its lifetime. Annex IV §2 wants the evaluation methodology written down.

The recurring word, implicit in all three, is verifiable. Not "asserted." Not "documented in prose." Checkable by someone who was not in the room.

The default answer: model cards and eval reports

A model card is a good thing. It collects the metric, the dataset, the intended use, known limitations, and the headline numbers in one place. For transparency and for Annex IV §3 it does real work.

What a model card does not do is establish when its numbers were decided. The accuracy figure, the threshold, the evaluation slice, the random seed: in a model card these are reported after the experiment, in editable prose. Nothing in the artifact distinguishes a number that was committed before the run from one that was selected, after seeing several runs, because it looked best. A model card is a trust-me document. It can be edited after the fact and nothing about it changes.

That is not a knock on the people who write them. It is a structural property: a document written after the result cannot, by itself, prove what was promised before it.

The property a reviewer will eventually ask about

Here is the question that turns documentation into evidence, and it is the one an auditor or a careful customer eventually asks:

"When was this accuracy threshold set, relative to when you saw the result?"

If the honest answer is "we cannot show that," then the number is, in the Act's sense, hard to verify. It is an assertion with a citation, not evidence.

Closing that gap needs one property the model card does not have: pre-commitment that is tamper-evident. A way to show that the claim existed, in its exact form, before the experiment ran, and that it has not been quietly edited since.

Where pre-registration fits

Pre-registration is the discipline of writing down what you will measure, and the bar you will hold it to, before you measure it. It is standard in clinical trials and in parts of empirical science for exactly the reason above: it removes the room to reshape the claim after seeing the data.

A pre-registered ML manifest (PRML) applies this to an evaluation claim. You write the claim as eight fields (metric, comparator, threshold, dataset hash, seed, producer, and so on) and bind it to a SHA-256 digest computed over the canonical bytes before the run. A verifier with the manifest, the dataset, and the model recomputes the digest, executes the claim, and returns a deterministic verdict: PASS, FAIL, or TAMPERED. Editing the claim after the fact changes the bytes, which changes the hash, which the verifier flags. The spec is open (CC BY 4.0); reference implementations are MIT.

This is not a replacement for the model card. It is the evidence layer underneath it. The model card says "accuracy is 0.91." The manifest says "and here is cryptographic proof that 0.90 was the bar we committed to before we ran, and that nobody moved it." One is the human-readable summary; the other is the part a reviewer can independently re-derive.

The one-line comparison

QuestionModel card / eval reportPre-registered manifest
States the metric and number?YesYes
Human-readable summary of intended use, limits?YesNo (points to the card)
Proves the threshold was set before the run?NoYes
Detects a silent post-hoc edit?NoYes (hash mismatch)
Independently re-derivable by a third party?Not by itselfYes
Required reading for a regulator?Likely (Annex IV §3)Supports Art. 12 / 15 verifiability

What this does and does not buy you

A pre-registered manifest does not make a model accurate, safe, or compliant. It makes one specific claim, the accuracy figure you report, checkable rather than trusted. That is a narrow thing. It is also exactly the thing that is missing from the documentation most teams will bring to an Article 15 conversation.

If you keep model cards, keep them. Add the layer that answers the "when was the threshold set" question before someone asks it.

Cüneyt Öztürk — independent researcher. [email protected]. PRML (Pre-Registered ML Manifest) is an open specification with four byte-equivalent reference implementations and a published conformance suite. This post is CC BY 4.0; reuse, fork, cite without attribution if it helps.