Article 12 Evidence Pack (sample)
A tamper-evident, cryptographically self-verifying technical artifact for EU AI Act Article 12 record-keeping and Annex IV section 2(d) documentation. This is a sample of the deliverable a Sprint Audit Review produces. It is suitable for forwarding to internal compliance leads, notified-body assessors, and accredited audit firms.
1. What this document is
A complete Article 12 record for one evaluation claim contains, at minimum, four things: a pre-registered manifest that fixes the claim, a cryptographic commit of that manifest before the run, the result of the run, and a procedure for any third party to verify that the manifest and the commit match. This document collects all four in a single artifact, plus the mappings to the regulatory and standards text that an assessor will look for.
It is structured as a self-contained PDF: the assessor does not need to navigate to any other URL to reproduce the cryptographic check, although the references in section 4 and section 5 let them cross-check against the regulation text and the ISO control text directly.
2. The pre-registered manifest and its commit
The manifest below is the eight-field YAML document fixed before the evaluation ran, in the PRML v0.1 schema. The SHA-256 hash is computed over the canonicalised bytes of this document (the spec's section on canonicalisation defines the canonical form). The hash was published to the registry at the timestamp shown, before any result was recorded.
version: prml/0.1
claim_id: 01900000-0000-7000-8000-000000000001
created_at: "2026-05-08T20:00:00Z"
metric: accuracy
comparator: ">="
threshold: 0.92
dataset:
  id: imagenet-val-2012
  hash: PLACEHOLDER-DATASET-SHA256-1f2e3d4c5b6a7980abcdef0123456789fedcba9876543210
  uri: https://image-net.org/data/ILSVRC2012_img_val.tar
seed: 42
producer:
  id: sample-provider.example
model:
  id: resnet50-fp16
  hash: PLACEHOLDER-MODEL-SHA256-9a8b7c6d5e4f30210fedcba9876543210abcdef0123456789
notes: |
  ImageNet-1k val (2012). Top-1 accuracy at fp16 inference,
  10-crop disabled, deterministic dataloader, batch size 64.
  Eval harness pinned to torchvision 0.18.0, pytorch 2.3.1,
  CUDA 12.4. Hardware: single A100 80GB.
Manifest SHA-256 (canonical bytes):
PLACEHOLDER-a3f4b9c2e7d1f8e4a9c0b3d6f5e2a1c8b4d7e0a3c6b9f2e5d8a1b4c7d0e3f6a9
Commit publication record:
| Channel | Reference | Timestamp |
|---|---|---|
| Public registry | registry.falsify.dev/m/<hash> | 2026-05-08 20:00:14 UTC |
| Git commit witness | github.com/<provider>/<repo>@<sha> | 2026-05-08 20:00:31 UTC |
| Third-party archive | Zenodo DOI 10.5281/zenodo.<id> | 2026-05-09 09:14 UTC |
Three independent witnesses are not required by the spec; one durable, publicly resolvable commit is sufficient. Multiple witnesses raise the cost of a coordinated retraction attack and are recommended for systems classified as high-risk under Article 6 and Annex III.
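The ordering property the witnesses establish can be checked mechanically. The sketch below is illustrative only: the `Witness` type and both helper functions are hypothetical names, not part of any falsify implementation. The timestamps are copied from the publication record above, and the result time used at the end is invented purely to exercise the check.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Witness:
    """One row of the commit publication record (hypothetical type)."""
    channel: str
    committed_at: datetime  # timezone-aware, UTC


def earliest_witness(witnesses: list[Witness]) -> Witness:
    """Return the witness with the earliest commit timestamp."""
    if not witnesses:
        raise ValueError("at least one witness is required")
    return min(witnesses, key=lambda w: w.committed_at)


def committed_before(witnesses: list[Witness], result_at: datetime) -> bool:
    """True if at least one witness strictly predates the recorded result."""
    return earliest_witness(witnesses).committed_at < result_at


# Timestamps copied from the publication record above.
WITNESSES = [
    Witness("Public registry", datetime(2026, 5, 8, 20, 0, 14, tzinfo=timezone.utc)),
    Witness("Git commit witness", datetime(2026, 5, 8, 20, 0, 31, tzinfo=timezone.utc)),
    Witness("Third-party archive", datetime(2026, 5, 9, 9, 14, tzinfo=timezone.utc)),
]

if __name__ == "__main__":
    # Hypothetical result time, not taken from the evidence pack.
    result_at = datetime(2026, 5, 10, tzinfo=timezone.utc)
    print(earliest_witness(WITNESSES).channel)
    print(committed_before(WITNESSES, result_at))
```

Only the earliest witness matters for the pre-registration claim; the later ones add redundancy, which is why a single durable commit is enough to satisfy the check.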
3. Verification: how an auditor re-derives the hash
The verification step is mechanical. An auditor with access to the manifest text (this document, section 2) and any one of the four byte-equivalent reference implementations can re-derive the SHA-256 in under one minute, without contacting the provider.
Reference implementations (spec licensed CC BY 4.0, code licensed MIT):
- Python: `pip install falsify`, then `falsify verify manifest.yaml`
- JavaScript: `npx @falsify/cli verify manifest.yaml`
- Go: `falsify-go verify manifest.yaml`
- Rust: `falsify-rs verify manifest.yaml`
Sample verification output (any implementation):
$ falsify verify manifest.yaml --commit PLACEHOLDER-a3f4b9c2...
manifest : manifest.yaml
canonical bytes : 612 bytes
derived sha256 : PLACEHOLDER-a3f4b9c2e7d1f8e4a9c0b3d6f5e2a1c8b4d7e0a3c6b9f2e5d8a1b4c7d0e3f6a9
registry commit : PLACEHOLDER-a3f4b9c2e7d1f8e4a9c0b3d6f5e2a1c8b4d7e0a3c6b9f2e5d8a1b4c7d0e3f6a9
match : PASS
registry record : registry.falsify.dev/m/PLACEHOLDER-a3f4b9c2...
committed at : 2026-05-08T20:00:14Z
resolved at : 2026-06-07T20:00:00Z (T+30 days)
result : 0.918 (claimed: >= 0.92) — FAIL
failure honoured : yes (no edit to manifest after result)
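The result line of the sample output reduces to one comparison against the threshold fixed in the manifest. The following is a minimal sketch of that comparison, not the reference implementation: `resolve_claim` and the `COMPARATORS` table are hypothetical names, and only `>=` is confirmed by the sample manifest; the other comparator strings are assumptions about what such a schema might allow.

```python
import operator

# Comparator strings mapped to Python operators. Only ">=" appears in the
# sample manifest; the remaining entries are assumptions for illustration.
COMPARATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


def resolve_claim(result: float, comparator: str, threshold: float) -> str:
    """Return "PASS" if `result <comparator> threshold` holds, else "FAIL"."""
    if comparator not in COMPARATORS:
        raise ValueError(f"unsupported comparator: {comparator!r}")
    holds = COMPARATORS[comparator](result, threshold)
    return "PASS" if holds else "FAIL"


if __name__ == "__main__":
    # Values from the sample manifest and sample output.
    print(resolve_claim(0.918, ">=", 0.92))  # prints FAIL
```

The sketch applies no rounding. Rounding 0.918 to two decimals would give 0.92 and flip the outcome, so any rounding rule would itself have to be fixed in the manifest before the run; whether the spec prescribes one is not stated here.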
The four implementations produce byte-equivalent output across 20 conformance vectors. If two implementations disagree on the canonical form of any manifest, that is a spec bug and the spec is the side that has to be fixed, not the implementation. The conformance suite is open at github.com/studio-11-co/falsify/tree/main/conformance.
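To make the re-derivation concrete, here is a hedged sketch of the canonicalise-then-hash step using only the Python standard library. It assumes, purely for illustration, that the canonical form is sorted-key, whitespace-free JSON encoded as UTF-8; the actual PRML canonical form is defined by the spec and may differ, so digests from this sketch should not be expected to match registry records. All function names are hypothetical.

```python
import hashlib
import json


def canonical_bytes(manifest: dict) -> bytes:
    """Serialise a manifest deterministically.

    ASSUMPTION: sorted-key compact JSON in UTF-8 stands in for the
    canonical form. The PRML spec defines the real rules.
    """
    text = json.dumps(
        manifest, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def derive_sha256(manifest: dict) -> str:
    """Lowercase hex SHA-256 over the canonical bytes."""
    return hashlib.sha256(canonical_bytes(manifest)).hexdigest()


def matches_commit(manifest: dict, committed_hex: str) -> bool:
    """Compare the derived digest with the published commit."""
    return derive_sha256(manifest) == committed_hex.strip().lower()


if __name__ == "__main__":
    # A subset of the sample manifest, in two different key orders.
    a = {"version": "prml/0.1", "metric": "accuracy",
         "comparator": ">=", "threshold": 0.92, "seed": 42}
    b = {"seed": 42, "threshold": 0.92, "comparator": ">=",
         "metric": "accuracy", "version": "prml/0.1"}
    edited = dict(a, threshold=0.91)

    print(derive_sha256(a) == derive_sha256(b))       # key order is irrelevant
    print(matches_commit(edited, derive_sha256(a)))   # any edit breaks the match
```

The two properties shown are the ones the conformance vectors exist to protect: equivalent manifests must hash identically in every implementation, and any change to a committed field must change the digest.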
4. Mapping to EU AI Act Article 12 and Annex IV section 2
The table below states what this evidence pack covers and what it does not, against the operative text of Article 12 of Regulation (EU) 2024/1689 and Annex IV section 2.
| Reference | Requirement (summary) | Coverage | Notes |
|---|---|---|---|
| Article 12(1) | Automatic recording of events over system lifetime | PARTIAL | PRML records each evaluation event as an immutable commit. It does not record runtime inference events; those need a separate Article 12(1) log pipeline. |
| Article 12(2)(a) | Identification of situations causing risk | NONE | Out of PRML scope. Covered by the provider's risk management system under Article 9. |
| Article 12(2)(b) | Facilitation of post-market monitoring (Article 72) | PARTIAL | PRML manifests are queryable by claim_id and producer.id, supporting longitudinal performance monitoring. |
| Article 12(2)(c) | Monitoring of operation, especially Article 26(5) | PARTIAL | Evaluation-time monitoring. Runtime monitoring is out of scope. |
| Annex IV §2(d) | Evaluation methods, performance metrics, validation procedures, accuracy/robustness metrics with statistical significance | FULL | Direct fit. The manifest names the metric, comparator, threshold, dataset, seed, and model identifier; the commit makes retroactive modification mechanically detectable. |
| Annex IV §2(b) | Design specifications, key design choices, methodologies | PARTIAL | Notes field captures the eval methodology declaration. Full design rationale belongs elsewhere in the technical documentation file. |
| Annex IV §2(h) | Cybersecurity measures | NONE | Composes with Sigstore / in-toto / SLSA for code-supply-chain integrity. Out of PRML scope. |
| Article 15(1) | Accuracy, robustness, cybersecurity throughout lifecycle | PARTIAL | PRML provides the accuracy-claim attestation layer. Robustness testing and cybersecurity belong to other primitives. |
| Article 15(3) | Levels of accuracy declared in instructions for use | PARTIAL | Declared accuracy is mechanically traceable to a hashed manifest, so the declaration can be checked against the eval that produced it. |
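| Article 18(1) | Keeping the technical documentation available to national competent authorities for 10 years | FULL | Pro and Enterprise registry tiers provide a ten-year retention SLA. Manifests in the Developer tier rely on the provider to maintain durable storage. Retention of automatically generated logs is governed separately by Article 19. |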
The "FULL / PARTIAL / NONE" column is deliberately strict: it reports what the evidence pack mechanically provides, not what the provider's broader compliance program covers. Most rows that read PARTIAL or NONE here will read FULL elsewhere in the provider's technical documentation file under Annex IV.
5. Mapping to ISO/IEC 42001 controls
For providers that operate, or are working toward, an ISO/IEC 42001:2023 AI management system, the mapping below shows where this evidence pack contributes. References are to the published 42001 control text.
| Control | Title | Coverage | Notes |
|---|---|---|---|
| A.6.2.4 | Documented information for AI system | PARTIAL | Manifest is one piece of documented information per evaluation claim. |
| A.6.2.6 | System impact assessment | NONE | Out of scope. |
| A.7.4 | Data quality for AI systems | PARTIAL | Dataset hash and URI in manifest support dataset-identity attestation. |
| A.8 | Information for interested parties (record-keeping family) | PARTIAL / FULL on eval records | The cryptographic-commit pattern is a strong fit for record-keeping that must survive disputes about retroactive edit. |
| A.8.2 | System log information | PARTIAL | Eval logs covered; runtime inference logs out of scope. |
| A.9.3 | Performance evaluation of AI system | FULL on the bound claim | Direct fit. The manifest pre-registers the performance evaluation; the claim is bound to the hash before the run, so the result is judged against a target that can no longer move. |
This sample mapping uses control numbering from the 42001:2023 published text. For a provider running parallel certifications under 42001 and the AI Act, the "Annex IV §2(d)" row in section 4 above and the "A.9.3" row here will overlap heavily; both describe the same evaluation event from different normative perspectives.
6. Limitations and out-of-scope items
This section is taken from spec section 8.1, restated here for the assessor's convenience. The spec is the controlling document.
- Selective non-publication. A provider could commit one hundred manifests, run all one hundred evaluations, and only publish the three that came out well. PRML by itself does not catch this; a separate completeness mechanism is required (registry-side claim-set commitments are on the v0.3 backlog).
- Execution-time data tampering inside the eval harness. If the eval harness silently truncates the dataset before computing the metric, the manifest hash will still verify; the dishonesty happens between data load and metric computation. Detecting this requires independent re-execution by a reviewer with their own dataset copy.
- Model binary integrity. The manifest references the model by id and hash but does not attest that the binary on disk matches the hash. This belongs to Sigstore / in-toto / SLSA in the supply chain layer.
- Runtime gating. PRML does not block model loading or inference if a manifest fails to verify. It is an audit primitive, not a runtime gate.
- Legal-compliance certification. This document does not certify legal compliance with Article 12, 15, or 18. It is one input to the compliance file. Certification under the AI Act is the responsibility of the notified body and, for self-assessment routes, the provider's quality-management system under Article 17.
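The selective non-publication gap above is, at bottom, a set-difference problem. The sketch below shows the check a completeness mechanism would need to run. It assumes the auditor can obtain the full list of claim ids a producer committed, which PRML by itself does not guarantee; the names `CompletenessReport` and `check_completeness` and the claim ids in the usage lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompletenessReport:
    unreported: frozenset[str]   # committed, but no result published
    uncommitted: frozenset[str]  # result published, but no prior commit

    @property
    def complete(self) -> bool:
        return not self.unreported and not self.uncommitted


def check_completeness(committed_ids, reported_ids) -> CompletenessReport:
    """Compare the committed claim set with the set of published results."""
    committed = frozenset(committed_ids)
    reported = frozenset(reported_ids)
    return CompletenessReport(
        unreported=committed - reported,
        uncommitted=reported - committed,
    )


if __name__ == "__main__":
    # Hypothetical claim ids: three commits, one published result.
    report = check_completeness(
        committed_ids=["claim-a", "claim-b", "claim-c"],
        reported_ids=["claim-a"],
    )
    print(sorted(report.unreported))
    print(report.complete)
```

The check is only as strong as the committed list it is given, which is why the limitation calls for a registry-side commitment to the claim set rather than a list supplied by the provider.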
7. For auditors and notified-body assessors
Suggested workflow for an assessor receiving this document as part of an Annex IV technical documentation file:
1. Extract the manifest from section 2 and save it to a local file.
2. Install any one of the four reference implementations (one-line install). Validate the manifest against the PRML JSON Schema (canonical schema is in SchemaStore, indexed since 2026-05-11).
3. Re-canonicalise and re-hash. Compare against the SHA-256 string in section 2 and against the registry record.
4. If you have independent access to the dataset and model artefacts, hash them and compare against the `dataset.hash` and `model.hash` fields. This catches identity-substitution attacks but not within-run tampering.
5. Cross-check the registry timestamp against the provider's claim of "pre-registered before the run." A commit timestamped after the result was reported is an audit red flag.
6. For high-risk systems, request a second independent reviewer to re-execute the eval against the provider's dataset and model copies. The manifest gives that reviewer everything they need to reproduce the run byte-for-byte modulo hardware non-determinism (which the seed and notes fields partly mitigate).
An assessor who completes steps 1-3 has a tamper-evident attestation of the claim that does not depend on trusting the provider. Steps 4-6 raise the assurance level further.
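The artefact check in the workflow above (hashing the dataset and model copies) can be done with a few lines of standard-library Python. This is a sketch under one stated assumption: that `dataset.hash` and `model.hash` are plain SHA-256 hex digests over the raw bytes of a single artefact file. How a multi-file dataset is hashed is not specified in this document. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the lowercase hex digest.

    Reading in chunks keeps memory use flat for multi-gigabyte artefacts.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def artefact_matches(path: Path, declared_hex: str) -> bool:
    """Compare a local artefact against the hash declared in the manifest.

    ASSUMPTION: the declared value is a SHA-256 hex digest of the file bytes.
    """
    return sha256_file(path) == declared_hex.strip().lower()
```

An assessor would call `artefact_matches(Path("ILSVRC2012_img_val.tar"), declared)` with `declared` taken from the manifest. A mismatch means the local copy is not the artefact the claim was registered against; a match says nothing about what the harness did with the data after loading it.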