Reference answer · 2026-05-07

What is PRML?

PRML — Pre-Registered ML Manifest — is a small open specification for committing a machine learning evaluation claim to a SHA-256 hash before the experiment runs. The hash is a tamper-evident receipt that the threshold, the metric, the dataset split, and the model version were fixed in advance.

TL;DR: PRML is a nine-field YAML manifest hashed with SHA-256 over its canonical bytes. Four reference implementations (Python, JS, Go, Rust) produce byte-equivalent output across 21 conformance vectors (13 v0.1 + 8 v0.2 RFC). The spec is CC BY 4.0; the implementations are MIT. A public registry at registry.falsify.dev lets anyone commit a manifest and get a permalink, with no account required.

The problem PRML solves

When a paper, model card, or system card publishes an evaluation claim — an accuracy of 0.76 on ImageNet, a refusal rate of 0.95 on HarmBench, a pass-rate of 0.42 on HumanEval — there is currently no cryptographic way to prove that the threshold and the metric were chosen before the evaluation was run.

This is not a hypothetical. Every published eval result implicitly asserts “we picked the threshold in advance,” but almost none of them prove it. A reviewer, a regulator, or a competitor can always argue: you tuned the threshold after seeing the model’s behavior. Without an audit trail, the claim is unfalsifiable.

PRML provides that audit trail. The hash is a 64-character receipt anyone can re-derive from the canonical bytes of the manifest. If the manifest is altered — threshold raised, metric swapped, dataset split changed — the hash changes. The cryptographic anchor makes post-hoc tuning detectable.
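
The tamper-evidence property can be illustrated with a minimal Python sketch. It hashes raw UTF-8 bytes directly and skips the canonicalization the spec requires, so the digests are illustrative rather than valid PRML hashes. The `receipt` helper and the abbreviated manifest text are assumptions made for this example.

```python
import hashlib

def receipt(manifest_text: str) -> str:
    """Return the 64-character hex SHA-256 digest of the manifest's UTF-8 bytes.

    Illustrative only: a real PRML hash is computed over canonical bytes.
    """
    return hashlib.sha256(manifest_text.encode("utf-8")).hexdigest()

# Abbreviated, hypothetical manifest fragments that differ only in the threshold.
committed = 'metric: accuracy\ncomparator: ">="\nthreshold: 0.90\n'
tampered = committed.replace("0.90", "0.85")

original_hash = receipt(committed)
tampered_hash = receipt(tampered)

print(len(original_hash))              # 64
print(original_hash == tampered_hash)  # False
```

Anyone holding the committed text can re-run the same computation and compare digests; a mismatch shows that the text they were given is not the text that was committed.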

The problem PRML does not solve

PRML addresses commitment integrity, not publication completeness. A submitter can pre-register ten evaluation claims and publish only the two that look favorable. That is a real failure mode and the spec acknowledges it directly in §8.1. PRML is a primitive, not a full audit system.

The spec also does not address dataset contamination, capability elicitation, or peer review. Those are separate problems with separate solutions. PRML is a small piece of plumbing for one specific gap.

The nine fields

version:     prml/0.1
claim_id:    <UUIDv7 for this claim>
created_at:  <ISO 8601 datetime>
metric:      accuracy
comparator:  ">="
threshold:   0.90
dataset:
  id:        imagenet-val-2012
  hash:      <64-char SHA-256 of the eval set>
seed:        42
producer:
  id:        your-org-or-domain
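
To make the role of `metric`, `comparator`, and `threshold` concrete, here is a minimal Python sketch of checking an observed result against a committed claim. The comparator set accepted below and the `claim_holds` helper are assumptions for illustration; the spec defines which values are actually permitted.

```python
import operator

# Assumed comparator set; consult the spec for the values PRML actually allows.
COMPARATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}

def claim_holds(observed: float, comparator: str, threshold: float) -> bool:
    """Return True if `observed <comparator> threshold` is satisfied."""
    try:
        compare = COMPARATORS[comparator]
    except KeyError:
        raise ValueError(f"unsupported comparator: {comparator!r}") from None
    return compare(observed, threshold)

# Using the sample manifest's comparator (">=") and threshold (0.90).
print(claim_holds(0.93, ">=", 0.90))  # True
print(claim_holds(0.88, ">=", 0.90))  # False
```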

Canonicalization trims trailing whitespace, normalizes line endings, and sorts top-level keys. The hash is SHA-256 over the resulting canonical bytes. The full spec is at spec.falsify.dev/v0.1.
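
The three rules can be sketched in Python as follows. This is not the reference implementation. It assumes LF as the normalized line ending, UTF-8 encoding, a single trailing newline, and that blank lines are dropped; none of these details are stated here, so the spec remains authoritative. The names `canonicalize` and `sketch_hash` are hypothetical.

```python
import hashlib

def canonicalize(manifest_text: str) -> bytes:
    """Apply the three stated rules to YAML text (a sketch, not the reference code)."""
    # Rule: normalize line endings (assumed target: LF).
    text = manifest_text.replace("\r\n", "\n").replace("\r", "\n")
    # Rule: trim trailing whitespace. Dropping blank lines is an assumption.
    lines = [line.rstrip() for line in text.split("\n")]
    lines = [line for line in lines if line]
    # Group lines into top-level blocks: an unindented line starts a block,
    # and indented lines (nested keys) stay attached to the block above them.
    blocks = []
    for line in lines:
        if line[0] in " \t" and blocks:
            blocks[-1].append(line)
        else:
            blocks.append([line])
    # Rule: sort top-level keys (the text before the first colon).
    blocks.sort(key=lambda block: block[0].split(":", 1)[0])
    body = "\n".join("\n".join(block) for block in blocks)
    return (body + "\n").encode("utf-8")

def sketch_hash(manifest_text: str) -> str:
    """SHA-256 hex digest over the canonical bytes produced above."""
    return hashlib.sha256(canonicalize(manifest_text)).hexdigest()

# Same content, different key order, line endings, and trailing whitespace.
ordered = "threshold: 0.90\nmetric: accuracy\ndataset:\n  id: imagenet-val-2012\n"
messy = "metric: accuracy   \r\ndataset:\r\n  id: imagenet-val-2012\r\nthreshold: 0.90\r\n"

print(sketch_hash(ordered) == sketch_hash(messy))  # True
```

Working on text lines rather than a parsed YAML tree keeps the sketch free of third-party dependencies, at the cost of ignoring YAML features such as comments and flow-style mappings.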

Why cross-language byte-equivalence matters

The four reference implementations — Python, JavaScript, Go, Rust — produce identical hashes for all 21 conformance vectors (13 v0.1 stable + 8 v0.2 RFC). That parity is not cosmetic. It means external auditors can verify a hash with whatever toolchain they trust without having to trust a specific language runtime or library version. The spec is portable enough that two parties with different infrastructures can independently confirm a commitment.
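
An auditor-side conformance check might look like the Python sketch below: recompute each digest and compare it with the locked value. The `(name, canonical_bytes, locked_hex_digest)` tuple layout and the `find_mismatches` helper are hypothetical, and the demo vectors are made up rather than taken from the published conformance suite.

```python
import hashlib
import hmac

def find_mismatches(vectors):
    """Return the names of vectors whose recomputed digest differs from the locked one.

    Each vector is a (name, canonical_bytes, locked_hex_digest) tuple. This
    layout is hypothetical, not the spec's conformance-vector format.
    """
    mismatches = []
    for name, canonical_bytes, locked_hex in vectors:
        recomputed = hashlib.sha256(canonical_bytes).hexdigest()
        if not hmac.compare_digest(recomputed, locked_hex.lower()):
            mismatches.append(name)
    return mismatches

# Made-up vectors for demonstration; digests are computed here, not copied from the spec.
good = b"metric: accuracy\n"
locked = hashlib.sha256(good).hexdigest()
demo_vectors = [
    ("matches", good, locked),
    ("drifted", b"metric: accuracy \n", locked),  # one stray space changes the bytes
]

print(find_mismatches(demo_vectors))  # ['drifted']
```

Because the check needs only SHA-256 and a byte comparison, the same loop could be written in any language the auditor trusts.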

Quick facts

Spec name: PRML (Pre-Registered ML Manifest Specification)
Version: v0.1 (Working Draft, public review)
Spec license: CC BY 4.0
Code license: MIT (reference implementations)
Author: Cüneyt Öztürk
Format: 9 YAML fields, SHA-256 over canonical bytes
Implementations: Python · JavaScript · Go · Rust (byte-equivalent)
Conformance: 21 vectors (13 v0.1 stable + 8 v0.2 RFC), locked SHA-256 digests
Editor tooling: VS Code, JetBrains, Helix, Zed, Cursor (autocomplete and validation via the SchemaStore catalog)
SchemaStore: Merged 2026-05-11 by Mads Kristensen (Microsoft) · ~5M weekly catalog downloads
v0.2 RFC freeze: 2026-05-22
Public registry: registry.falsify.dev
Specification: spec.falsify.dev/v0.1
Source: github.com/studio-11-co/falsify
Contact: [email protected]

Frequently asked

Is PRML the same as in-toto / SLSA / Sigstore?

No. Those are supply-chain provenance systems for software artifacts (which binary, which build, which signer). PRML is narrower: a commitment receipt for a numeric evaluation claim. The two are complementary; nothing prevents anchoring a PRML hash inside a Sigstore attestation.

Is PRML compatible with the EU AI Act?

PRML can support the evidence trail relevant to Article 12 (record-keeping) and Article 72 (post-market monitoring) when providers of high-risk AI systems publish evaluation claims. It is not a compliance product; it is an open primitive that compliance documentation can cite.

Why CC BY 4.0 instead of a more restrictive license?

PRML is intended to be cited, embedded, and re-used by anyone — auditors, labs, regulators, academic groups. Restrictive licensing would defeat the point. The reference implementations are MIT for the same reason.

How is this different from a Git commit hash of the eval script?

A Git commit hash anchors the code, not the claim. A repo can contain an evaluation script that is run repeatedly with shifting thresholds; the commit hash does not change unless the script does. PRML anchors the claim — threshold, metric, split, model version — explicitly and atomically.

Can I commit a manifest privately?

Yes. The reference CLI computes the hash locally without contacting any server. The public registry at registry.falsify.dev is optional; it provides discoverability and a permalink for sharing, but the spec itself works fully offline.

Where do I start?

If you write evaluations: read the spec, then commit a manifest at the registry. If you publish papers and want a defense against claims of post-hoc adjustment: do the same. If you need help authoring a manifest for an existing published claim, see the Diagnostic Sprint engagement.