Reference answer · 2026-05-07

What is PRML?

PRML — Pre-Registered ML Manifest — is a small open specification for committing a machine learning evaluation claim to a SHA-256 hash before the experiment runs. The hash is a tamper-evident receipt proving that the threshold, the metric, the dataset split, and the model version were fixed in advance.

TL;DR: PRML is eight YAML fields, hashed over canonical bytes. Four reference implementations (Python, JavaScript, Go, Rust) produce byte-equivalent output across 12 conformance vectors. The spec is CC BY 4.0; the implementations are MIT. A public registry at registry.falsify.dev lets anyone commit a manifest and get a permalink, no account required.

The problem PRML solves

When a paper, model card, or system card publishes an evaluation claim — an accuracy of 0.76 on ImageNet, a refusal rate of 0.95 on HarmBench, a pass-rate of 0.42 on HumanEval — there is currently no cryptographic way to prove that the threshold and the metric were chosen before the evaluation was run.

This is not a hypothetical. Every published eval result implicitly asserts “we picked the threshold in advance,” but almost none of them prove it. A reviewer, a regulator, or a competitor can always argue: you tuned the threshold after seeing the model’s behavior. Without an audit trail, the claim is unfalsifiable.

PRML provides that audit trail. The hash is a 64-character receipt anyone can re-derive from the canonical bytes of the manifest. If the manifest is altered — threshold raised, metric swapped, dataset split changed — the hash changes. The cryptographic anchor makes post-hoc tuning detectable.
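The tamper-evidence described above is simply SHA-256's sensitivity to any input change. A minimal sketch, using hypothetical manifest bytes (the real canonical serialization is defined by the spec, not reproduced here):

```python
import hashlib

# Hypothetical manifest fragments; field names follow the PRML field list,
# but the exact canonical byte layout is fixed by spec.falsify.dev/v0.1.
original = b"metric: top1_accuracy\nthreshold: 0.76\n"
tampered = b"metric: top1_accuracy\nthreshold: 0.80\n"

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(tampered).hexdigest()

# Raising the threshold after the fact produces a different 64-character receipt.
assert h1 != h2
assert len(h1) == 64
```

Any single-byte edit to the committed manifest yields an unrelated digest, which is what makes post-hoc tuning detectable.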

The problem PRML does not solve

PRML addresses commitment integrity, not publication completeness. A submitter can pre-register ten evaluation claims and publish only the two that look favorable. That is a real failure mode and the spec acknowledges it directly in §8.1. PRML is a primitive, not a full audit system.

The spec also does not address dataset contamination, capability elicitation, or peer review. Those are separate problems with separate solutions. PRML is a small piece of plumbing for one specific gap.

The eight fields

version:        prml/0.1
metric:         <name of the metric, e.g. top1_accuracy>
threshold:      <numeric value the model must clear>
dataset_split:  <identifier for the eval set>
model_version:  <model identifier or content hash>
claim:          <one-line description>
submitter:      <handle or organization>
timestamp:      <ISO 8601 datetime>

Canonicalization rules: trim trailing whitespace, normalize line endings, sort top-level keys. SHA-256 over the canonical bytes. The full spec is at spec.falsify.dev/v0.1.
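The three canonicalization rules can be sketched in a few lines. This is illustrative only: the field serialization shown here (one `key: value` line per field) and the sample manifest values are assumptions, and the normative byte layout is the one in spec.falsify.dev/v0.1.

```python
import hashlib

def prml_digest(fields: dict) -> str:
    """Sketch of the stated rules: sort top-level keys, trim trailing
    whitespace, normalize line endings to \\n, SHA-256 the result.
    The real canonical form is defined by the spec, not this function."""
    lines = [f"{key}: {fields[key]}".rstrip() for key in sorted(fields)]
    canonical = "\n".join(lines) + "\n"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical manifest, for illustration only.
manifest = {
    "version": "prml/0.1",
    "metric": "top1_accuracy",
    "threshold": "0.76",
    "dataset_split": "imagenet-val-2012",
    "model_version": "resnet50-v1.5",
    "claim": "ResNet-50 clears 0.76 top-1 on the validation split",
    "submitter": "example-lab",
    "timestamp": "2026-05-07T00:00:00Z",
}

print(prml_digest(manifest))  # 64 hex characters, stable across runs
```

Note that sorting keys before hashing is what makes the receipt independent of the order in which the author happened to write the fields.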

Why cross-language byte-equivalence matters

The four reference implementations — Python, JavaScript, Go, Rust — produce identical hashes for all 12 conformance vectors. That parity is not cosmetic. It means external auditors can verify a hash with whatever toolchain they trust without having to trust a specific language runtime or library version. The spec is portable enough that two parties with different infrastructures can independently confirm a commitment.
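Verification needs nothing PRML-specific, which is the point of the portability claim: an auditor who holds the canonical bytes and a published digest can re-check the commitment with any trusted SHA-256 implementation. A sketch (the sample bytes are hypothetical):

```python
import hashlib

def verify_commitment(canonical_bytes: bytes, published_digest: str) -> bool:
    # An auditor only needs a SHA-256 implementation they already trust;
    # no PRML library is required to re-derive a receipt.
    return hashlib.sha256(canonical_bytes).hexdigest() == published_digest

# Stand-in for real canonical manifest bytes.
canonical = b"version: prml/0.1\n"
digest = hashlib.sha256(canonical).hexdigest()
assert verify_commitment(canonical, digest)
assert not verify_commitment(canonical + b"x", digest)
```

The same check works in any of the four reference languages, or in none of them.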

Quick facts

Spec name:        PRML — Pre-Registered ML Manifest Specification
Version:          v0.1 (Working Draft, public review)
Spec license:     CC BY 4.0
Code license:     MIT (reference implementations)
Author:           Cüneyt Öztürk · Studio 11
Format:           8 YAML fields, SHA-256 over canonical bytes
Implementations:  Python · JavaScript · Go · Rust (byte-equivalent)
Conformance:      12 vectors, locked SHA-256 digests
v0.2 RFC freeze:  2026-05-22
Public registry:  registry.falsify.dev
Specification:    spec.falsify.dev/v0.1
Source:           github.com/studio-11-co/falsify
Contact:          [email protected]

Frequently asked

Is PRML the same as in-toto / SLSA / Sigstore?

No. Those are supply-chain provenance systems for software artifacts (which binary, which build, which signer). PRML is narrower: a commitment receipt for a numeric evaluation claim. The two are complementary; nothing prevents anchoring a PRML hash inside a Sigstore attestation.

Is PRML compatible with the EU AI Act?

PRML maps cleanly onto Article 12 (record-keeping) and Article 72 (post-market monitoring) when high-risk AI systems publish evaluation claims. It is not a compliance product — it is an open primitive that compliance documentation can cite.

Why CC BY 4.0 instead of a more restrictive license?

PRML is intended to be cited, embedded, and re-used by anyone — auditors, labs, regulators, academic groups. Restrictive licensing would defeat the point. The reference implementations are MIT for the same reason.

How is this different from a Git commit hash of the eval script?

A Git commit hash anchors the code, not the claim. A repo can contain an evaluation script that is run repeatedly with shifting thresholds; the commit hash does not change unless the script does. PRML anchors the claim — threshold, metric, split, model version — explicitly and atomically.

Can I commit a manifest privately?

Yes. The reference CLI computes the hash locally without contacting any server. The public registry at registry.falsify.dev is optional; it provides discoverability and a permalink for sharing, but the spec itself works fully offline.

Where do I start?

If you write evaluations: read the spec, then commit a manifest at the registry. If you publish papers and want a defense against accusations of post-hoc threshold tuning: the same path applies. If you need help authoring a manifest for an existing published claim, see the Diagnostic Sprint engagement.