What is PRML?
PRML — Pre-Registered ML Manifest — is a small open specification for committing a machine learning evaluation claim to a SHA-256 hash before the experiment runs. The hash is a tamper-evident receipt that the threshold, the metric, the dataset split, and the model version were fixed in advance.
The problem PRML solves
When a paper, model card, or system card publishes an evaluation claim — an accuracy of 0.76 on ImageNet, a refusal rate of 0.95 on HarmBench, a pass-rate of 0.42 on HumanEval — there is currently no cryptographic way to prove that the threshold and the metric were chosen before the evaluation was run.
This is not a hypothetical. Every published eval result implicitly asserts “we picked the threshold in advance,” but almost none of them prove it. A reviewer, a regulator, or a competitor can always argue: you tuned the threshold after seeing the model’s behavior. Without an audit trail, the claim is unfalsifiable.
PRML provides that audit trail. The hash is a 64-character receipt anyone can re-derive from the canonical bytes of the manifest. If the manifest is altered — threshold raised, metric swapped, dataset split changed — the hash changes. The cryptographic anchor makes post-hoc tuning detectable.
The problem PRML does not solve
PRML addresses commitment integrity, not publication completeness. A submitter can pre-register ten evaluation claims and publish only the two that look favorable. That is a real failure mode and the spec acknowledges it directly in §8.1. PRML is a primitive, not a full audit system.
The spec also does not address dataset contamination, capability elicitation, or peer review. Those are separate problems with separate solutions. PRML is a small piece of plumbing for one specific gap.
The eight fields
version: prml/0.1
metric: <name of the metric, e.g. top1_accuracy>
threshold: <numeric value the model must clear>
dataset_split: <identifier for the eval set>
model_version: <model identifier or content hash>
claim: <one-line description>
submitter: <handle or organization>
timestamp: <ISO 8601 datetime>
Canonicalization rules: trim trailing whitespace, normalize line endings, sort top-level keys. SHA-256 over the canonical bytes. The full spec is at spec.falsify.dev/v0.1.
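The commit-then-verify flow above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the exact canonical serialization (YAML quoting, number formatting, key layout) is defined by the spec, and every field value below is an invented example.

```python
import hashlib

def canonicalize(manifest: dict) -> bytes:
    # Sketch of the spec's rules: sort top-level keys, normalize line
    # endings to LF, trim trailing whitespace on each line.
    lines = []
    for key in sorted(manifest):
        value = str(manifest[key]).replace("\r\n", "\n").rstrip()
        lines.append(f"{key}: {value}")
    return ("\n".join(lines) + "\n").encode("utf-8")

def prml_hash(manifest: dict) -> str:
    # SHA-256 over the canonical bytes yields the 64-character receipt.
    return hashlib.sha256(canonicalize(manifest)).hexdigest()

# Illustrative manifest -- all values are made up for this example.
manifest = {
    "version": "prml/0.1",
    "metric": "top1_accuracy",
    "threshold": 0.76,
    "dataset_split": "imagenet-val-2012",
    "model_version": "sha256:aabbcc",
    "claim": "Model clears 0.76 top-1 accuracy on the validation split",
    "submitter": "example-lab",
    "timestamp": "2026-01-15T09:00:00Z",
}

receipt = prml_hash(manifest)
print(receipt)  # 64 hex characters

# Any alteration -- here, lowering the threshold -- changes the hash:
assert prml_hash(dict(manifest, threshold=0.74)) != receipt
```

The tamper check at the end is the whole point: a reviewer who holds the published receipt can recompute the hash and detect any post-hoc edit to threshold, metric, split, or model version.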
Why cross-language byte-equivalence matters
The four reference implementations — Python, JavaScript, Go, Rust — produce identical hashes for all 12 conformance vectors. That parity is not cosmetic. It means external auditors can verify a hash with whatever toolchain they trust without having to trust a specific language runtime or library version. The spec is portable enough that two parties with different infrastructures can independently confirm a commitment.
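Because verification is nothing more than SHA-256 over canonical bytes, an auditor can check a conformance vector with any standard library. A minimal Python sketch; the vector bytes and digest below are illustrative stand-ins, not real locked digests from the suite:

```python
import hashlib

def verify_vector(canonical_bytes: bytes, locked_digest: str) -> bool:
    """Recompute SHA-256 over the canonical bytes and compare to the locked digest."""
    return hashlib.sha256(canonical_bytes).hexdigest() == locked_digest

# Stand-in vector: in the real suite, both the bytes and the digest
# ship with the spec, so the digest would be a fixed constant here.
canonical = b"claim: example\nversion: prml/0.1\n"
locked = hashlib.sha256(canonical).hexdigest()

assert verify_vector(canonical, locked)
assert not verify_vector(canonical + b"x", locked)
```

The same check written in Go, Rust, or JavaScript must produce the identical digest; that is what the 12 conformance vectors lock down.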
Quick facts
- Spec name: PRML — Pre-Registered ML Manifest Specification
- Version: v0.1 (Working Draft, public review)
- Spec license: CC BY 4.0
- Code license: MIT (reference implementations)
- Author: Cüneyt Öztürk · Studio 11
- Format: 8 YAML fields, SHA-256 over canonical bytes
- Implementations: Python · JavaScript · Go · Rust (byte-equivalent)
- Conformance: 12 vectors, locked SHA-256 digests
- v0.2 RFC freeze: 2026-05-22
- Public registry: registry.falsify.dev
- Specification: spec.falsify.dev/v0.1
- Source: github.com/studio-11-co/falsify
- Contact: [email protected]
Frequently asked
Is this the same thing as supply-chain attestation tools such as Sigstore?
No. Those are supply-chain provenance systems for software artifacts (which binary, which build, which signer). PRML is narrower: a commitment receipt for a numeric evaluation claim. The two are complementary; nothing prevents anchoring a PRML hash inside a Sigstore attestation.
Does PRML help with regulatory compliance?
PRML maps cleanly onto Article 12 (record-keeping) and Article 18 (post-market monitoring) when high-risk AI systems publish evaluation claims. It is not a compliance product — it is an open primitive that compliance documentation can cite.
Why is the licensing so permissive?
PRML is intended to be cited, embedded, and re-used by anyone — auditors, labs, regulators, academic groups. Restrictive licensing would defeat the point. The reference implementations are MIT for the same reason.
Why isn’t a Git commit hash enough?
A Git commit hash anchors the code, not the claim. A repo can contain an evaluation script that is run repeatedly with shifting thresholds; the commit hash does not change unless the script does. PRML anchors the claim — threshold, metric, split, model version — explicitly and atomically.
Can I verify a hash without the registry?
Yes. The reference CLI computes the hash locally without contacting any server. The public registry at registry.falsify.dev is optional; it provides discoverability and a permalink for sharing, but the spec itself works fully offline.
How do I get started?
If you write evaluations: read the spec, then commit a manifest at the registry. If you publish papers and want a defense against accusations of post-hoc adjustment: the same path applies. If you need help authoring a manifest for an existing published claim, see the Diagnostic Sprint engagement.