# PRML vs adjacent primitives
PRML is a small open spec. It is not the only thing that touches commitment, provenance, evaluation, or pre-registration. This page is a candid note from the spec's author about where PRML overlaps with existing work, where it differs, and where the right answer is to use them together.
| Primitive | What it anchors | Relationship |
|---|---|---|
| PRML | An ML evaluation claim (threshold + metric + dataset split + model version) committed before the run | — |
| in-toto | Software supply-chain steps (who built what, in what order) | Complementary · embed PRML hash as step input/output |
| SLSA | Software artifact provenance levels (build trustworthiness) | Complementary · SLSA L3 build can produce the model whose eval PRML commits |
| Sigstore | Software artifacts signed under OIDC identity, time-stamped in transparency log | Complementary · wrap a PRML hash inside a Sigstore attestation |
| Model Cards | Free-form model documentation (intended use, limits, evaluation results) | Complementary · cite a PRML hash in the evaluation section |
| HELM | Curated benchmark suite with standard metrics | Complementary · HELM run-metadata can include a PRML commitment |
| OSF / ClinicalTrials.gov | Centralised study pre-registration (with state-anchored timestamps for trials) | Conceptual analogue · PRML is the decentralised ML version, the hash itself is the anchor |
| NeurIPS / MLRC checklists | Author-completed protocol declarations (what was done, what was reported) | Complementary · checklists describe, PRML proves |
## vs in-toto
in-toto is a framework for cryptographically capturing software supply-chain steps — which person or builder ran which step in which order, and what artifacts flowed between them. It originated at NYU's Secure Systems Lab and has matured into widely used industrial tooling.
The overlap is conceptual: both produce signed records of "this happened before that." The difference is target. in-toto records build steps; PRML records evaluation claims. A PRML manifest could plausibly appear as an input artifact or output product within an in-toto step record, but PRML doesn't try to capture the build pipeline that produced the model — only the claim about how the model was supposed to perform.
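As a sketch of that composition, here is the rough shape of an evaluation-step record in the spirit of in-toto link metadata, with the PRML manifest listed among the input materials so the commitment travels with the supply-chain evidence. File names are hypothetical, and the real schema, signing, and layout verification come from in-toto itself; this only shows where the manifest would slot in.

```python
import hashlib
import json
from pathlib import Path

def sha256_hex(path: str) -> str:
    """SHA-256 of a file's bytes, hex-encoded."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

# Illustrative step record only; not in-toto's actual signed link format.
eval_step_record = {
    "name": "run-eval",
    "materials": {
        "prml-manifest.json": {"sha256": sha256_hex("prml-manifest.json")},  # pre-run commitment
        "model.onnx": {"sha256": sha256_hex("model.onnx")},                  # model under evaluation
    },
    "products": {
        "eval-results.json": {"sha256": "<filled-in-after-the-run>"},
    },
}
print(json.dumps(eval_step_record, indent=2))
```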
## vs SLSA
SLSA (Supply-chain Levels for Software Artifacts) is a Google-originated framework that defines provenance levels for build trustworthiness (Build L0–L3 in the current v1.0 spec; L1–L4 in the earlier v0.1). It standardises what a build system has to attest to in order to claim each level.
SLSA answers "is this binary trustworthy?". PRML answers "was this evaluation claim fixed in advance?". The two address different stages: SLSA validates the artifact's origin; PRML validates the protocol bound to the artifact's evaluation. A model produced by a SLSA L3 pipeline can have its evaluation claim PRML-committed; both attestations apply, neither replaces the other.
## vs Sigstore
Sigstore signs and time-stamps software artifacts using OIDC-backed identities and a transparency log (Rekor). It removes the operational pain of long-lived signing keys.
Sigstore is identity-attested signing; PRML is content-addressed commitment. They operate at different layers and compose cleanly: a PRML manifest hash can be wrapped inside a Sigstore attestation so the receipt has both a content-anchor (the SHA-256) and an identity-anchor (the signer). For institutional clients that want both "this claim was committed in advance" and "this organisation took responsibility for committing it", Sigstore + PRML is a natural pairing.
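A minimal sketch of that pairing, assuming hypothetical file names and a hypothetical predicate type URI: the payload below is the kind of in-toto Statement a Sigstore attestation wraps, with the model artifact as subject and the PRML manifest hash in the predicate. Signing it, building the DSSE envelope, and logging to Rekor are done by a Sigstore client such as cosign and are not shown here.

```python
import hashlib
import json
from pathlib import Path

def sha256_hex(path: str) -> str:
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

# Attestation payload: subject = the model artifact, predicate = the PRML commitment.
# The predicate type URI is a placeholder; a real deployment would pick and document one.
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [
        {"name": "model.onnx", "digest": {"sha256": sha256_hex("model.onnx")}}
    ],
    "predicateType": "https://example.org/prml-commitment/v1",
    "predicate": {"prml_manifest_sha256": sha256_hex("prml-manifest.json")},
}
print(json.dumps(statement, indent=2))
```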
## vs Model Cards
Model Cards (Mitchell et al., 2019) are structured-but-flexible documents capturing a model's intended use, training data, ethical considerations, and evaluation results. The Hugging Face implementation makes them widely available; the Google AI Hub variant is more enterprise-oriented.
Model Cards describe; PRML proves. The "evaluation results" section of a Model Card is a natural surface to embed a PRML hash, turning a textual claim ("our accuracy is 0.76 on ImageNet-Val") into a re-derivable receipt ("here is the SHA-256 we committed to before running, and here is the permalink anyone can verify against"). The two are not in conflict — PRML is the cryptographic anchor that gives the Model Card's evaluation section verifiability.
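For illustration only, a hypothetical fragment of a Model Card's evaluation section; the hash and permalink are placeholders, and the exact wording is up to the card's author.

```markdown
## Evaluation

Accuracy on ImageNet-Val: 0.76.
Pre-registered claim: PRML manifest SHA-256 `<64-hex-digit-hash>`,
committed before the run; verify against `<permalink-to-manifest-bytes>`.
```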
## vs HELM
HELM (Holistic Evaluation of Language Models, Stanford CRFM) is a curated benchmark suite that evaluates LLMs across many scenarios with standardised metrics. The HELM dashboard publishes per-model scores; the open-source codebase provides the evaluation harness.
HELM is downstream of any commitment. PRML is upstream: it commits which scenarios, metrics, and thresholds a user is binding themselves to before running the harness. A HELM run-metadata payload could optionally include a PRML hash; the two solve different parts of the eval rigor surface (HELM = comprehensive eval coverage; PRML = pre-run commitment receipt).
## vs OSF / ClinicalTrials.gov pre-registration
The closest conceptual analogue. ClinicalTrials.gov (US government) and OSF (Open Science Framework, non-profit) both implement centralised pre-registration — researchers submit study designs in advance, a registry timestamps them, and reviewers can later verify that the published study matches the registered design.
PRML is the decentralised ML version. Three differences:
- No central authority. The SHA-256 hash itself is the anchor. It can be committed to registry.falsify.dev, a public git repo, an arXiv preprint, an in-toto layout, a blockchain, or wherever else the publisher wants. Anyone can re-derive the hash from the manifest bytes.
- Designed for ML eval shape. The 8 fields cover ML-specific commitments (metric, threshold, dataset split, model version) that don't map cleanly onto OSF's clinical-trial-shaped templates.
- Cryptographic verifiability. ClinicalTrials.gov relies on the registry being trustworthy. PRML doesn't require trusting the registry — only the hash function (SHA-256) and the manifest bytes themselves.
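To make "the hash itself is the anchor" concrete, here is a minimal verification sketch in Python. It assumes nothing beyond what the list above states: the verifier holds the manifest bytes exactly as published (the filename `prml-manifest.json` and the `committed_digest` value are placeholders) and compares a freshly computed SHA-256 against the digest that was published before the run.

```python
import hashlib
from pathlib import Path

def rederive_prml_hash(manifest_path: str) -> str:
    """Re-derive the commitment from the manifest bytes exactly as published."""
    manifest_bytes = Path(manifest_path).read_bytes()
    return hashlib.sha256(manifest_bytes).hexdigest()

# The digest that was published before the run (git history, registry entry,
# preprint, ...).  Placeholder value here.
committed_digest = "<64-hex-digit-sha256>"

if rederive_prml_hash("prml-manifest.json") != committed_digest:
    raise SystemExit("Manifest bytes do not match the pre-run commitment.")
print("Commitment verified: manifest bytes match the published SHA-256.")
```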
## vs Reproducibility checklists
NeurIPS and MLRC reproducibility checklists are author-completed protocol declarations: yes/no/N/A questions about what the authors did and reported.
Checklists are self-attestation in natural language. PRML is a cryptographic record. Checklists capture intent and process; PRML captures a specific commitment that can be re-derived independently. Both are valuable: a paper that completes the NeurIPS checklist and includes a PRML hash in its evaluation section is strictly stronger than one that does either alone.
## What PRML does not compete with
PRML deliberately stays narrow:
- Not a benchmark. It commits to whatever metric the user picks; it doesn't curate the metrics. HELM, BIG-bench, lm-evaluation-harness, etc. live one layer down.
- Not an audit framework. Section 8.1 of the spec acknowledges PRML doesn't solve selective publication. Pre-register ten claims, publish two — PRML can't see that.
- Not a compliance certification. EU AI Act, NIST AI RMF, ISO/IEC frameworks define standards; PRML provides one primitive (a tamper-evident eval receipt) those frameworks can cite.
- Not a model evaluation service. Studio 11's Diagnostic Sprint is a paid engagement that uses PRML; the spec itself is open and free.
## If you're building eval-rigor infrastructure
Most teams will combine three or four of these primitives:
- Build / artifact: SLSA + Sigstore for the model artifact itself.
- Documentation: Model Card describing intended use, limits, and evaluation strategy.
- Evaluation commitment: PRML hash committed before each named claim, embedded in the Model Card's evaluation section and referenced in any paper or dashboard.
- Coverage: HELM, BIG-bench, lm-evaluation-harness for breadth.
- Reporting hygiene: NeurIPS / MLRC checklist for the paper.
None of these primitives replaces the others. PRML is the smallest possible piece; if your stack already covers the others, dropping in a PRML hash is a one-line addition that closes a specific gap (post-hoc threshold tuning) without restructuring anything.
Read the spec →