2026-05-10 RFC ~5 min

v0.2 RFC, briefly — what's open and why.

Five proposals are open for community comment until 2026-05-22 23:59 UTC. v0.1 stays the stable spec. v0.2 will be additive: every v0.1 manifest hashes identically under v0.2 canonicalisation rules. This note is the honest read of what changes, what doesn't, and why the comment window matters more than the proposals themselves.

The five proposals, in one sentence each

P-01 — Streaming variant. An optional prml_mode: streaming for live evaluations (Chatbot Arena Elo, A/B-tested production, drift monitors), where the threshold commits to an aggregation rule applied to a window rather than a fixed batch.
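To make "an aggregation rule applied to a window" concrete, here is a minimal sketch under stated assumptions. Only prml_mode: streaming comes from P-01; the aggregation, window_size and threshold keys, and the >= comparison, are hypothetical illustrations rather than fields defined by the RFC.

```python
from collections import deque
from statistics import mean

# Hypothetical streaming commitment. Only `prml_mode` comes from P-01;
# the other keys are illustrative assumptions, not RFC fields.
commitment = {
    "prml_mode": "streaming",
    "aggregation": "mean",   # assumed: rule applied to each window
    "window_size": 3,        # assumed: number of most recent observations
    "threshold": 0.80,       # assumed: window aggregate must be >= this
}

def check_stream(scores, commitment):
    """Yield (window_aggregate, passed) once each full window is available."""
    window = deque(maxlen=commitment["window_size"])
    for score in scores:
        window.append(score)
        if len(window) == window.maxlen:
            aggregate = mean(window)
            yield aggregate, aggregate >= commitment["threshold"]

# Five live observations produce three full windows of size 3.
results = list(check_stream([0.9, 0.8, 0.85, 0.6, 0.7], commitment))
```

The point of the sketch is that the rule and the window are fixed up front, so a later change to either would be a change to the commitment, not a quiet re-reading of the data.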

P-02 — Runner attestation. An optional runner_attestation URI to an out-of-band execution attestation (Sigstore, in-toto, TEE). PRML records that an attestation was emitted; it does not interpret the attestation. This narrows — but does not close — the gap between what was claimed and what was run.

P-03 — Revocation. Optional revoked_at + revocation_reason with a small controlled vocabulary (dataset_compromised, model_recalled, author_request, other). The hash continues to verify after revocation; verifiers must surface revocation status separately.
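One way a verifier might keep the two facts apart is sketched below. It assumes the revocation fields sit outside the hashed payload, which is one possible reading of "the hash continues to verify", not something P-03 states. The placeholder_hash function is a stand-in (sorted-key JSON plus SHA-256), not PRML canonicalisation, and the metric and threshold manifest keys are hypothetical.

```python
import hashlib
import json

REVOCATION_FIELDS = {"revoked_at", "revocation_reason"}
REVOCATION_REASONS = {"dataset_compromised", "model_recalled", "author_request", "other"}

def placeholder_hash(manifest):
    # Stand-in canonicalisation (sorted-key JSON + SHA-256), NOT PRML's rules.
    # Assumption: revocation fields are excluded from the hashed payload.
    core = {k: v for k, v in manifest.items() if k not in REVOCATION_FIELDS}
    blob = json.dumps(core, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

def verify(manifest, expected_hash):
    """Return hash validity and revocation status as two separate facts."""
    reason = manifest.get("revocation_reason")
    if reason is not None and reason not in REVOCATION_REASONS:
        raise ValueError(f"unknown revocation_reason: {reason!r}")
    return {
        "hash_ok": placeholder_hash(manifest) == expected_hash,
        "revoked": "revoked_at" in manifest,
        "revocation_reason": reason,
    }

manifest = {"metric": "accuracy", "threshold": 0.9}  # hypothetical fields
committed = placeholder_hash(manifest)
revoked = {**manifest,
           "revoked_at": "2026-06-01T00:00:00Z",
           "revocation_reason": "dataset_compromised"}
status = verify(revoked, committed)
```

Here status reports a valid hash and a revocation side by side, so a consumer cannot read "hash verifies" as "claim still stands".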

P-04 — Conformance vector format. Standardise the test-vector directory layout and a stdin/stdout runner protocol so any implementation can mechanically prove byte-equivalence with the reference.
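A stdin/stdout runner could look roughly like the sketch below. The wire format (one JSON manifest per input line, one hex digest per output line) is an assumption made for illustration; P-04 is what would actually define it. The placeholder_hash function stands in for the implementation under test and is not PRML canonicalisation.

```python
import hashlib
import io
import json

def placeholder_hash(manifest):
    # Stand-in for the implementation under test; NOT PRML canonicalisation.
    blob = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

def run(stdin, stdout):
    """Assumed protocol: one JSON manifest per input line, one hex digest per output line."""
    for line in stdin:
        line = line.strip()
        if line:
            stdout.write(placeholder_hash(json.loads(line)) + "\n")

def matches_reference(vector_lines, expected_digests):
    """Feed vectors through the runner and compare its output with the expected digests."""
    out = io.StringIO()
    run(io.StringIO("\n".join(vector_lines) + "\n"), out)
    return out.getvalue().splitlines() == expected_digests
```

A real runner would call run(sys.stdin, sys.stdout) from its entry point, so a harness in any language can drive it as a subprocess and diff the output against the reference digests.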

P-05 — Patent grant placement. Move the existing patent non-assertion grant from Appendix C into a new §1.5 (preamble). Standards-body reviewers requested this for inclusion in Annex Z reference checks. No textual change to the grant.

What v0.2 doesn't try to do

Three classes of problem stay outside the v0.2 envelope, deliberately:

The non-negotiable

Every v0.1 manifest hashes identically under v0.2 canonicalisation. This is normative: any proposed v0.2 change that breaks v0.1 hash-equivalence for v0.1-shaped inputs is rejected at design time. The 12 v0.1 conformance vectors plus 8 new v0.2 vectors mechanically verify the property: all 20 vectors pass on every commit through CI.
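The property can be stated as a check that is easy to run in CI. The sketch below is illustrative only: digest is a stand-in canonicalisation, not PRML's, the sample vector is hypothetical, and "batch" is an invented default value used to show the kind of change that would be rejected.

```python
import hashlib
import json

def digest(obj):
    # Stand-in canonicalisation (sorted-key JSON + SHA-256), NOT PRML's rules.
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

def hash_v01(manifest):
    return digest(manifest)

def hash_v02_additive(manifest):
    # Additive: optional v0.2 fields are hashed only when present,
    # so a v0.1-shaped manifest produces the same bytes as before.
    return digest(manifest)

def hash_v02_breaking(manifest):
    # Breaking: materialises a default for an absent optional field.
    # "batch" is a hypothetical default value, used only for illustration.
    return digest({"prml_mode": "batch", **manifest})

def preserves_v01(hash_v02, v01_vectors):
    """True if every v0.1-shaped manifest hashes identically under hash_v02."""
    return all(hash_v02(m) == hash_v01(m) for m in v01_vectors)

v01_vectors = [{"metric": "accuracy", "threshold": 0.9}]  # hypothetical vector
```

Under these assumptions the additive variant passes the check and the variant that injects a default fails it, which is the design-time rejection described above.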

Backwards compatibility isn't a polite preference. If your reading of v0.2 includes any way that an existing v0.1 commitment hash could change retroactively, that reading is wrong. Tell us — that's exactly what the RFC window is for.

Why the window matters more than the proposals

Five named proposals sound like the substance of an RFC. They are not. The substance is what falls outside the proposals — the questions a reader thinks of while reading them, the gaps a working auditor notices, the mismatch between the spec's framing and a real institution's obligations. We want those.

If you publish ML eval claims professionally, run an audit programme, write papers in eval methodology, or work with EU AI Act Article 12 logging — your position on any of these proposals can directly shape what the v0.2 freeze looks like. The window closes 2026-05-22 23:59 UTC. After that, comments roll into v0.3 unless they identify a security flaw.

Two ways to comment, in roughly increasing seriousness:

  1. GitHub Discussions, label rfc-v0.2: github.com/studio-11-co/falsify/discussions
  2. Email the editor: [email protected] (subject prefix [v0.2 RFC]; preferred for institutional or confidential comments).

The editor reads everything. A two-line comment with one specific concern beats a ten-line comment with five general ones. Vague approval is welcome but doesn't shape the freeze.

What's already shipped alongside the RFC

The infrastructure that makes the RFC accessible has been built out in parallel:

None of that is required to comment. The full RFC text is 1,500 words.


The line we keep

PRML closes one specific gap: it makes post-hoc threshold tuning mechanically detectable. That's a small claim. The spec is fifteen pages and the schema is eight fields. v0.2 adds four optional fields; the core stays small. We are not building a publication-integrity system, an attestation framework, or a benchmark. We are building one primitive that has to be small enough to be implementable in any language and durable enough to outlive any specific tooling.

The proposals are how the primitive gets one degree more useful without ceasing to be small. Tell us where we got that wrong.

Cüneyt Öztürk, co-founder, Studio 11 Turkey Ltd. Şti. · falsify track lead. [email protected]. This note is CC BY 4.0; reuse, fork, edit, and cite it with attribution if it helps.