Diagnostic Sprint · Studio 11 · Engagement v0.1

Lock one of your published eval claims into a tamper-evident receipt — in two weeks.

A short, fixed-scope engagement where Studio 11 authors a PRML manifest for one of your existing evaluation claims, deploys a verifier, and writes the audit report your auditors and reviewers can cite.

Price: €9,000
Duration: 14 days
Engineer: 1
Express · 7d: +€2,000
What you get

Four artefacts you can hand to a reviewer or an auditor.

Every deliverable is yours to keep, share, and re-use. The PRML manifest and the verifier are MIT-licensed; the audit report is plain Markdown with cryptographic citations.

[I] PRML manifest

An 8-field YAML manifest that commits one published claim to a SHA-256 hash. The fields include threshold, metric, dataset split, model version, submitter, and timestamp. The hash is anchored on registry.falsify.dev so anyone can re-derive and verify it.
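A minimal sketch of the commit-to-a-hash idea, in Python. It uses only the six fields named above with made-up values. The key names and the canonical form (sorted-key compact JSON) are assumptions for illustration, not the PRML canonicalization, and the real manifest is YAML with eight fields.

```python
import hashlib
import json

# Hypothetical manifest: only the six fields named in the text, with
# illustrative values. Key names are assumptions, not the PRML schema.
manifest = {
    "threshold": 0.85,
    "metric": "accuracy",
    "dataset_split": "test",
    "model_version": "example-model-1.0",
    "submitter": "example-lab",
    "timestamp": "2025-01-01T00:00:00Z",
}


def manifest_hash(fields: dict) -> str:
    """SHA-256 over a deterministic serialization.

    Assumed canonical form: compact JSON with sorted keys, UTF-8 encoded.
    """
    canonical = json.dumps(
        fields, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


digest = manifest_hash(manifest)
# Changing any committed field, such as the threshold, yields a different hash.
tampered = manifest_hash({**manifest, "threshold": 0.80})

print(digest)
print(digest == tampered)  # False
```

Because the serialization is deterministic, anyone holding the same field values can re-derive the same digest, while lowering the threshold after the fact produces a digest that no longer matches.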

[II] Deployed verifier

Reference verifier (Python, JS, Go, or Rust — your pick) wired into your CI. Re-runs the claim, computes the canonical hash, exits 0 / 10 / 3 for pass / fail / tamper. No vendor lock-in.
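The exit-code contract can be sketched as follows. This is an illustration, not the reference verifier: the function names, the sorted-key JSON canonical form, and the higher-is-better `>=` comparison are all assumptions. Only the codes 0 / 10 / 3 come from the text above.

```python
import hashlib
import json

# Exit codes as stated in the text: pass / fail / tamper.
EXIT_PASS, EXIT_FAIL, EXIT_TAMPER = 0, 10, 3


def canonical_hash(fields: dict) -> str:
    """Assumed canonical form: compact, sorted-key JSON hashed with SHA-256."""
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(fields: dict, committed_hash: str, observed: float) -> int:
    """Return the exit code for one rerun of the claim.

    The tamper check comes first: if the manifest no longer matches the
    committed hash, the threshold itself cannot be trusted.
    """
    if canonical_hash(fields) != committed_hash:
        return EXIT_TAMPER
    # Assumption: the metric is higher-is-better and the claim is ">= threshold".
    return EXIT_PASS if observed >= fields["threshold"] else EXIT_FAIL


# Hypothetical two-field manifest, kept short for the example.
fields = {"metric": "accuracy", "threshold": 0.85}
committed = canonical_hash(fields)

print(verify(fields, committed, observed=0.91))  # 0
print(verify(fields, committed, observed=0.80))  # 10
print(verify({**fields, "threshold": 0.75}, committed, observed=0.80))  # 3
```

In a CI job, the return value would typically be passed to `sys.exit` so that a failed or tampered claim fails the build.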

[III] Audit report

A 6–10 page Markdown report covering: what was claimed, what was committed, how to verify, and §8.1 limitations specific to your claim. Citable in papers, model cards, and notified-body submissions.

[IV] Public permalink

A registry.falsify.dev/<hash> page anyone can land on. Shareable in tweets, README badges, and reviewer responses. Optional: keep it private until you publish the underlying claim.

How it runs

Fixed scope. Fixed price. No discovery loop.

Two weeks, five checkpoints. We pick the claim on day one and freeze the spec on day three; everything after that is verification work, not requirements work.

D1
Claim selection · 90-min call

You walk Studio 11 through three candidate claims you've published or are about to publish. We pick one based on auditability and time-to-rerun. Output: locked claim selection memo.

D3
Manifest draft · spec freeze

PRML manifest drafted, reviewed with you, and frozen. Hash computed. Once frozen, the threshold and metric cannot change without invalidating the receipt — that's the whole point.

D7
Verifier deployed · rerun green

Verifier wired into your CI or a reference environment. Re-runs the claim end-to-end. If it doesn't reproduce, you find out before the receipt goes public — that's a feature.

D11
Audit report draft · review

Draft of the 6–10 page report sent for review. Includes §8.1 limitations specific to your claim — what the receipt does and does not prove.

D14
Hand-off · permalink live

Final report delivered. Manifest committed to registry.falsify.dev. Permalink yours to share. 30 minutes of follow-up Q&A reserved for the week after.

Fit

A sprint is the right shape for some teams. For others it isn’t.

Honest signal upfront beats a discovery call that ends in a no.

Right fit
You’ve published or are about to publish a numeric eval claim (an accuracy, a refusal rate, a pass-rate) that you’d like to be able to defend.
A reviewer, a regulator, or an internal red-team has questioned whether your threshold was fixed in advance.
You can spare ~6 hours of an engineer’s time across two weeks for the verifier integration and one rerun.
Your legal or compliance team is comfortable with an MIT-licensed deliverable and an open spec (CC BY 4.0).

Wrong fit
You need an end-to-end eval pipeline built from scratch. That’s a longer engagement; let’s talk separately.
You need a managed compliance service. Studio 11 ships an artefact and walks away; we don’t hold ongoing audit retainers.
You want a generic AI-safety audit. The sprint is narrow on purpose: one claim, one receipt, one report.
You need the audit kept fully private with no public permalink. Possible, but ask; the public option is the default.
Who delivers it

You work with the person who wrote the spec.

No project managers, no associates, no offshore subcontractors. The Diagnostic Sprint is delivered by Cüneyt Öztürk personally.

Studio 11 · Lead
Cüneyt Öztürk

Author of PRML v0.1 and the four reference implementations (Python, JS, Go, Rust). Studio 11 is a Turkey-incorporated studio working on AI evaluation infrastructure and brand systems. Engagements are structured under English-law SOWs.

◆ Next step
Send a 3-line email naming the claim you’d like to lock. We reply within one working day, every time.

If a sprint isn’t the right shape, you’ll know by the end of the intro call. No follow-up sequence, no nurture flow. The intro call is free, scoped to 20 minutes, and ends with a yes-or-no.

Email [email protected]