Diagnostic Sprint · Studio 11 · Engagement v0.1
Lock one of your published eval claims into a tamper-evident receipt — in two weeks.
A short, fixed-scope engagement where Studio 11 authors a PRML manifest for one of your existing evaluation claims, deploys a verifier, and writes the audit report your auditors and reviewers can cite.
Four artefacts you can hand to a reviewer or an auditor.
Every deliverable is yours to keep, share, and re-use. The PRML manifest and the verifier are MIT-licensed; the audit report is plain Markdown with cryptographic citations.
An 8-field YAML manifest that commits one published claim to a SHA-256 hash. The fields include threshold, metric, dataset split, model version, submitter, and timestamp. Anchored on registry.falsify.dev so anyone can re-derive and verify the hash.
Reference verifier (Python, JS, Go, or Rust — your pick) wired into your CI. Re-runs the claim, computes the canonical hash, exits 0 / 10 / 3 for pass / fail / tamper. No vendor lock-in.
A 6–10 page Markdown report covering: what was claimed, what was committed, how to verify, and §8.1 limitations specific to your claim. Citable in papers, model cards, and notified-body submissions.
A registry.falsify.dev/<hash> page anyone can land on. Shareable in tweets, README badges, and reviewer responses. Optional: keep it private until you publish the underlying claim.
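To make the manifest idea concrete, here is a minimal sketch of committing a claim to a SHA-256 hash. It is an illustration under stated assumptions, not the PRML reference implementation: the field keys and values are hypothetical, only the six fields named on this page are shown, and the canonical form (sorted keys, compact JSON, UTF-8) is an assumed stand-in, since the actual PRML canonicalisation rules are not given here. JSON is used instead of YAML only to keep the example standard-library only.

```python
import hashlib
import json

# Hypothetical manifest. Keys and values are illustrative; the real PRML
# schema has eight fields and may name them differently.
manifest = {
    "threshold": 0.85,
    "metric": "accuracy",
    "dataset_split": "test",
    "model_version": "example-model-1.0",
    "submitter": "example-lab",
    "timestamp": "2025-01-01T00:00:00Z",
}


def canonical_hash(fields: dict) -> str:
    """Hash a manifest under an ASSUMED canonical form.

    Sorted keys and compact separators make the serialisation
    deterministic, so the same fields always give the same digest.
    """
    canonical = json.dumps(
        fields, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


frozen = canonical_hash(manifest)

# Lowering the threshold after the fact yields a different digest,
# which is what makes the receipt tamper-evident.
tampered = canonical_hash({**manifest, "threshold": 0.80})

print(frozen)
print(frozen != tampered)
```

Anyone holding the same field values can recompute the digest and compare it with the published one. Agreement shows the claim has not been edited since it was frozen.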
Fixed scope. Fixed price. No discovery loop.
Two weeks, four checkpoints. We pick the claim on day one and freeze the spec on day three — everything after that is verification work, not requirements work.
You walk Studio 11 through three candidate claims you've published or are about to publish. We pick one based on auditability and time-to-rerun. Output: locked claim selection memo.
PRML manifest drafted, reviewed with you, and frozen. Hash computed. Once frozen, the threshold and metric cannot change without invalidating the receipt — that's the whole point.
Verifier wired into your CI or a reference environment. Re-runs the claim end-to-end. If it doesn't reproduce, you find out before the receipt goes public — that's a feature.
Draft of the 6–10 page report sent for review. Includes §8.1 limitations specific to your claim — what the receipt does and does not prove.
Final report delivered. Manifest committed to registry.falsify.dev. The permalink is yours to share. 30 minutes of follow-up Q&A reserved for the week after.
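The verifier checkpoint above, combined with the 0 / 10 / 3 exit codes for pass / fail / tamper, could look roughly like the sketch below. This is a hedged illustration, not the reference verifier: the function names, the manifest keys, the assumed canonical form, and the "higher is better, at or above threshold passes" comparison are all assumptions made for the example.

```python
import hashlib
import json

# Exit codes as stated on this page: pass / fail / tamper.
EXIT_PASS = 0
EXIT_FAIL = 10
EXIT_TAMPER = 3


def canonical_hash(fields: dict) -> str:
    # ASSUMED canonical form: sorted keys, compact JSON, UTF-8 bytes.
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(fields: dict, committed_hash: str, observed_metric: float) -> int:
    """Return the exit code a CI step would use.

    `observed_metric` is the value produced by re-running the claim;
    how it is computed is outside the scope of this sketch.
    """
    # Check integrity first: a changed manifest is reported as tamper,
    # regardless of how the re-run scored.
    if canonical_hash(fields) != committed_hash:
        return EXIT_TAMPER
    # Assumption: the metric is higher-is-better and meeting the
    # threshold exactly counts as a pass.
    if observed_metric >= fields["threshold"]:
        return EXIT_PASS
    return EXIT_FAIL


# Hypothetical manifest and commitment, for illustration only.
example = {"metric": "accuracy", "threshold": 0.85}
committed = canonical_hash(example)

print(verify(example, committed, 0.91))
print(verify(example, committed, 0.60))
print(verify({**example, "threshold": 0.50}, committed, 0.60))
```

In a CI job, the final step would pass the return value to `sys.exit(...)` so the pipeline status reflects pass, fail, or tamper.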
A sprint is the right shape for some teams. For others it isn’t.
Honest signal upfront beats a discovery call that ends in a no.
You work with the person who wrote the spec.
No project managers, no associates, no offshore subcontractors. The Diagnostic Sprint is delivered by Cüneyt Öztürk personally.
Author of PRML v0.1 and the four reference implementations (Python, JS, Go, Rust). Studio 11 is a Turkey-incorporated studio working on AI evaluation infrastructure and brand systems. Engagements are structured under English-law SOWs.
If a sprint isn’t the right shape, you’ll know by the end of the intro call. No follow-up sequence, no nurture flow. The intro call is free, scoped to 20 minutes, and ends with a yes-or-no.