Notes · falsify.dev

Short notes on ML eval rigor — in print, not in tweets.

Essays, postmortems, and field reports on pre-registration, the PRML specification, and the wider question of what it would take for ML evaluation claims to be falsifiable in practice. New posts on no schedule. RSS.

No posts yet. The first essay lands when there's something useful to write rather than something topical to react to. The crisis kit (repo) covers the reactive case; this section is for everything that benefits from sitting overnight first.