Single-turn pass
Multi-turn pass (turns)
Conv. full-pass
silent_no_op rate
Harness-corrected rates exclude failures where the model added a biologically sensible extra step (host depletion, kmerfinder, …) that the validator didn't know how to parametrise. These are tagged harness-param-gap in the examples table and are not counted as model errors.
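A minimal sketch of how a raw and a harness-corrected pass rate could be computed from the examples table. It assumes that harness-param-gap failures are dropped from the denominator rather than counted as passes, and that each row carries an outcome of "pass" or "fail" plus an error tag. The `Example` record, its field names, and `pass_rates` are hypothetical and are not taken from the actual harness.

```python
from dataclasses import dataclass

HARNESS_GAP = "harness-param-gap"


# Hypothetical record shape: one row per example, loosely mirroring the
# "id", "outcome" and "error" columns of the examples table.
@dataclass
class Example:
    example_id: str
    outcome: str      # assumed values: "pass" or "fail"
    error: str = ""   # error tag, e.g. "harness-param-gap"


def pass_rates(examples: list[Example]) -> tuple[float, float]:
    """Return (raw, harness_corrected) pass rates.

    Assumption: failures tagged harness-param-gap are removed from the
    denominator of the corrected rate; they are not turned into passes.
    """
    total = len(examples)
    passed = sum(1 for e in examples if e.outcome == "pass")
    raw = passed / total if total else 0.0

    # Keep every example except failures blamed on the harness.
    kept = [
        e for e in examples
        if not (e.outcome == "fail" and e.error == HARNESS_GAP)
    ]
    corrected = passed / len(kept) if kept else 0.0
    return raw, corrected
```

With two passes, one harness-param-gap failure and one other failure, this sketch gives a raw rate of 0.5 and a corrected rate of 2/3.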
Pass rate, all models
Error category
Multi-turn by modification kind
silent_no_op rate by species × model (multi-turn corpus; darker = worse)
Examples
| id | kind/cat | species | outcome | error | tags |
|---|---|---|---|---|---|