cohesive-llm-benchmark · explorer

Click any row to see the prompt, the ground-truth .nf, and the LLM-generated .nf side by side.
id category outcome tags n_proc error steps used