cohesive-llm-benchmark · explorer

Click any row to see the prompt, the ground-truth .nf, and the LLM-generated .nf side by side.

Filters: corpus category, error category, outcome, tag, search

id	category	outcome	tags	n_proc	error	steps used