epolanski 3 months ago \| on: Claude Opus 4.1

How does it perform when run multiple times? LLMs are non-deterministic, so I think benchmarks should report averages over N runs rather than single-shot results.
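
To make the "average of N runs" idea concrete, here is a rough sketch of what reporting could look like. Everything in it is an assumption for illustration: `run_benchmark_once` is a hypothetical stand-in that simulates a pass rate, not a real model or benchmark harness, and the 0.7 success probability is made up.

```python
import random
import statistics


def run_benchmark_once(seed: int) -> float:
    """Hypothetical stand-in for one full benchmark run.

    Simulates 100 tasks, each solved with a made-up probability of 0.7,
    and returns the pass rate in [0, 1]. A real harness would call the model here.
    """
    rng = random.Random(seed)
    return sum(rng.random() < 0.7 for _ in range(100)) / 100


def summarize(scores):
    """Return (mean, sample standard deviation, standard error) of the scores."""
    mean = statistics.mean(scores)
    # Sample standard deviation needs at least two data points.
    stdev = statistics.stdev(scores) if len(scores) > 1 else 0.0
    stderr = stdev / len(scores) ** 0.5
    return mean, stdev, stderr


# N = 10 independent runs instead of a single shot.
scores = [run_benchmark_once(seed) for seed in range(10)]
mean, stdev, stderr = summarize(scores)
print(f"mean={mean:.3f} stdev={stdev:.3f} stderr={stderr:.3f} over {len(scores)} runs")
```

Reporting the spread alongside the mean shows whether a gap between two models is larger than the run-to-run noise.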