
> Every one I've spot checked right now has been correct, and I might write another checker to scan the results just in case.

If you already have the answers to verify the LLM output against, why not just use those to begin with?




Not GP, but I would imagine "another checker to scan the results" would be another NN classifier.

The thinking being that you'd compare the outputs of the two, and assuming their results are statistically independent and of similar quality, a 1% disagreement rate between them would suggest an error rate of roughly 0.5% for each relative to ground truth.
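For concreteness, here is a minimal Python sketch (my own, not from the thread) of that back-of-the-envelope estimate, assuming both checkers have the same error rate p and make errors independently, so the disagreement rate is d = 2p(1 - p):

    from math import sqrt

    def estimate_error_rate(disagreement_rate: float) -> float:
        """Solve 2p(1 - p) = d for the smaller root p (valid for d <= 0.5)."""
        return (1 - sqrt(1 - 2 * disagreement_rate)) / 2

    # Observed 1% disagreement between the two checkers
    print(f"{estimate_error_rate(0.01):.4%}")  # ~0.5025%, i.e. roughly 0.5%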


Maybe their problem is using an LLM to solve f: X → Y, where the validation V: X × Y → {0, 1} is trivial to compute?
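That is the generate-and-verify pattern. A rough Python sketch of the idea, where propose_with_llm and validate are hypothetical stand-ins for the model call and the cheap V(x, y) check, not real APIs:

    from typing import Callable, Optional

    def solve_with_verification(
        x: str,
        propose_with_llm: Callable[[str], str],  # assumed LLM wrapper
        validate: Callable[[str, str], bool],    # cheap deterministic V(x, y)
        max_attempts: int = 3,
    ) -> Optional[str]:
        """Return a proposal that passes validation, or None if all attempts fail."""
        for _ in range(max_attempts):
            y = propose_with_llm(x)
            if validate(x, y):
                return y
        return None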



