
While it's understood that LLM outputs have an element of stochasticity, the central finding of this analysis isn't about achieving bit-for-bit identical responses. Rather, it's about the statistically significant and consistent directional bias observed across a considerable number of trials. The 56.9% vs. 43.1% preference isn't an artifact of randomness; it points to a systemic issue within the models' decision-making patterns when presented with this task. Technical users might understand the probabilistic nature of LLMs, but it's questionable whether the average non-technical HR user, who might turn to these tools for assistance, does.
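To make the "not an artifact of randomness" point concrete, here's a rough binomial-test sketch. The trial count is a placeholder assumption, not the study's actual number of comparisons; substitute the real figure to see how the p-value behaves.

```python
# Hedged sketch: why a 56.9% / 43.1% split is unlikely to be noise once the
# trial count is large. n_trials is an ASSUMED placeholder, not the study's number.
from scipy.stats import binomtest

n_trials = 1000                          # hypothetical number of head-to-head comparisons
n_preferred = round(0.569 * n_trials)    # times the favored group was picked

result = binomtest(n_preferred, n_trials, p=0.5, alternative="two-sided")
print(f"observed rate: {n_preferred / n_trials:.3f}, p-value: {result.pvalue:.2e}")
# Under a fair-coin null, ~1000 trials at a 56.9% rate gives p far below 0.001,
# i.e. a systematic directional preference rather than sampling noise.
```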

Your suggestion to implement a "clearly defined taxonomy" for decision-making is an attempt to impose rigor, but it potentially sidesteps the more pressing issue: how these LLMs are likely to be used in real-world, less structured environments. The study seems to simulate a plausible scenario - an HR employee, perhaps unfamiliar with the technical specifics of a role or a CV, using an LLM with a general prompt like "find the best candidate." This is where the danger of inherent, unacknowledged biases becomes most acute.

I'm also skeptical that simply overlaying a taxonomy would fully counteract these underlying biases. The research indicates fairly pervasive tendencies - such as the gender preference or the significant positional bias. It's quite possible these systemic leanings would still find ways to influence the outcome, even within a more structured framework. Such measures might only serve to obfuscate the bias, making it less apparent but not necessarily less impactful.



If you have an ordering bias, that seems easily fixed by rerunning the evaluation several times with the candidates in different orders and taking the most common recommendation, and you can work around other biases by not including things like names. (Although you can probably still unearth more subtle cultural biases in how the resumes themselves are written.)
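A minimal sketch of that mitigation, assuming you already have some way to ask the model for a pick. `ask_llm_to_pick` is a hypothetical callback standing in for whatever model/API you use; it is not a real library function.

```python
# Sketch: re-run the ranking with the candidate order shuffled, strip names
# before prompting, and take the majority vote across runs.
import random
from collections import Counter

def redact(resume: dict) -> dict:
    """Drop fields (name, contact info, photo) that invite demographic bias."""
    return {k: v for k, v in resume.items() if k not in {"name", "email", "photo"}}

def pick_best(resumes: list[dict], ask_llm_to_pick, n_rounds: int = 9) -> int:
    """Return the index of the resume chosen most often across shuffled runs.

    ask_llm_to_pick: hypothetical function taking a list of redacted resumes
    and returning the position of its preferred candidate within that list.
    """
    votes = Counter()
    indices = list(range(len(resumes)))
    for _ in range(n_rounds):
        random.shuffle(indices)                        # vary presentation order
        shuffled = [redact(resumes[i]) for i in indices]
        winner_pos = ask_llm_to_pick(shuffled)         # position within this shuffle
        votes[indices[winner_pos]] += 1                # map back to the original index
    return votes.most_common(1)[0][0]
```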

Not that I think you should allow LLMs to make decisions in this way -- they're better for summarizing and organizing. I don't trust any LLM's "opinion" about anything; it doesn't have a stake in the outcome.



