Just curious, is there a hidden bias in just having two candidates to select fro...

amoss · 2025-05-20T12:15:53 1747743353

At first glance it looks similar to the Monty Hall problem, but it is actually a different problem.

In the Monty Hall problem there is added information in the second round from the informed choice (removing the empty box).

In this problem we don't have the same two-stage process with new information. If the previous process was fair then we know the remaining male candidate was better than the eliminated male (and female) candidates. We also know the remaining female candidate was better than the eliminated male (and female) candidates.

So the size of the initial pools does not tell us anything about the relative result of evaluating these two candidates. Most people would choose the candidate from the smaller pool though, using an analogue of the Gambler's Fallacy.
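
A rough way to see this, assuming every candidate's score is an independent draw from the same continuous distribution (so there are no ties; this is an idealisation, not something stated above): with $n$ candidates in total, the rank $R$ of any one fixed candidate is uniform on $\{1, \dots, n\}$ by symmetry, so

```latex
P(R = 1 \mid R \le 2) \;=\; \frac{P(R = 1)}{P(R \le 2)} \;=\; \frac{1/n}{2/n} \;=\; \frac{1}{2}
```

The pool size $n$ cancels out, so under these assumptions a candidate who reaches the final two is the better of the pair half the time, regardless of how many people were in either pool.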

matsemann · 2025-05-20T13:17:23 1747747043

Yeah, good point. I tried an experiment: 1 female, 9 males, assign a random number between 1 and 100 to each of them. Then, checking only the cases where the female is in the top 2, would we expect her to be better than the male she is up against? My head says no, but testing it in code I end up with some bias, around 51-52%. And if I make it 1 female and 99 males it's even greater, at ~64%.

Maybe my code is buggy.

asksomeoneelse · 2025-05-20T15:20:37 1747754437

I suspect you have an issue in the way you select the top 2 when there are several elements with the same value.

I tried an implementation with the values being integers between 1 and 100, and I found stats close enough to yours (~51% for 10 elements, ~64% for 100 elements).

When using floating point or enforcing distinct integer values, I get 50%.

My probs & stats classes are a distant memory, but I guess it makes sense that the more elements you have, the higher the probability of collisions. And then, if you naively just take the first 2 elements and the female candidate is one of those, it becomes more likely that she is there because her value is the highest and distinct. Is that a sampling bias, or a selection bias? I don't remember...
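
For anyone who wants to poke at this, here is a minimal sketch of the experiment. The original code was not posted, so the details are assumptions: the function name `female_win_rate` is made up, the female candidate is stored at index 0, the top 2 are taken from a stable descending sort (so ties go to whoever comes first in the list, which here is the female), and "better" means she ends up ranked first. That is only one tie-handling rule that can produce a bias of this kind; other implementations may behave differently.

```python
import random

def female_win_rate(n_males, draw, trials=200_000, seed=0):
    """Estimate P(female ranked 1st | female in top 2).

    Hypothetical helper: `draw(rng)` returns one candidate's score.
    Index 0 is the female candidate, indices 1..n_males are the males.
    """
    rng = random.Random(seed)
    wins = 0
    in_top2 = 0
    for _ in range(trials):
        scores = [draw(rng) for _ in range(n_males + 1)]
        # sorted() is stable, so candidates with equal scores keep their
        # original order: on a tie, the female (index 0) is placed first.
        order = sorted(range(len(scores)), key=lambda i: -scores[i])
        top2 = order[:2]
        if 0 in top2:
            in_top2 += 1
            if top2[0] == 0:
                wins += 1
    return wins / in_top2

if __name__ == "__main__":
    int_draw = lambda rng: rng.randint(1, 100)   # collisions are possible
    float_draw = lambda rng: rng.random()        # collisions are negligible
    for n_males in (9, 99):
        print(n_males, "males, integer scores:",
              round(female_win_rate(n_males, int_draw), 3))
        print(n_males, "males, float scores:  ",
              round(female_win_rate(n_males, float_draw), 3))
```

Under these assumptions the integer runs should come out above 50% and grow with the pool size, while the float runs should sit close to 50%. Breaking ties at random instead of by list position would be one way to check whether the position-based tie-break is what drives the effect.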

matsemann · 2025-05-21T07:21:12 1747812072

You're correct! When using floats (i.e. with far less chance of collisions than drawing from a hundred values with a hundred participants) it's practically unbiased. Thanks for exploring this with me, a fun little exercise.