Probability Can Bite (2010) (maa.org)
63 points by behnamoh on Aug 21, 2023 | 82 comments


Probability does not bite; describing partial information in English bites.

It's not actually true that the probability is 1/3, nor that the probability is 1/2 (the same goes for 13/27 vs 1/2). The problem is underspecified. Here are two different, more fully specified versions for which the answer is clear:

1. Sample from all two-child families with at least one boy. What proportion of these families have two boys? (answer, rot13: n guveq)

2. Choose a random two-child family, then knock on their door. A boy answers. What are the odds the other child is a boy? (rot13: bar unys)

These are both consistent with the description "at least one child is a boy"!

The day-of-week versions:

3. Sample from all two-child families with at least one boy born on a Tuesday. The odds both are boys? (nyzbfg unys)

4. Knock on the door of a random two-child family. A boy born on Tuesday answers. Odds both are boys? (n unys)
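The four versions can be checked by exact enumeration (the printed values give away the rot13 answers). This is a minimal sketch that assumes each child is independently a boy or a girl with probability 1/2 and born on each weekday with probability 1/7, that weekday index 1 stands for Tuesday, and that in the door versions each child is equally likely to answer; all names are hypothetical.

```python
from fractions import Fraction
from itertools import product

SEXES = ("B", "G")
TUESDAY = 1                                   # assumption: weekday index 1 stands for Tuesday
CHILD = list(product(SEXES, range(7)))        # 14 equally likely (sex, weekday) outcomes per child

def families():
    """Yield every two-child family (older, younger) with its probability, 1/196 each."""
    p = Fraction(1, len(CHILD) ** 2)
    for older, younger in product(CHILD, repeat=2):
        yield (older, younger), p

def both_boys(fam):
    return all(sex == "B" for sex, _ in fam)

def sample_families(condition):
    """Versions 1 and 3: P(both boys | family satisfies condition)."""
    num = den = Fraction(0)
    for fam, p in families():
        if condition(fam):
            den += p
            if both_boys(fam):
                num += p
    return num / den

def knock_on_door(answerer_matches):
    """Versions 2 and 4: a uniformly random child answers; P(both boys | that child matches)."""
    num = den = Fraction(0)
    for fam, p in families():
        for child in fam:                     # each child answers with probability 1/2
            if answerer_matches(child):
                den += p / 2
                if both_boys(fam):
                    num += p / 2
    return num / den

v1 = sample_families(lambda fam: any(sex == "B" for sex, _ in fam))
v2 = knock_on_door(lambda child: child[0] == "B")
v3 = sample_families(lambda fam: ("B", TUESDAY) in fam)
v4 = knock_on_door(lambda child: child == ("B", TUESDAY))
print(v1, v2, v3, v4)   # 1/3 1/2 13/27 1/2
```

The only difference between the "sample" and "door" helpers is how the observation is generated, which is exactly the point of the comment: the same English sentence is compatible with both.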


#2 is not actually equivalent to "at least one child is a boy". It is rather equivalent to "the first child is a boy". The difference may seem trivial, but one implies the other without the converse being true. This changes the probabilities — it's not an issue with underspecification.

I think your example #1 makes it much clearer why the 1/3 arises, at least in a frequentist analysis.

I would like to offer a similar interpretation but from a Bayesian lens. The 1/3 arises due to the artificiality of the knowledge condition. Given real-world constraints, we expect any information collected to cleave neatly between the two children in our imagined information-gathering scenario. So we implicitly translate "at least one child is a boy" to "we've checked one child, it's a boy".

Consider the following related problem: I have two faucets next to each other, each with a 50% chance of dripping overnight. I leave one shared bucket under both of them. The next day, the bucket is wet. What are the odds that _both_ faucets dripped?

This setup makes the correlative nature of the information much clearer, and I think most people would be less likely to jump to 1/2 as an answer.
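The bucket version conditions on exactly the same event as example #1 ("not both dry"). A minimal sketch, assuming the faucets drip independently with probability q (1/2 in the comment); the function names are hypothetical:

```python
from fractions import Fraction
from itertools import product

def p_both_given_wet(q=Fraction(1, 2)):
    """P(both faucets dripped | bucket is wet) for independent faucets dripping with probability q."""
    q = Fraction(q)
    p_both = q * q
    p_wet = 1 - (1 - q) ** 2          # the bucket is wet unless neither faucet dripped
    return p_both / p_wet

def p_both_given_wet_by_enumeration():
    """Same question for q = 1/2, by listing the four equally likely nights."""
    nights = list(product([True, False], repeat=2))   # (faucet 1 dripped, faucet 2 dripped)
    wet = [night for night in nights if any(night)]
    return Fraction(sum(all(night) for night in wet), len(wet))

print(p_both_given_wet(), p_both_given_wet_by_enumeration())   # 1/3 1/3
```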


The bucket formulation is very elegant.

I still feel the problem arises from English, not probability. It's clear that "we've checked one child, it's a boy" implies "at least one child is a boy." Furthermore, if someone tells me "at least one of the two kids is a boy," I do not know how they arrived at that information. It could have come either through the bucket method or through the knock-at-door method.

From a Bayesian perspective, we should consider both as possible with priors P and 1-P (i.e. the answer is somewhere between 1/3 and 1/2). On the other hand, from the perspective of someone taking a math test, I'd rather like the professor to tell me their own prior -- which, given they felt confident enough to put this on a test, they must believe it's basically 0 or basically 1.

Ultimately, both scenarios are describable by the same English phrase, and it feels prescriptivist to consider just one of them, even if it happens to have the least entropy in this case. The follow-up question should always be asked: "_how_ did you know this?" And if it's kicked back to "because someone told me," either we need to ask how that person learned it or else bust out some priors.
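The "somewhere between 1/3 and 1/2" claim can be made concrete under loud assumptions: with prior probability w the statement came from the filter (bucket) process, in which every family with a boy says "at least one boy" and two-girl families say something else; otherwise it came from the door process, in which a uniformly random child's sex is reported. Bayes' rule then gives P(BB | told) = (1/4) / (3w/4 + (1 - w)/2) = 1/(2 + w). A sketch with a hypothetical function name:

```python
from fractions import Fraction

def p_two_boys(w):
    """P(BB | told 'at least one boy') if, with prior probability w, the statement came from
    the filter process and otherwise from the door process (both processes are assumptions)."""
    w = Fraction(w)
    p_bb = Fraction(1, 4)
    # Filter: BB, BG and GB all say "at least one boy"; GG says something else.
    # Door: a uniformly random child's sex is reported, so BB always reports a boy.
    p_bb_and_told = w * p_bb + (1 - w) * p_bb
    p_told = w * Fraction(3, 4) + (1 - w) * Fraction(1, 2)
    return p_bb_and_told / p_told

for w in (0, Fraction(1, 2), 1):
    print(w, p_two_boys(w))
```

Note that the answer is not the naive linear blend of 1/3 and 1/2: the statement itself shifts weight toward the filter process, because that process produces it more often.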


Thanks for the compliment about the bucket, I was quite pleased with it :)

I do appreciate what you mean about the language issue — it's a misleading phrase that due to the context of the question encourages the listener to jump to "1/2". But it's quite a common expression in probability, and in that context the expression is unambiguous, if difficult to parse (like many things in mathematics, I suppose).


That makes sense.

I agree that it must be a standard understanding among statisticians that one of these interpretations is implied (although, given what happened with the Monty Hall problem, maybe it's not really so standard?). It's legitimately interesting that these two different interpretations result in different answers, but I feel that it is rather confusing to tell an outsider of the field that 1/3 is "the" answer and that their intuitions are wrong, when actually it's just one conventional interpretation.

The Monty Hall problem is often underspecified. For example, the "intuitive" answer of 1/2 (i.e. that switching doesn't matter) can be restored if we assume the host himself didn't know where the car was and just happened to reveal a goat by chance. The assumption that the host knows where the car is often goes unstated. Likewise, it's just a convention that a similar understanding applies in other such scenarios.
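The difference between a knowing host and an ignorant one can be checked by exact enumeration. A sketch assuming the player always picks door 0, the car is placed uniformly at random, a knowing host chooses uniformly between goat doors when he has a choice, and an ignorant host opens one of the other two doors uniformly at random (cases where he reveals the car are discarded, as in the comment); the function name is hypothetical:

```python
from fractions import Fraction

def switch_win_probability(host_knows):
    """P(switching wins | the opened door shows a goat); the player always picks door 0."""
    win = total = Fraction(0)
    others = [1, 2]                            # doors the host may open
    for car in range(3):                       # car placed uniformly at random
        if host_knows:
            choices = [d for d in others if d != car]   # host always avoids the car
        else:
            choices = others                             # host opens one blindly
        for opened in choices:
            p = Fraction(1, 3) / len(choices)
            if opened == car:
                continue                       # car revealed: excluded by the condition
            total += p
            remaining = 1 if opened == 2 else 2
            if remaining == car:
                win += p
    return win / total

print(switch_win_probability(True), switch_win_probability(False))   # 2/3 1/2
```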


The way I like to think about the Monty Hall problem is by thinking of switching not as being "switch to another unspecified second door" but rather "switch to the winner among the other two doors, if any of them are winners".


The problem is ambiguous, due to underspecification. That means that neither #1 nor #2 is "actually equivalent" to "at least one child is a boy," and more information is needed to construct a probability space.

#1 is "When both genders are known, and boys are preferred in the description, at least one is a boy." The preference is what makes the answer 1/3, and assuming it adds information to the problem.

#2 can be "When only one gender is known, and how we know it is uncorrelated with either possibility, at least one is a boy." But it can also be "When both genders are known, and the description reflects the probability of that gender being chosen at random from the two, at least one is a boy." In both cases, the answer is 1/2.

But being under-specified does not mean the question can't be answered, it just requires applying a reasonable assumption instead of an unreasonable one. #1 is very unreasonable since it adds information, #2 is close, but #3 is best.

And the proof is Bertrand's Box Paradox. That name does not properly refer to a probability problem, it applies to how to make this reasonable assumption.

"Mr. Jones has exactly two children. I have written the gender, of at least one, inside this sealed envelope. What is the probability that both children have that gender?"

If you were to open the envelope, and see the word "boy," the problem becomes the same as the one under discussion. If it can be answered, that answer is correct here as well. But it is an equivalent problem if you see the word "girl," and again the answer must be the same. If 1/3 is an acceptable answer, it means that 1/3 of all two-child families have two of the same gender, and 2/3 have mixed genders.

But that is a contradiction. We know that the split is 1/2:1/2. So the assumption, that 1/3 is a reasonable answer, is disproven. Now, that does not mean that the information came to us via #2 or #3, it just means we can't assume that it was #1.
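Spelled out with the law of total probability, writing w for the word found in the envelope (notation introduced here for illustration):

$$P(\text{same}) = P(w = \text{boy})\,P(\text{same} \mid w = \text{boy}) + P(w = \text{girl})\,P(\text{same} \mid w = \text{girl})$$

Since P(w = boy) + P(w = girl) = 1, setting both conditional probabilities to 1/3 would force P(same) = 1/3, while independent 50/50 sexes give P(same) = 1/4 + 1/4 = 1/2.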

Most often, the same logic is used for the Monty Hall Problem, it is just applied backwards.


> It's not actually true that the probability is 1/3, nor that the probability is 1/2.

You’re right. Those who are satisfied with the 1/3 answer may want to consider the following.

> I tell you I have two children and that (at least) one of them is a boy, and ask you what you think is the probability that the pair is single-sex.

1/3

> I tell you I have two children and that (at least) one of them is a girl, and ask you what you think is the probability that the pair is single-sex.

Also 1/3

> I tell you I have two children, and ask you what you think is the probability that the pair is single-sex.

1/2

So if I tell you that I have two children you think that the probability that they are of the same sex is 1/2. And when I tell you the gender of one of them, whatever it is, you will think that the probability goes down to 1/3?


The statement "at least one of them is a boy" (equivalent to "I don't have two daughters") is a little more subtle than "I tell you the gender of one of them". The former excludes one of the four possibilities (FF), which lets us update our belief in the single-sex outcome to a third. The latter fixes the gender of a specific child (without specifying which one); whichever child it is, the probability that the other one has the same gender is still a half, so it gives us no information about the single-sex question.

So if you tell me the gender of a specific one of them, say the youngest, then I haven't learned anything that makes my subjective probability go down that the other is the same gender.

I think in real life you will come across the second kind of statement (e.g. "my oldest is a girl") more often than the first kind (e.g. "I do not have two boys").

But it does not feel too weird to me that "at least one of them is a girl" reduces the probability of the pair being single-sex to a third. In fact, if you further tell me both "at least one of them is a girl" and "at least one of them is a boy", the probability of the pair being single-sex goes to zero, and this seems perfectly reasonable.


Do you agree with the following?

> I tell you I have two children and that (at least) one of them is a boy, and ask you what you think is the probability that the pair is single-sex.

1/3

> I tell you I have two children and that (at least) one of them is a girl, and ask you what you think is the probability that the pair is single-sex.

1/3

If you don’t, why not?

If you do, what’s your answer to the following question?

> I tell you I have two children and that I’ve just sent you an email with the sex of (at least) one of them, and ask you what you think is the probability that the pair is single-sex.

Will your answer change after you have a chance to check your messages?


In this scenario, you are subtly changing the meaning of "single-sex".

In the first two cases, "single-sex" means "the same specific sex as the child you know the sex of" whereas in the last case it means "the same sex as a child that can still have two possible sexes".

If you would say,

> I tell you I have two children and that I’ve just sent you an email with the sex of (at least) one of them, and ask you what you think is the probability that the pair are both girls?

and then follow up with another question,

> what you think is the probability that the pair are both boys?

and then add the two probabilities up equally weighted, you might see why 1/2 is the reasonable answer in that case.

(And why opening up the email in question would reduce the probability of one of the questions to 0, and the other to 1/3.)


As you find “both girls or both boys” problematic for some reason maybe we can discuss the following questions instead - where hopefully there is no subtle change of meaning.

————-

Do you agree with the following?

> I tell you I have two children and that (at least) one of them is a boy, and ask you what you think is the probability that I have one boy and one girl.

2/3

> I tell you I have two children and that (at least) one of them is a girl, and ask you what you think is the probability that I have one boy and one girl.

2/3

If you don’t, why not?

If you do, what’s your answer to the following question?

> I tell you I have two children and that I’ve just sent you an email with the sex of (at least) one of them, and ask you what you think is the probability that I have one boy and one girl.


I’ll come back to your reply later, but I would appreciate it if you could give a precise answer to the questions I asked.

It may help to find a common understanding on top of which we can build a clear discussion of the subtleties involved.


There are four cases to consider, MM (both kids are Male), FF (both kids are Female), MF and FM. So there's a 50% chance of same gender kids and a 25% chance for both kids to be female. So if you know the gender, say female, of one kid but not if they are the older or younger, you have these possibilities FF, FM or MF. And FF is 1/3 of that.


If I understand correctly what you said:

If I tell you that one kid is male, you think that the probability that there is one male and one female is 2/3.

If I tell you that one kid is female, you think that the probability that there is one male and one female is 2/3. (Right?)

If I don't tell you anything - beyond the fact that I have two kids - what's the probability that there is one male and one female?


There are four equally likely combinations (under the [both false!] assumptions of equal and independent sexes for children in the same family): MM, FM, MF, and FF; if you know that there is at least one male (or at least one female) you eliminate one of those possibilities, leaving the relative probabilities of the other three still equal.

So, knowing no additional information, the chance of one male and one female is two-fourths, or one-half.

Knowing that there is at least one male (eliminating FF), or at least one female (eliminating MM), the probability of one male and one female is 2/3.

If you know the sex and birth order of one, you eliminate two possibilities, retaining the relative probabilities of the remaining ones as equal, so if you know the first is male, eliminating FM and FF, then the probability of one male and one female is 1/2 (and similarly, mutatis mutandis, with other sex and birth order combinations, which produce the same result eliminating different pairs of possibilities.)
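The case elimination above can be written as a filter over the four equally likely ordered families. A minimal sketch under the same (admittedly false) assumptions of independent 50/50 sexes; the function name is hypothetical:

```python
from fractions import Fraction
from itertools import product

# The four equally likely (first, second) families: MM, MF, FM, FF.
FAMILIES = list(product("MF", repeat=2))

def p_mixed(known=lambda fam: True):
    """P(one male and one female | known), by counting the families that remain."""
    kept = [fam for fam in FAMILIES if known(fam)]
    mixed = sum(1 for fam in kept if fam[0] != fam[1])
    return Fraction(mixed, len(kept))

print(p_mixed())                                        # no information: 1/2
print(p_mixed(lambda fam: "M" in fam))                  # at least one male, drops FF: 2/3
print(p_mixed(lambda fam: "F" in fam))                  # at least one female, drops MM: 2/3
print(p_mixed(lambda fam: "M" in fam or "F" in fam))    # drops nothing: 1/2
print(p_mixed(lambda fam: fam[0] == "M"))               # first is male, drops FM and FF: 1/2
```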


> Knowing that there is at least one male (eliminating FF), or at least one female (eliminating MM), the probability of one male and one female is 2/3.

Don't you always know that there is at least one male or one female?

I mean, if A="there is at least one male" and B="there is at least one female" you're telling me that if you know that A holds the probability is 2/3 and if you know that B holds the probability is 2/3.

But, knowing no additional information, you KNOW that A and/or B holds!

What’s your answer to the following question?

> I tell you I have two children and that I’ve just sent you an email with the sex of (at least) one of them, and ask you what you think is the probability that I have one boy and one girl.


> Don't you always know that there is at least one male or one female?

Knowing that there is at least one male or at least one female eliminates zero possibilities.

Knowing that there is at least one male or knowing that there is at least one female eliminates one possibility (a different one for each case, but the difference is immaterial to the probability of a mixed pair).

> What’s your answer to the following question?

> I tell you I have two children and that I’ve just sent you an email with the sex of (at least) one of them, and ask you what you think is the probability that I have one boy and one girl.

1/2

And if you know you will be told the sex of one child, with equal probability as to which one, the probability remains 1/2 after you are told, even though knowing the same fact without that constraint on how you learn it makes it 1/3.

Because then the possibilities are (assume you are told “male”)

MM, told birth order 1

MM, told birth order 2

MF, told birth order 1

FM, told birth order 2


>> I tell you I have two children and that I’ve just sent you an email with the sex of (at least) one of them, and ask you what you think is the probability that I have one boy and one girl.

> 1/2

So you say it's 1/2 even though you know that as soon as you read the message you will update it to 2/3. Is that right?

The message says that (at least) one of them is a girl or that (at least) one of them is a boy. In either case, you state that the correct probability is 2/3.

Why not say 2/3 already then?


> So you say it's 1/2 even though you know that as soon as you read the message you will update it to 2/3. Is that right?

Nope, it's 1/2 afterwards, too.

Because you knew you were going to get one, and that is additional information. For much the same reason as in the scenario outlined after that in GGP: whichever sex you are told, you were twice as likely to be told it if the pair was not mixed.


I don't get it.

If I say that I have two children and (at least) one is a boy you will say that the probability that I have one boy and one girl is 2/3.

If I say that I have two children and (at least) one is a girl you will say that the probability that I have one boy and one girl is 2/3.

If I say that I have two children and (at least) one is a [unintelligible] will you say that the probability that I have one boy and one girl is 1/2 or 2/3?

Edit: If you're going to say that [unintelligible] could be anything other than boy or girl - and are not willing to assume that it has to be one or the other consider the following alternatives:

If I say that I have two children and (at least) one is a [word in a foreign language that you know that means girl or boy but you don't remember which one] will you say that the probability that I have one boy and one girl is 1/2 or 2/3?

If I say that I have two children and (at least) one is [of the same sex as the first child of some other person that you don't know] will you say that the probability that I have one boy and one girl is 1/2 or 2/3?


Your math is accurate. Once you are told the gender of one child with no other information, the odds of being all the same gender go down. Probability is tricky.


> I have two children…

Oh, you have two children? The probability that they are of the same sex is 1/2.

> and the sex of at least one of them is…

Say no more! If at least one of them is of some sex the odds that they are both of the same sex go down to 1/3.

I said 1/2 before but that was before knowing that at least one of them is either a boy or a girl. That changes everything! (Probability is tricky.)


Nice!

What's really fun about this problem is that you can have very convincing arguments for 1/2 being the correct answer, and very convincing arguments for 1/3 being the correct answer. And for either you can make subtle reformulations that supposedly illustrate how ridiculous this answer is.

And there is no way to know. There is no gold standard for designing an experiment that would show whether 1/2 or 1/3 is correct. You could set up something that generates millions of pairs of (virtual) kids and then count the pairs that fit, but each such experiment has built in the assumption on which the answer is ultimately predicated.

The only thing really convincing would be if everybody, all "sides", could agree on an experiment with an outcome that they would feel bound to. Then one could settle this once and for all, whether it's 1/2 or 1/3 or 13/27 or 729/1459 or whatnot. But people will never agree on such an experimental setup.

Which tells me that this is not a mathematical problem. This problem is either underspecified or it's contradictory. If it was uniquely specified then we could just use probability theory with its axioms and inference rules to arrive at the correct answer. But we obviously can't, since nobody can agree on how to formally note this down.


> If it was uniquely specified then we could just use probability theory with its axioms and inference rules to arrive at the correct answer.

You’re right.

In another comment I reduced the solution to these two unspecified elements:

P(you tell me that you have two children including at least one boy | you have two boys)

P(you tell me that you have two children including at least one boy | you have one boy and one girl)

If one assumes that they are equal (why?) the answer is 1/3.

If one assumes that the latter is half as probable the answer is 1/2.

Whatever the assumption that one finds more natural the point is that an assumption is needed.
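Writing r for the first probability and s for the second (notation introduced here), and giving BB prior probability 1/4 and a mixed pair prior probability 1/2, Bayes' rule makes the dependence explicit, assuming a two-girl family never says "at least one boy":

$$P(BB \mid \text{told at least one boy}) = \frac{\tfrac14 r}{\tfrac14 r + \tfrac12 s} = \frac{r}{r + 2s}$$

So r = s gives 1/3, and s = r/2 gives 1/2.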


Any arguments for 1/2 are just wrong. This isn't an unknowable or undefined situation. It's counterintuitive, but that's different.


Do you agree with the following?

> I tell you I have two children and that (at least) one of them is a boy, and ask you what you think is the probability that I have one boy and one girl.

2/3

> I tell you I have two children and that (at least) one of them is a girl, and ask you what you think is the probability that I have one boy and one girl.

2/3

If you don’t, why not?

If you do, what’s your answer to the following question?

> I tell you I have two children and that I’ve just sent you an email with the sex of (at least) one of them, and ask you what you think is the probability that I have one boy and one girl.


I agree 2/3, I agree 2/3. The answer to the last question is 1/2.


Thanks for your answer.

What I don’t understand is that when you read the content of that email you will find yourself in either the first situation (I told you that I have two children and that (at least) one of them is a boy) or the second situation (I told you that I have two children and that (at least) one of them is a girl).

In both cases the probability will be 2/3 so why wouldn’t you conclude that the probability is 2/3 without waiting to find out the (irrelevant) details?


The odds only sound equal (and the details irrelevant) because you are only looking at one outcome from the set. In reality, the email will resolve the probabilities to either {BB: 1/3, BG: 2/3, GG: 0/3} or {BB: 0/3, BG: 2/3, GG: 1/3}. Although the BG values are the same, the rest of the probabilities are not. Therefore, the details are relevant.

I don't have a great explanation as to why that's intuitively true, but it is. I can try again if things are still confusing. But if so it would help to know if you understand the Monty Hall problem.


The odds don’t “sound equal”. According to you, they are equal (2/3).

Saying

“before opening the email I think the probability that you have one boy and one girl is 1/2 but one of two things will happen, I either find that you have at least one boy and I will conclude that the probability that you have one boy and one girl is 2/3, or I will find that you have at least one girl and I will reach the same conclusion”

is like saying

“under this cup there is either a dime or a quarter; if it’s a dime the probability of heads is 1/2 and if it’s a quarter the probability of heads is also 1/2”

and claiming that the probability of heads before I tell you whether it’s a dime or a quarter is something other than 1/2 and changes always to 1/2 when I let you know what it is.

I understand the Monty Hall problem. I also understand this one.

I wrote a detailed solution here https://news.ycombinator.com/item?id=37206445 making clear the additional assumptions needed to make the solution of original problem 1/3.

With those assumptions the probability that there are a boy and a girl is 2/3 if I tell you that there is at least one boy and 0 if I tell you that there is at least one girl. The probability that the email says that I have at least one boy is 3/4 (I would only say that I have a girl if I didn’t have any boys). You can calculate the probability that I have one boy and one girl before opening the email as 3/4 * 2/3 + 1/4 * 0, and it equals 1/2 as it should.


You need to look at the odds for all events. You cannot look only at the odds for one specific event to decide that the gender specified in the email is irrelevant. The fact that the rest of the odds are different means that it's 1/2 when the email is sent.

Your coin question is totally different. Whether the coin is heads or tails is independent from which coin it is. Whether you mention you have at least one boy is not independent of the gender of the children.

Your last paragraph has correct math. But the math works equally well with "specify a girl if you have one" or "flip a coin and use a random kids gender"


> Your last paragraph has correct math. But the math works equally well with "specify a girl if you have one" or "flip a coin and use a random kids gender"

That’s the point.

The math works well with "specify a boy if you have one" and then the answer to A [I tell you I have two children and that (at least) one of them is a boy, and ask you what you think is the probability that I have one boy and one girl.] is 2/3 and the answer to B [I tell you I have two children and that (at least) one of them is a girl, and ask you what you think is the probability that I have one boy and one girl.] is 0.

The math works well with "specify a girl if you have one" and then the answer to A is 0 and the answer to B is 2/3.

The math works well with "flip a coin and use a random kids gender" and then the answer to A is 1/2 and the answer to B is 1/2.

If every parent with two kids says either “at least one is a boy” or “at least one is a girl” there is no way to make the math work so the answer to A is 2/3 and the answer to B is 2/3.

——-

As I explain in another comment, for that to happen the following two conditions need to be met:

P(you tell me that you have at least one boy | you have two boys) = P(you tell me that you have at least one boy | you have one boy and one girl)

P(you tell me that you have at least one girl | you have two girls) = P(you tell me that you have at least one girl | you have one boy and one girl)

There are ways to make the math “work”. For example: if you have two boys or two girls flip a coin and if you get heads talk about the weather, if you get tails say [I have two kids and at least one is a boy/girl], if you have one boy and one girl say [I have two kids and at least one is a …] using a coin flip to decide if you say “girl” or “boy”.

However, they seem quite unnatural and hardly a justification to claim that “any arguments for 1/2 are just wrong.”
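The strategies discussed in this exchange can be compared side by side. A minimal sketch assuming four equally likely families and truthful parents; each strategy function (hypothetical names) maps a family to a distribution over what the parent says, A and B are the answers to the two questions above, and "overall" applies the law of total probability:

```python
from fractions import Fraction

HALF = Fraction(1, 2)
FAMILIES = ["BB", "BG", "GB", "GG"]            # equally likely, 1/4 each (assumed)
MIXED = {"BG", "GB"}

def prefer_boy(fam):
    return {"boy": Fraction(1)} if "B" in fam else {"girl": Fraction(1)}

def prefer_girl(fam):
    return {"girl": Fraction(1)} if "G" in fam else {"boy": Fraction(1)}

def random_child(fam):
    """Name the sex of a uniformly chosen child."""
    said = {}
    for sex in fam:
        key = "boy" if sex == "B" else "girl"
        said[key] = said.get(key, 0) + HALF
    return said

def weather(fam):
    """Single-sex: coin flip between the weather and naming the sex; mixed: coin flip boy/girl."""
    if fam == "BB":
        return {"weather": HALF, "boy": HALF}
    if fam == "GG":
        return {"weather": HALF, "girl": HALF}
    return {"boy": HALF, "girl": HALF}

def p_said(strategy, statement):
    return sum(Fraction(1, 4) * strategy(fam).get(statement, 0) for fam in FAMILIES)

def p_mixed_given(strategy, statement):
    """P(one boy and one girl | the parent made this statement)."""
    joint = sum(Fraction(1, 4) * strategy(fam).get(statement, 0)
                for fam in FAMILIES if fam in MIXED)
    return joint / p_said(strategy, statement)

def overall_p_mixed(strategy):
    """Law of total probability over everything the strategy can say."""
    statements = {s for fam in FAMILIES for s in strategy(fam)}
    return sum(p_said(strategy, s) * p_mixed_given(strategy, s) for s in statements)

for strategy in (prefer_boy, prefer_girl, random_child, weather):
    print(strategy.__name__, "A:", p_mixed_given(strategy, "boy"),
          "B:", p_mixed_given(strategy, "girl"), "overall:", overall_p_mixed(strategy))
```

Only the weather rule, which satisfies both likelihood conditions above, gives 2/3 for A and B at once, and every rule still averages back to 1/2.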


You misunderstand me.

No matter how you choose the statement "I have at least one (girl/boy)", (prefer one, flip a coins, etc) once you tell me it's always 2/3 boy-girl. Until you tell me it's 1/2. Any algorithm to choose which to say works as long as it's true and you don't convey more information about the children like "my older child is male".

Your counter arguments are wrong, but you don't seem to even acknowledge that I am saying that. I'm willing to try to explain why, but not if you don't want to learn and just want to insist you are correct. Ask yourself how long you would spend trying to explain Monty Hall to someone who kept insisting it was 1/2 to change.


> Your counter arguments are wrong, but you don't seem to even acknowledge that I am saying that.

I do acknowledge that you're saying that I'm wrong. That's why we're still exchanging arguments! What I don't know exactly is what you think is wrong with my arguments, so I try to find where we agree and where we don't.

Do you think that there is something wrong with the answer I wrote down here https://news.ycombinator.com/item?id=37206445 ?

It seems that you don't agree that the answer depends on the (relative) value of P(you tell me that you have at least one boy|you have two boys) and P(you tell me that you have at least one boy|you have one boy and one girl).

> I'm willing to try to explain why, but not if you don't want to learn and just want to insist you are correct.

Well, I could also say that you just want to insist that my arguments are wrong, but I sincerely hope that you want to learn as much as I do.

> Ask yourself how long you would spend trying to explain Monty Hall to someone who kept insisting it was 1/2 to change.

As long as needed. Souls are saved one at a time. Here we go.

-----

> No matter how you choose the statement "I have at least one (girl/boy)", (prefer one, flip a coins, etc) once you tell me it's always 2/3 boy-girl. Until you tell me it's 1/2. Any algorithm to choose which to say works as long as it's true and you don't convey more information about the children like "my older child is male".

That's wrong and I'm going to try to show you that with an example (the mathematical proof is in the link above). Hopefully I'm not misrepresenting your position - please tell me if I do.

You are in an auditorium with 600 people. Each of them has two kids. (Let's assume there is no strange thing going on like "meeting of parents with twins" and the sex of the kids is equally probable and independent.)

Q: What's the probability that a given person has one boy and one girl?

A: 1/2

Q: How many of them do you estimate that have one boy and one girl?

A: 300

Each of them writes on a paper their name and "I have at least one (girl/boy)" (they never lie, and if there is a choice they choose however they want: prefer one, flip a coin, etc.).

You have the 600 papers in front of you, but have not read them yet.

Q: What's the probability that a given person has one boy and one girl?

A: Still 1/2

Q: How many of them do you estimate that have one boy and one girl?

A: Still 300

You can win $100 if you guess correctly whether there are more than 350 or less than 350 people with one boy and one girl.

Q: What's your guess?

A: Less than 350, because my estimate is 300.

Q: What will be the probability that a given person has one boy and one girl after you've read the papers?

A: 2/3 because once they tell me it's always 2/3 boy-girl.

Q: How many of them will you estimate that have one boy and one girl after you've read the papers?

A: 400

Q: Do you want to change your guess to "more than 350"?

A: No, until I read the papers the probability is 1/2 and my estimate is that 300 people have one boy and one girl.

Q: So you keep your "less than 350" guess even though you know with certainty that in a few minutes you will estimate that the right answer is around 400 and you will wish you had answered "more than 350"?

A: Yes, I'm happy with that. I think I could get the $100 if I answered "more than 350" now but I refuse to do it until I read the papers.

You read the papers.

Q: What's the probability that a given person has one boy and one girl?

A: 2/3

Q: How many of them do you estimate that have one boy and one girl?

A: 400

Q: Do you want to change your guess for the $100 prize?

A: Yes, now I’d like to answer "More than 350". Thanks for letting me change my guess!

Unfortunately you lose, because in a group of 600 pairs of kids we expected around 300 pairs of boy and girl. Writing things on a paper leaves the children unaffected.
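A small simulation of the auditorium makes the same point. It assumes the "prefer boy" rule for what each parent writes (one of the rules the example allows) and a fixed random seed; the function name is hypothetical. Reading the papers only splits the room: the "boy" group is about 2/3 mixed, the "girl" group is never mixed, and the total stays near 300.

```python
import random

def auditorium(n_parents=600, seed=0):
    """Two children per parent (independent, 50/50); each parent writes 'boy' if they have
    at least one boy, else 'girl' (the 'prefer boy' rule, assumed here)."""
    rng = random.Random(seed)
    mixed = 0
    by_paper = {}                              # paper -> [parents who wrote it, how many are mixed]
    for _ in range(n_parents):
        kids = (rng.choice("BG"), rng.choice("BG"))
        paper = "boy" if "B" in kids else "girl"
        is_mixed = kids[0] != kids[1]
        mixed += is_mixed
        counts = by_paper.setdefault(paper, [0, 0])
        counts[0] += 1
        counts[1] += is_mixed
    return mixed, by_paper

mixed, by_paper = auditorium()
print("one boy and one girl:", mixed)         # reading the papers cannot change this count
for paper, (n, m) in sorted(by_paper.items()):
    print(f"wrote 'at least one {paper}': {n} parents, {m} mixed ({m / n:.2f})")
```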


I think now I understand. "I will make you make exactly one of two statements and each results in a 2/3 chance of BG" doesn't make sense. "You freely made one of two statements and each results in a 2/3 chance of BG" does. I interpreted "it depends on randomness" as "speak up if you have at least one boy" or "speak up if you have at least one girl", each of which would result in 450 saying something in your example and the math works.

So it comes down to if we decide on the question (however we do that) before we look at the kids or after.


> I interpreted "it depends on randomness" as "speak up if you have at least one boy" or "speak up if you have at least one girl", each of which would result in 450 saying something in your example and the math works.

The point is that original problem says "I tell you I have two children and that (at least) one of them is a boy". It doesn't say "I tell you I have two children and [when you ask me to confirm whether (at least) one of them is a boy] that (at least) one of them is a boy".

Reasoning from the cases "BB", "BG", "GB", "GG" - and discarding the last one to get p(BB)=1/3 - is implicitly using the cases "BB and I tell you that at least one of them is a boy", "BG and I tell you that at least one of them is a boy", "GB and I tell you that at least one of them is a boy", "GG and I tell you something else like at least one of them is a girl".

That breaks the symmetry between the "I tell you that at least one of them is a boy" and the "I tell you that at least one of them is a girl" problems. Using the cases in the previous paragraph the answer for the former is p(BB)=1/3 but the answer to the latter is p(GG)=1.

If you want to have the same solution when you switch girl <-> boy ["one of them is a boy, what is the probability that I have two boys" <-> "one of them is a girl, what is the probability that I have two girls"] you should treat them equally.

A quite natural way to do so would be to consider the eight equiprobable cases (some of them repeated)

    BB and I tell you that at least one of them is a boy
    BB and I tell you that at least one of them is a boy
    BG and I tell you that at least one of them is a boy
    BG and I tell you that at least one of them is a girl
    GB and I tell you that at least one of them is a boy
    GB and I tell you that at least one of them is a girl
    GG and I tell you that at least one of them is a girl
    GG and I tell you that at least one of them is a girl
but then the answer to both problems is 1/2.

You can make the answer to both problems 1/3 but the eight cases that you would need to consider for that are quite unnatural:

    BB and [... discard this line somehow ...]
    BB and I tell you that at least one of them is a boy
    BG and I tell you that at least one of them is a boy
    BG and I tell you that at least one of them is a girl
    GB and I tell you that at least one of them is a boy
    GB and I tell you that at least one of them is a girl
    GG and I tell you that at least one of them is a girl
    GG and [... discard this line somehow ...]


I think at this point we agree on the math and disagree on how the original question should be read. You saw an implicit alternative to the statement as "I have two children, at least one is a girl" and assumed in the question the parent was saying exactly one of that or the original statement "I have two children, at least one is a boy", possibly chosen randomly, from among the true answers. I read it as a simple true statement with GG just being an undefined state for the statement generation, maybe resulting in nothing being said. We could argue about parsing English, but it seems less interesting which question was being posed if we agree on the math behind each parsing, which I think we do?


> You saw an implicit alternative to the statement as "I have two children, at least one is a girl" and assumed in the question the parent was saying exactly one of that or the original statement

The alternative was made explicit when I asked you what do you think that the probability is for two girls under the alternative.

Given that you think that it's also 1/3, the question is how you arrive at those 1/3 answers in a coherent way.

> I read it as a simple true statement with GG just being an undefined state for the statement generation, maybe resulting in nothing being said.

And with BG and GB being states that generate the statement "I have two children, at least one is a boy".

But then BG and GB cannot generate the statement "I have two children, at least one is a girl".

As I said, there are also ways to justify that the answer to both is 1/3, even though they don't look very nice.

> We could argue about parsing English, but it seems less interesting which question was being posed if we agree on the math behind each parsing, which I think we do?

Maybe we can also agree that this is an undefined situation because the answer depends on how you decide to interpret the question.

More precisely, it depends on your assumptions about the (relative) value of P(you tell me that you have at least one boy|you have two boys) and P(you tell me that you have at least one boy|you have one boy and one girl).


It is not sufficient to know one of them is of some sex. For the probability to be 1/3, you need to be asked what the probability is that one of them is a specific sex, not just any sex.


I think the trickiest part is that the other party willingly shared some information and their motives affect probabilities way more than any math.

I find it easier to think about this problem stated like this: let's say you go around asking people, "Do you have exactly 2 children, at least one of whom is a boy?" What are the odds that they have 2 boys if they answered yes?


All probability questions suffer from the same bias. The Monty Hall problem doesn't work if the person offering the choice has some agency and motives.


Another classic example of "sampling method matters" is that the average interval between trains is longer as experienced by passengers than as measured by the train operator. (Because a randomly selected passenger is more likely to be one of the many waiting for a delayed train than one who happened to get on an earlier train.)
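A minimal simulation of the train example, under assumptions the comment doesn't state: gaps between trains are 5 or 25 minutes with equal probability (a made-up schedule), and each passenger shows up at a uniformly random time. The operator averages over gaps; a passenger lands in a long gap more often, so the passenger-side average should come out near 650/30 ≈ 21.7 minutes rather than 15.

```python
import bisect
import itertools
import random

def mean_gap(num_trains=100_000, num_passengers=100_000, seed=1):
    """Return (mean gap per train, mean gap experienced by a random passenger)."""
    rng = random.Random(seed)
    # Hypothetical schedule: each gap is 5 or 25 minutes with equal probability.
    gaps = [rng.choice([5, 25]) for _ in range(num_trains)]
    # Arrival times of consecutive trains, starting at t = 0.
    times = [0] + list(itertools.accumulate(gaps))
    operator_mean = sum(gaps) / len(gaps)

    total = 0
    for _ in range(num_passengers):
        t = rng.uniform(0, times[-1])
        # Index of the gap containing t (clamped in case t lands on the final endpoint).
        i = min(bisect.bisect_right(times, t) - 1, len(gaps) - 1)
        total += gaps[i]
    return operator_mean, total / num_passengers

operator_mean, passenger_mean = mean_gap()
print(f"operator: {operator_mean:.1f} min, passenger: {passenger_mean:.1f} min")
```

Weighting by gap length is the whole effect: a gap twice as long catches twice as many passengers.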


The only way i know how to work through these types of problems is:

- draw a complete probability tree

- sum the prob. of all branches that are possible given the information (having at least 1 boy born on a Tu)

- sum the prob. of all possible branches we're interested in (2 boys)

- divide interesting sum by possible sum

Then assume I made a mistake, and invest the same amount of time in programming a simple Monte Carlo simulation to verify the result.

Fix code and probability tree until they match.

Kind of like double-entry bookkeeping. Very satisfying, because you can switch back and forth multiple times between the theoretical and the empirical approach and way of thinking.
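The tree steps can also be done exactly instead of on paper. A minimal sketch, assuming (as the article does) that all 14 sex/weekday outcomes are equally likely and independent between the two children: it walks every leaf of the two-level tree, sums the probabilities of the possible and the interesting branches with exact fractions, and divides.

```python
from fractions import Fraction
from itertools import product

DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]
# 14 outcomes per child, assumed equally likely and independent between children.
CHILD_OUTCOMES = list(product("BG", DAYS))

def p_two_boys_given_boy_on_tuesday():
    leaf = Fraction(1, len(CHILD_OUTCOMES)) ** 2  # probability of one leaf of the tree
    possible = Fraction(0)     # leaves consistent with "at least one boy born on Tuesday"
    interesting = Fraction(0)  # ... of those, leaves with two boys
    for first, second in product(CHILD_OUTCOMES, repeat=2):
        if ("B", "Tu") in (first, second):
            possible += leaf
            if first[0] == "B" and second[0] == "B":
                interesting += leaf
    return interesting / possible

print(p_two_boys_given_boy_on_tuesday())  # 13/27
```

This plays the "other side of the books" to a Monte Carlo run: if the two disagree, one of them has a bug.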


The simulation code:

  import java.util.Random;

  public class Test {
    // 14 equally likely outcomes per child: sex x weekday of birth.
    public static enum Outcome {
      B_Mo, B_Tu, B_We, B_Th, B_Fr, B_Sa, B_Su, G_Mo, G_Tu, G_We, G_Th, G_Fr, G_Sa, G_Su;

      boolean isBoy() {
        return this == B_Mo || this == B_Tu || this == B_We || this == B_Th || this == B_Fr || this == B_Sa || this == B_Su;
      }
    }

    public static void main(String[] args) {
      Random r = new Random();
      Outcome[] outcomes = Outcome.values(); // values() returns a fresh copy on each call, so cache it
      int count_experiments = 1_000_000_000;
      double count_possible_given_info = 0; // families with at least one boy born on a Tuesday
      double count_interesting = 0;         // ... of which both children are boys
      System.out.println("theoretical (13/27): " + (13.0 / 27.0));

      for (int i = 0; i < count_experiments; i++) {
        // Draw each child independently and uniformly from the 14 outcomes.
        Outcome child_1 = outcomes[r.nextInt(outcomes.length)];
        Outcome child_2 = outcomes[r.nextInt(outcomes.length)];
        boolean possible_given_info = child_1 == Outcome.B_Tu || child_2 == Outcome.B_Tu;
        boolean interesting = possible_given_info && child_1.isBoy() && child_2.isBoy();
        count_possible_given_info += possible_given_info ? 1 : 0;
        count_interesting += interesting ? 1 : 0;
      }

      System.out.println("count_experiments: " + count_experiments);
      System.out.println("count_possible_given_info: " + count_possible_given_info);
      System.out.println("count_interesting: " + count_interesting);
      System.out.println("count_interesting / count_possible_given_info: " + (count_interesting / count_possible_given_info));
    }
  }
Example output:

  theoretical (13/27): 0.48148148148148145
  count_experiments: 1000000000
  count_possible_given_info: 1.3775374E8
  count_interesting: 6.632153E7
  count_interesting / count_possible_given_info: 0.481449941032454


My kid's classroom of about 25 kids had their birthdays on the wall. Before I looked, I thought "yeah, good chance 2 are on the same day". And indeed 2 kids shared the same birthday :-). How much surprise? Not a lot (when measured in bits).
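For the record, the numbers behind "not a lot", assuming 365 equally likely birthdays and ignoring leap years: with 25 kids a shared birthday is more likely than not, so observing one carries less than one bit of surprisal (-log2 of its probability).

```python
import math

def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), uniform birthdays, no leap years."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1 - p_all_distinct

p = p_shared_birthday(25)
surprise_bits = -math.log2(p)  # self-information of "someone shares a birthday"
print(f"P(shared) = {p:.3f}, surprise = {surprise_bits:.2f} bits")
```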


This article isn't about the birthday paradox though.


For problems like these (and the Bertrand paradox), the answer is almost always: please more formally articulate your sample space.


Indeed, a lot of "probability" is actually "linguistics".

Bertrand Paradox is deeper though -- it's not always obvious when there is more than one plausible sample space; there's good mathematics in studying that.


> I tell you I have two children and that (at least) one of them is a boy, and ask you what you think is the probability that I have two boys.

I want to find

P(you have two boys|you tell me that you have two children including at least one boy) = P(you have two boys|you tell me that X)

where for convenience I use the notation X=“you have two children including at least one boy”

I can write

P(you have two boys|you tell me X) P(you tell me X) = P(you tell me X|you have two boys) P(you have two boys)

Assuming that you don’t lie I know that X is true and I can restrict the analysis accordingly.

P(you have two boys|X and you tell me X) P(you tell me X| X) = P(you tell me X|X and you have two boys) P(you have two boys|X)

That conditioning information is redundant in some terms but not always.

If the probability of a boy is 1/2 and there is no correlation we can find that P(you have two boys|X)=1/3

(Using the notation girl=not-boy we can also write P(you have one boy and one girl|X)=2/3)

But we’re after something that could be different: P(you have two boys|you tell me that X)

P(you have two boys|you tell me X) = P(you tell me X|you have two boys) P(you have two boys|X) / P(you tell me X| X) = P(you tell me X|you have two boys) P(you have two boys|X) / ( P(you tell me X|you have two boys) P(you have two boys|X) + P(you tell me X|you have one boy and one girl) P(you have one boy and one girl|X) )

Without additional assumptions we can’t go beyond

P(you have two boys|you tell me X)

being equal to

1/3 P(you tell me X|you have two boys)

divided by

1/3 P(you tell me X|you have two boys) + 2/3 P(you tell me X|you have one boy and one girl)
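That last expression can be evaluated for a couple of reporting assumptions. A sketch with exact fractions; the two parameters are P(you tell me X | two boys) and P(you tell me X | one boy and one girl), and the values plugged in below are illustrative choices, not something the puzzle itself fixes.

```python
from fractions import Fraction

def p_two_boys_given_statement(q_bb, q_bg):
    """P(two boys | you tell me X), with priors 1/3 (two boys) and 2/3 (one of each) given X.

    q_bb: P(you tell me X | two boys); q_bg: P(you tell me X | one boy and one girl).
    """
    prior_bb, prior_bg = Fraction(1, 3), Fraction(2, 3)
    numerator = q_bb * prior_bb
    return numerator / (numerator + q_bg * prior_bg)

# Assumption 1: a parent with a boy always mentions the boy -> the textbook 1/3.
print(p_two_boys_given_statement(Fraction(1), Fraction(1)))     # 1/3
# Assumption 2: a boy-girl parent mentions "boy" or "girl" at random -> 1/2.
print(p_two_boys_given_statement(Fraction(1), Fraction(1, 2)))  # 1/2
```

Only the ratio q_bb / q_bg matters, which is why the puzzle as stated cannot pick between 1/3 and 1/2 by itself.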


Another way to phrase your point:

If you consider it possible they might have told you "I have two children and (at least) one of them is a girl" instead of the statement about a boy (were they to have both a boy and a girl), the reasoning given in the article is wrong.


Does anyone know whether Bayes' rule can be applied here? I believe it gives the correct answer:

P(two boys | at least one boy) = P(at least one boy | two boys) * P(two boys) / P(at least one boy)

= 1*(1/4)/(3/4) = 1/3


> I tell you I have two children, and (at least) one of them is a boy born on a Tuesday. What probability should you assign to the event that I have two boys?

> My initial reaction was that the information about the Tuesday was irrelevant, since at issue was gender, not day of birth. In which case, this was the same problem as the one I just described, and the answer would be 1/3. ... The correct answer to the new puzzle is 13/27, just slightly less than 1/2, and not at all close to 1/3.

This does not feel coherent on a gut level, and smart people seem to have worked out that it is true, but the answer to the new problem is the same regardless of the day of the week you are told. Can't you then infer that the answer to the original question is actually 1/2?


what works for me is to think of it like the mob of people is already there in front of you, and you're first grabbing out just a subset of the mob based on some conditions, then checking only those qualified at random which is the same as checking the distributions inside the qualified group. so for problem 1, the mob is all parents with 2 children, each child is either b or g. the underlying distribution is 25%bb, 25%gg, 50%bg.

you first cull the set by saying "only parents with at least 1 b, line up to be examined". i think it's clear that the selected population will be 1/3 bb parents, 2/3 bg parents, right?

on the other hand the second problem is you saying: "all parents with a b born on tuesday, report to be examined!". note that most of the parents that were in the question 1 selected population are now excluded. try to imagine which parents get selected by this one.

you can't make the inference back to the original question because it's a different question about a broader group of people. question 1 population distribution actually is 1/3 vs 2/3. the key is to think of it as selecting different subsets of the parent mob.


> on the other hand the second problem is you saying: "all parents with a b born on tuesday, report to be examined!". note that most of the parents that were in the question 1 selected population are now excluded. try to imagine which parents get selected by this one.

But we expect about the same number of parents to show up for "all parents with a b born on Tuesday" as for "all parents with a b born on Monday." This feels like some kind of Simpson's paradox I've invented, where every subgroup has the same outcome, which is different from the outcome of the group as a whole.


If you have 2B you might show up in both groups (Monday and Tuesday in your example), if you have only 1B you can't show up in both groups, that's why the probability is different.
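An exact count of the weekday groups, assuming sex and weekday are uniform and independent. Each weekday group on its own gives 13/27, but the seven groups are not a partition of the "at least one boy" families: a family with two boys born on different days sits in two groups, so the group sizes add up to more than the union, and pooling them does not reproduce the 1/3 of the union.

```python
from fractions import Fraction
from itertools import product

DAYS = range(7)
CHILD = list(product("BG", DAYS))          # 14 equally likely (sex, day) outcomes
FAMILIES = list(product(CHILD, repeat=2))  # 196 equally likely ordered pairs

def two_boys(family):
    return family[0][0] == "B" and family[1][0] == "B"

def has_boy_on(family, day):
    return ("B", day) in family

def group(day):
    return [f for f in FAMILIES if has_boy_on(f, day)]

# Each weekday group, taken on its own: 13/27.
per_day = [Fraction(sum(two_boys(f) for f in group(d)), len(group(d))) for d in DAYS]

# The union of all groups is exactly "at least one boy": 147 families, 1/3 of them BB.
union = [f for f in FAMILIES if any(child[0] == "B" for child in f)]
group_sizes = sum(len(group(d)) for d in DAYS)  # 189 > 147 because groups overlap
p_union = Fraction(sum(two_boys(f) for f in union), len(union))
print(per_day[0], len(union), group_sizes, p_union)
```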


true but note that the day of week selector is knocking out 6/7ths, it's quite a dominant excluder of parents. thus countering the effect of "has one boy already" on the composition of the group


This is twisting my brain a bit -- there's a third possible version, where the only info you have is the day of week. Something like "I have two children, one of whom was born on Tuesday. What are the odds the other is a boy?" perhaps.

And the answer there has to be basically 50:50. So when you combine these two filters, day of week wins somehow. Will have to ponder more


ah i didn't actually explain it, the other responder did though. 6/7th (~85.7%) of parents with b g are knocked out by the day of week filter, while only ~73.5% of the b b parents are knocked out by it. (if i did that math right) so proportionally the sample has more bb parents than the one without day of week.


And what makes probability so hard, is that these word problems describe a situation that never happens in real life, so human-language speaking people interpret the problem as something more sensible, before doing the math


>I tell you I have two children, and (at least) one of them is a boy born on April 1. What probability should you assign to the event that I have two boys? If you think that is going to be too cumbersome, simply tell me whether the probability is close to 1/2 or to 1/3, or to some other simple fraction, and provide an estimate as to how close.

I am not going to heed the author's suggestion to set up the problem correctly, and will use intuition instead:

For the week problem it was 14 + 13 = 27 = (7*4 - 1) in the denominator and 13 = (7*2 - 1) in the numerator.

So for a normal year (365 days), it would be (365*2 - 1)/(365*4 - 1). Close to 1/2.


Good intuition. The answer is, indeed, (2-p)/(4-p), where p is the probability that the kid in question is born in the time period in question, i.e. 1/7 or 1/365, as the case may be.

So, you’re correct.

One can pump the intuition further: If you only know that there are no girls, the result is 1/3. If a boy opens the door, or you know that the older one is a boy, the result is 1/2.

So identifying one of them as the boy increases the result from 1/3 to 1/2.

By saying “one is a boy born on Tuesday”, you give some information, thus nudge the result a bit toward 1/2. Saying “one is born on 12/30”, say, you identify the kid more and nudge the result nearly all the way towards 1/2.

And indeed, if (in the formula above) we set p=1 (no identification), 1/3 results, and if we set p=0 (full identification), 1/2 results.
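The formula follows from two short computations, assuming sexes are 50/50, the trait is independent of sex, and the children are independent: P(at least one boy has the trait) = 1 - (1 - p/2)^2 = p - p^2/4, and P(two boys, at least one with the trait) = (1/4)(1 - (1 - p)^2) = (2p - p^2)/4. Dividing gives (2 - p)/(4 - p). A sketch evaluating it at the cases mentioned in the thread:

```python
from fractions import Fraction

def p_two_boys(p):
    """P(two boys | at least one boy has a trait of probability p) = (2 - p)/(4 - p)."""
    return (2 - p) / (4 - p)

cases = [
    ("no identification (p = 1)", Fraction(1)),
    ("boy born on a Tuesday", Fraction(1, 7)),
    ("boy born on April 1, 365-day year", Fraction(1, 365)),
    ("full identification (p = 0)", Fraction(0)),
]
for label, p in cases:
    result = p_two_boys(p)
    print(f"{label}: {result} ~ {float(result):.4f}")
```

The April 1 case gives 729/1459, matching the (365*2 - 1)/(365*4 - 1) estimate above.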

Quite neat.


I was lazy so I just did:

  import itertools as itools

  xs = [f'{i}-{j}' for i in range(30) for j in range(12)]
  ys = ['b', 'g']
  zs = list(itools.product(xs, ys))
  boys = [z for z in zs if z[1] == 'b']
  len(boys)/len(zs)

This gives me 0.5, or 1/2.

The moral of the article is to set up your universal set correctly and not throw out any information, no matter how irrelevant.


Now I tell you I have two children, and (at least) one of them is a boy born on April 1st, has medium-length black hair, is 4'2", interested in 5 out of 12000 available rpg games, has 8 close friends and sword-shaped fingernails.

I believe your script may require a datacenter and a scalable architecture now.


And what do you do with non-discrete attributes?


But that's exactly what you did! You just counted which fraction of all kids is boys. You threw away that this is about pairs of kids, about april 1 on which one of them has their birthday and is a boy, and we want to know the sex of the other one. You already simplified by applying symmetry reductions. You could have just done print(1/2) and proclaimed that the output was 0.5 which therefore must be the solution.


But in this case the moral of the article is exactly the opposite - if you throw out the irrelevant information the answer becomes trivial (no need for your calculations).


> So now you know the possible gender combinations are BB, BG, GB

To me, this is where the intuitive answer of 50% diverges from the calculated answer.

I think intuitively, many of us break it up into 3 equally likely, unordered groups: 'BB', 'GG' and 'BG'. We are then told to exclude 'GG', which leaves us with two equally likely options: 'BG' or 'BB'.

This reminds me of the most recent Numberphile episode, which deals with the Sleeping Beauty paradox: Tom Crawford shows Brady that the chance of a coin landing heads is 1/3, to which Brady protests that Tom has changed what he is measuring.

With the gender question, we just assume the wrong probability space, which isn't the same thing, but my mind still wandered to Numberphile.

https://www.numberphile.com/videos/sleeping-beauty-paradox


> I think intuitively, many of us break it up into 3 equally likely, unordered groups

Right, another way to phrase it:

1. We casually move from permutations to combinations, especially when nothing else in the question is there to keep us focused on ordering.

2. When throwing out ordering, we collapse BG and GB together... but forget that the result needs to have double weight.
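A tiny sketch of point 2, assuming each ordered (older, younger) outcome has probability 1/4: collapsing to unordered groups is harmless as long as the merged BG group keeps its weight of 2.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The four ordered outcomes BB, BG, GB, GG, each with probability 1/4.
ordered = ["".join(pair) for pair in product("BG", repeat=2)]
# Dropping the order merges BG and GB into one unordered group of weight 2.
unordered = Counter("".join(sorted(pair)) for pair in ordered)
# Exclude GG but keep the weights: BB holds 1 of the 3 remaining units.
with_boy = {group: weight for group, weight in unordered.items() if "B" in group}
p_bb = Fraction(with_boy["BB"], sum(with_boy.values()))
print(dict(unordered), p_bb)
```

Treating the three unordered groups as equally likely instead would give the intuitive, but wrong, 1/2.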


sleeping beauty

I didn’t buy it either. They claimed both methods are valid, but they are valid for different questions. One of the weaker Numberphile videos imo, unless there was something after the point I stopped watching at.


> I tell you I have two children and that (at least) one of them is a boy, and ask you what you think is the probability that I have two boys. Many people, when they hear this puzzle for the first time, give the answer 1/2, reasoning that there is an equal likelihood that my other child is a boy or a girl. But this is not correct. Based on what you know, you should conclude that I am actually twice as likely to have a boy and a girl as I am to have two boys. So your right answer to my question is not 1/2 but 1/3.

Actually, neither is correct – leaving aside whether the math is correct given the unstated assumptions – because the unstated assumptions are wrong: birth probabilities by sex are not equal and, within the same family, are not independent.


> many people have about problems such as this, which are about what is known as epistemic probability

Funnily enough, I'm currently reading _Against the Gods_ (1996) and the chapter I'm on covers exactly this distinction. What are the odds?!?!

(If it helps, today is Sunday)


How many children do you have, their genders and the day of the week they were born?


I don’t think it’s necessary to invoke epistemic probability to explain why the original question is tricky. At least for me, it helps to notice that if the question were rephrased “the older child is a boy, what is the probability that the younger child is also?”, then the answer would be the intuitive 1/2. You get into this counter-intuitive situation with the question as asked because of the “at least” phrasing - the remaining possibilities in the contingency table can still apply to one, the other, or both children.


If someone has at least one boy, it is 1/3

If they then say, I have a secret child-ordering X, such that the first in this ordering is a boy, I guess it is now 1/2

However, in the 1/3 case the receiver already knows that such a secret child-ordering exists.

Weird!


I had a think about this since, and it must have everything to do with experimental setup, and the existence of the observer in the experiment.

Let's set it up like this: You have 4 dads. Each dad has 2 kids covering all the combinations of kids: MM, MF, FM, FF.

Now if you define the experiment as: you are talking to a dad who has at least one boy. This was set up by the 4 dads being in the playground, and you walking up to one of them at random and him telling you this. Now in such an experiment there is a 1 in 4 chance that you lead to a contradiction. Therefore, of the 4 alternative universes, we discard that one, leaving 3 universes, all equally likely. And in one of them the dad has 2 boys.

If the dad tells you the older kid is a boy then you have some info. What info is that? Well it depends on the precondition under which he told you that. At least a couple of options.

(a) He was going to always reveal to you a kid that was a boy, then tell you if it was the older or younger kid. It happens to be the older one this time.

(b) He was going to tell you the gender of the oldest kid.

In version (a) there are no universes discarded, so you are still at 1/3

In version (b) the universe where the older kid is not a boy has been discarded. There are now 2 universes in which you exist so it is 1/2.
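A quick simulation of the two protocols. One assumption beyond the comment: in version (a), a dad with two boys picks which boy to reveal uniformly at random (if he always revealed the older one, version (a) would also give 1/2). Dads with no boy say nothing, which plays the role of the discarded universe.

```python
import random

def p_two_boys_given_older_is_boy(protocol, trials=100_000, seed=0):
    """Estimate P(two boys | dad says his older kid is a boy) under a protocol.

    "a": a dad with at least one boy reveals one of his boys, chosen at random
         (an assumption), and says whether it is the older or the younger kid.
    "b": every dad simply states the sex of his older kid.
    """
    rng = random.Random(seed)
    heard = two_boys = 0
    for _ in range(trials):
        older, younger = rng.choice("BG"), rng.choice("BG")
        if protocol == "a":
            boys = [kid for kid, sex in (("older", older), ("younger", younger)) if sex == "B"]
            says_older = bool(boys) and rng.choice(boys) == "older"
        else:
            says_older = older == "B"
        if says_older:
            heard += 1
            two_boys += older == "B" and younger == "B"
    return two_boys / heard

print(f"(a): {p_two_boys_given_older_is_boy('a'):.3f}")  # close to 1/3
print(f"(b): {p_two_boys_given_older_is_boy('b'):.3f}")  # close to 1/2
```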


Confusion about the Monty Hall problem isn't an issue with people not understanding probability, it's that the problem is usually presented in a way that's under-explained and with hidden assumptions:

https://statisticsblog.com/2011/11/23/monte-hall-revisited/


The best way I've seen the monty hall problem explanation improved upon was by increasing the number of doors. The difference between a 1/2 and 1/3 chance is so subtle to a human. Change it to 1,000,000 doors and suddenly it becomes very obvious why your odds improve on swapping.


Tbh it was never very obvious to me why we reveal N-2 goats instead of just 1.

My own explanation is that if you are shown a goat door in advance, there is a 50/50 chance between the two remaining doors. But since without that you might have picked the door that was revealed, there’s a symmetry about it in which the swap is luckier. Which is just a clumsy way of saying your chances were 2/3 lose vs 1/3 win, and that doesn’t change after the reveal, so you must swap.
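A minimal simulation of the many-doors version, assuming the standard rules: the prize is placed uniformly at random, and the host, who knows where it is, opens every other door except one without ever revealing the prize. Switching wins exactly when the first pick was wrong, so the rate should be about (n - 1)/n: roughly 2/3 for 3 doors and essentially 1 for a million.

```python
import random

def switch_win_rate(num_doors, trials=100_000, seed=0):
    """Estimate P(win | always switch) in an n-door Monty Hall game."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        prize = rng.randrange(num_doors)
        pick = rng.randrange(num_doors)
        if pick != prize:
            # The host must leave the prize door closed, so it is the other closed door.
            other_closed = prize
        else:
            # Any non-picked door may stay closed; choose one uniformly
            # without building a million-element list.
            d = rng.randrange(num_doors - 1)
            other_closed = d if d < pick else d + 1
        wins += other_closed == prize
    return wins / trials

for n in (3, 1_000_000):
    print(n, switch_win_rate(n))
```

The host's knowledge is what matters: if he opened doors at random and merely happened to avoid the prize, the answer would change.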


This actually helped. I always assumed that the goats/prize were randomly placed, Monty's door selection was random, and the contestant chose a random door. In which case I could never fathom why you'd need to switch. Since as the author says, there's no new information about the remaining two doors, and switching or not switching remain random.

I still feel like I don't fully grasp the actual problem/solution though, even with this new understanding. I will read the linked article by Jeffrey S. Rosenthal though, and hopefully that will fill in the last of the blanks.


To be fair, someone who understands the full problem would be able to explain both solutions.


This page has now been taken down. Does anyone know why?

Fortunately, I have a copy of the page:

April 2010 Probability Can Bite

Estimating probabilities can be a tricky business. The long-running saga of the notorious Monty Hall Problem shows how even mathematically smart people can easily be misled. (For my forays into that particular example, see my Devlin's Angle columns for July-August 2003, November 2005, and December 2005.) Another probability question that causes many people difficulty is the children's gender puzzle: I tell you I have two children and that (at least) one of them is a boy, and ask you what you think is the probability that I have two boys. Many people, when they hear this puzzle for the first time, give the answer 1/2, reasoning that there is an equal likelihood that my other child is a boy or a girl. But this is not correct. Based on what you know, you should conclude that I am actually twice as likely to have a boy and a girl as I am to have two boys. So your right answer to my question is not 1/2 but 1/3.

Before I explain the answer, I should clear up a confusion that many people have about problems such as this, which are about what is known as epistemic probability. The probability being discussed here is not some unchangeable feature of the world, like the probability of throwing a double six with a pair of honest dice. After all, I have already had my two children, and their genders have long been determined. At issue is what probabilities you attach to your knowledge of my family. As is the case with most applications of probability theory outside the casinos, the probability here is a measure of an individual's knowledge of the world, and different people can, and often do, attach different probabilities to the same event. Moreover, as you acquire additional information about an event, the probability you attach to it can change.

To go back to the original puzzle now, in order of birth, there are four possible gender combinations for my children: BB, GG, BG, GB. Each is equally likely.
(To avoid niggling complications, I'm assuming each gender is equally likely at birth, and ignoring the possibility of identical twins, etc.) So, if all I told you was that I have two children, you would (if you are acting rationally) say that the probability I have two boys is 1/4. But I tell you something else: that at least one of my children is a boy. That eliminates the GG possibility. So now you know the possible gender combinations are BB, BG, GB. Of these three possibilities, in two of them I have a boy and a girl, and in only one do I have two boys, so you should calculate the probability of my having two boys to be 1 out of 3, namely 1/3.

If you haven't come across this before, it might take you some time to convince yourself this reasoning is correct. I long ago got past that stage, and hence felt my intuitions would be pretty reliable when I recently came across the following variant of the puzzle.

I tell you I have two children, and (at least) one of them is a boy born on a Tuesday. What probability should you assign to the event that I have two boys?

Before you read further, you should perhaps pause and try to figure this out for yourself.

My initial reaction was that the information about the Tuesday was irrelevant, since at issue was gender, not day of birth. In which case, this was the same problem as the one I just described, and the answer would be 1/3. But then I began to have second thoughts. I admit my doubts were occasioned by the way I came across the problem: a Twitter feed by the well-known mathematician John Allen Paulos, forwarding a Tweet from the (British) Guardian newspaper science writer Alex Bellos, who was reporting on the posing of this problem at the recent "Gathering for Gardner" conference in Atlanta by puzzle master Gary Foshee.

Suspecting that there was more to this problem than I initially thought, I set about repeating the same form of reasoning as in the original puzzle, but taking account of days of the week when my children could have been born. As soon as you do that, you realize that Foshee's problem really is different. But how different? My intuition said that, since the original puzzle had the answer 1/3, the new variant would have an answer fairly close to 1/3. After all, knowing the birth day is a Tuesday may (and does) make a difference, but it surely cannot make much of a difference, right?

Wrong. It makes a surprisingly big difference. The correct answer to the new puzzle is 13/27, just slightly less than 1/2, and not at all close to 1/3. This is what really surprised me, so much so that I checked my solution against the one Bellos published on his blog a few days later.

The crux of the matter is that Foshee's variant seems at first glance to be a minor twist on the original one, but it's actually significantly different. The property it focuses on is not gender, but the combination property gender + day of birth. That makes the mathematics very different, as I'll now show. Instead of just the two genders, B and G, of the original puzzle, there are now 14 possibilities for each child:

B-Mo, B-Tu, B-We, B-Th, B-Fr, B-Sa, B-Su
G-Mo, G-Tu, G-We, G-Th, G-Fr, G-Sa, G-Su

When I tell you that one of my children is a boy born on a Tuesday, I eliminate a number of possible combinations, leaving the following:

First child B-Tu, second child: B-Mo, B-Tu, B-We, B-Th, B-Fr, B-Sa, B-Su, G-Mo, G-Tu, G-We, G-Th, G-Fr, G-Sa, G-Su
Second child B-Tu, first child: B-Mo, B-We, B-Th, B-Fr, B-Sa, B-Su, G-Mo, G-Tu, G-We, G-Th, G-Fr, G-Sa, G-Su

Notice that the second row has one fewer member than the first, since the combination B-Tu + B-Tu already appears in the first row. Altogether, there are 14 + 13 = 27 possibilities. Of these, how many give me two boys? Well, just count them. There are 7 in the first row, 6 in the second row, for a total of 13 in all. So 13 of the 27 possibilities give me two boys, giving that answer of 13/27. (As in the original problem, you have to assume all the combinations are equally likely. In the case of birth days, this is actually not the case, since more babies are born on Fridays, and fewer on weekends, due to the desire of hospital doctors to have weekends as free as possible of duties.)

What misled my intuition (and likely yours as well) was my unfamiliarity with the property gender + day of birth. Fortunately, the math does not lie. Provided you put your intuitions to one side and set up the problem correctly, the math will give you the right answer.

Now that your intuition has been primed, let me leave you with this problem. I tell you I have two children, and (at least) one of them is a boy born on April 1. What probability should you assign to the event that I have two boys? If you think that is going to be too cumbersome, simply tell me whether the probability is close to 1/2 or to 1/3, or to some other simple fraction, and provide an estimate as to how close. (Once more, you should assume all birth possibilities are equally likely, ignoring in particular the well-known seasonal variations in actual births.)

If you are still having doubts about all of this, take consolation in the fact that you are not alone. Representing real-world problems correctly to calculate probabilities is notoriously difficult. In my recent book The Unfinished Game, cited below, I describe how no less a mathematician than Blaise Pascal had enormous difficulty understanding an analogous argument by Pierre de Fermat.

Follow Keith Devlin on Twitter at @nprmathguy. Devlin's Angle is updated at the beginning of each month. Find more columns here.

Mathematician Keith Devlin (email: devlin@stanford.edu) is the Executive Director of the Human-Sciences and Technologies Advanced Research Institute (H-STAR) at Stanford University and The Math Guy on NPR's Weekend Edition. His most recent book for a general reader is The Unfinished Game: Pascal, Fermat, and the Seventeenth-Century Letter that Made the World Modern, published by Basic Books.



