Understanding the Birthday Paradox

23 people. In a room of just 23 people there’s a 50-50 chance of at least two people having the same birthday. In a room of 75 there’s a 99.9% chance of at least two people matching.

Put down the calculator and pitchfork, I don’t speak heresy. The birthday paradox is strange, counter-intuitive, and completely true. It’s only a “paradox” because our brains can’t handle the compounding power of exponents. We expect probabilities to be linear and only consider the scenarios we’re involved in (both faulty assumptions, by the way).

Let’s see why the paradox happens and how it works.

Problem 1: Exponents aren’t intuitive

We’ve taught ourselves mathematics and statistics, but let’s not kid ourselves: it’s not natural.

Here’s an example: What’s the chance of getting 10 heads in a row when flipping coins? The untrained brain might think like this:

“Well, getting one head is a 50% chance. Getting two heads is twice as hard, so a 25% chance. Getting ten heads is probably 10 times harder… so about 50%/10 or a 5% chance.”

And there we sit, smug as a bug on a rug. No dice, bub.

After pounding your head with statistics, you know not to divide, but use exponents. The chance of 10 heads is not .5/10 but .5¹⁰, or about .001.
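To see how far off the "divide by 10" instinct is, here is a minimal Python sketch comparing it with the exponent version. The variable names are just for illustration.

```python
# Compare the "linear" guess with the actual chance of 10 heads in a row.
flips = 10
linear_guess = 0.5 / flips   # the intuitive (wrong) estimate: 5%
actual = 0.5 ** flips        # independent flips multiply: 1/1024

print(f"linear guess: {linear_guess:.4f}")
print(f"actual:       {actual:.6f}")
print(f"the guess is roughly {linear_guess / actual:.0f}x too high")
```

The linear guess isn't off by a little: it overshoots by a factor of about 50.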

But even after training, we get caught again. At 5% interest we’ll double our money in 14 years, rather than the “expected” 20. Did you naturally infer the Rule of 72 when learning about interest rates? Probably not. Understanding compound exponential growth with our linear brains is hard.
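A quick sketch of the doubling-time claim, assuming interest is compounded once a year. The function name `years_to_double` is my own, not from any library.

```python
import math

def years_to_double(rate):
    """Years for money to double at a yearly compounded rate (e.g. 0.05 for 5%)."""
    return math.log(2) / math.log(1 + rate)

rate = 0.05
exact = years_to_double(rate)     # compound growth
rule_of_72 = 72 / (rate * 100)    # rule-of-thumb estimate
linear = 1 / rate                 # the "expected" answer if growth were linear

print(f"compound: {exact:.1f} years")
print(f"rule of 72: {rule_of_72:.1f} years")
print(f"linear guess: {linear:.0f} years")
```

The compound answer and the Rule of 72 both land near 14 years, well short of the linear guess of 20.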

Problem 2: Humans are a tad bit selfish

Take a look at the news. Notice how much of the negative news is the result of acting without considering others. I’m an optimist and do have hope for mankind, but that’s a separate discussion :).

In a room of 23, do you think of the 22 comparisons where your birthday is being compared against someone else’s? Probably.

Do you think of the 231 comparisons where someone who is not you is being checked against someone else who is not you? Do you realize there are so many? Probably not.

The fact that we neglect the 10 times as many comparisons that don’t include us helps us see why the “paradox” can happen.
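If you want to check the counts, this small sketch splits the comparisons into the ones that involve you and the ones that don't. It uses `math.comb` from the standard library; the variable names are illustrative.

```python
from math import comb

people = 23
total_pairs = comb(people, 2)             # every possible pair
pairs_with_you = people - 1               # you vs. each other person
pairs_without_you = comb(people - 1, 2)   # pairs among the other 22

print(total_pairs, pairs_with_you, pairs_without_you)
print(f"ratio: {pairs_without_you / pairs_with_you:.1f}")
```

The two groups add up to the full set of pairs, and the ones you never think about outnumber yours by about 10 to 1.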

Ok, fine, humans are awful: Show me the math!

The question: What are the chances that two people share a birthday in a group of 23?

Sure, we could list the pairs and count all the ways they could match. But that’s hard: there could be 1, 2, 3 or even 23 matches!

It’s like asking “What’s the chance of getting one or more heads in 23 coin flips?” There are so many possibilities: heads on the first throw, or the 3rd, or the last, or the 1st and 3rd, the 2nd and 21st, and so on.

How do we solve the coin problem? Flip it around (Get it? Get it?). Rather than counting every way to get heads, find the chance of getting all tails, our “problem scenario”.

If there’s a 1% chance of getting all tails (more like .5^23 but work with me here), there’s a 99% chance of having at least one head. I don’t know if it’s 1 head, or 2, or 15 or 23: we got heads, and that’s what matters. If we subtract the chance of a problem scenario from 1 we are left with the probability of a good scenario.
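Here is the coin version of the "flip it around" trick as a short sketch, using the real .5^23 instead of the made-up 1%. Variable names are my own.

```python
flips = 23
p_all_tails = 0.5 ** flips               # the one "problem scenario"
p_at_least_one_head = 1 - p_all_tails    # everything else

print(f"all tails:         {p_all_tails:.10f}")
print(f"at least one head: {p_at_least_one_head:.10f}")
```

One subtraction replaces counting every possible arrangement of heads.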

The same principle applies for birthdays. Instead of finding all the ways we match, find the chance that everyone is different, the “problem scenario”. We then take the opposite probability and get the chance of a match. It may be 1 match, or 2, or 20, but somebody matched, which is what we need to find.

Explanation: Counting Pairs

With 23 people we have 253 pairs:

$\displaystyle{\frac{23 \cdot 22}{2} = 253}$

(Brush up on combinations and permutations if you like).

The chance of 2 people having different birthdays is:

$\displaystyle{1 - \frac{1}{365} = \frac{364}{365} = .997260}$

Makes sense, right? When comparing one person’s birthday to another’s, 364 out of 365 birthdays are “OK”.

Having all 253 pairs be different is like getting heads 253 times in a row (well, sort of: let’s assume the comparisons are independent). We use exponents to find the probability:

$\displaystyle{\left(\frac{364}{365}\right)^{253} = .4995}$

99.7260% is really close to one, but when you multiply it by itself a few hundred times, it shrinks. Really fast.

The chance that we have a match is: 1 – 49.95% = 50.05%, or just over half! If you want to estimate the probability of a match for any number of people n, the formula is:

$\displaystyle{p(n) = 1 - \left(\frac{364}{365}\right)^{C(n,2)} = 1 - \left(\frac{364}{365}\right)^{n(n-1)/2} }$
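The pair-counting formula is easy to try for different group sizes. A minimal sketch, assuming 365 equally likely birthdays and treating the pair comparisons as independent; `p_match_pairs` is a name I made up for this example.

```python
from math import comb

def p_match_pairs(n, days=365):
    """Approximate chance of a shared birthday among n people,
    treating each of the C(n, 2) pair comparisons as independent."""
    return 1 - ((days - 1) / days) ** comb(n, 2)

for n in (5, 10, 23, 50, 75):
    print(f"{n:3d} people: {p_match_pairs(n):.4f}")
```

Notice how quickly the number climbs: the exponent grows with the number of pairs, not the number of people.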

Interactive Example

I didn’t believe we needed only 23 people. The math works out, but is it real?

You bet. Try the example below: Pick a number of items (365), a number of people (23) and run a few trials. You’ll see the theoretical match and your actual match as you run your trials. Go ahead, click the button (or see the full page).

As you run more and more trials (keep clicking!) the actual probability should approach the theoretical one.
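If you'd rather run the trials yourself, here is a rough simulation sketch in Python. It assumes every item is equally likely, and the function names, trial count and seed are arbitrary choices of mine, not part of the original example page.

```python
import random

def has_match(people, items, rng):
    """One trial: give each person a random item; True if any two share one."""
    picks = [rng.randrange(items) for _ in range(people)]
    return len(set(picks)) < people

def estimate_match_rate(people=23, items=365, trials=20_000, seed=0):
    """Fraction of trials in which at least two people matched."""
    rng = random.Random(seed)
    hits = sum(has_match(people, items, rng) for _ in range(trials))
    return hits / trials

print(f"observed match rate: {estimate_match_rate():.3f}")
```

With 23 people and 365 items the observed rate should hover around one half, and it settles down as you raise `trials`.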

Examples and Takeaways

Here are a few lessons from the birthday paradox:

sqrt(n) is roughly the number you need to have a 50% chance of a match with n items. sqrt(365) is about 19, in the same ballpark as 23. This comes into play in cryptography for the birthday attack.
Even though there are 2¹²⁸ (1e38) GUIDs, we only have about 2⁶⁴ (1e19) to use up before a 50% chance of collision, if we treat GUIDs as plain random numbers. And 50% is really, really high.
You only need 13 people picking letters of the alphabet to have a 95% chance of a match by the pair-counting estimate (the exact figure is a bit higher). Try it above (people = 13, items = 26).
Exponential growth rapidly decreases the chance of picking unique items (aka it increases the chances of a match). Remember: exponents are non-intuitive and humans are selfish!
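The alphabet example is small enough to check directly. This sketch compares the pair-counting estimate with the exact repeated-multiplication answer, assuming each letter is equally likely; both function names are my own.

```python
from math import comb, prod

def p_match_pairs(n, items):
    """Pair-counting estimate: treats the C(n, 2) comparisons as independent."""
    return 1 - ((items - 1) / items) ** comb(n, 2)

def p_match_exact(n, items):
    """Exact chance: 1 minus the chance that every pick is different."""
    p_different = prod(1 - k / items for k in range(n))
    return 1 - p_different

print(f"pair estimate: {p_match_pairs(13, 26):.3f}")
print(f"exact:         {p_match_exact(13, 26):.3f}")
```

The pair estimate lands near 95% and the exact figure near 97%. The gap is wider than in the birthday case because 13 people is a big chunk of 26 items.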

After thinking about it a lot, the birthday paradox finally clicks with me. But I still check out the interactive example just to make sure.

Appendix A: Repeated Multiplication Explanation (Geeky Math Alert!)

Remember how we assumed the pair comparisons are independent? Well, they aren’t.

If Person 1 and Person 3 match, and Person 3 and 5 match, we know that 1 and 5 match also. The outcome of 1 and 5 depends on their results with 3, which means the results aren’t an independent 1/365 chance (in our case, it’s a 100% chance of a match).

When counting pairs we did math as if the comparisons were independent coin flips, and multiplied probabilities. This assumption isn’t strictly true but it’s “good enough” for a small number of people (23) compared to the number of possible birthdays (365). It’s unlikely to have multiple people match and screw up the independence, so it’s a good approximation.

It’s unlikely, but it can happen. Let’s figure out the real chances of each person picking a different number:

The first person has a 100% chance of a unique number (of course)
The second has a (1 – 1/365) chance (all but 1 number from the 365)
The third has a (1 – 2/365) chance (all but 2 numbers)
The 23rd has a (1 – 22/365) chance (all but 22 numbers)

The multiplication looks pretty ugly:

$\displaystyle{p(different) = 1 \cdot \left(1-\frac{1}{365}\right) \cdot \left(1-\frac{2}{365}\right) \cdots \left(1-\frac{22}{365}\right)}$
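The multiplication is ugly by hand but painless for a computer. A minimal sketch, assuming 365 equally likely birthdays and no leap days; `p_different_exact` is an illustrative name.

```python
from math import prod

def p_different_exact(n, days=365):
    """Chance that n people all have different birthdays."""
    return prod(1 - k / days for k in range(n))

p_diff = p_different_exact(23)
print(f"p(different) = {p_diff:.4f}")
print(f"p(match)     = {1 - p_diff:.4f}")
```

Unlike the pair-counting estimate, this version gives exactly zero chance of everyone being different once there are 366 people, because one of the factors becomes 0.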

But there’s a shortcut we can take. When x is close to 0, a coarse first-order Taylor approximation for e^x is:

$\displaystyle{e^x \approx 1 + x}$

$\displaystyle{ 1 - \frac{1}{365} \approx e^{-1/365}}$
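How coarse is the shortcut? This sketch measures the gap between e^x and 1 + x for x = -k/365. The helper name `taylor_gap` is made up for the example.

```python
from math import exp

def taylor_gap(k, days=365):
    """Difference between e^x and its first-order estimate 1 + x, for x = -k/days."""
    x = -k / days
    return exp(x) - (1 + x)

for k in (1, 10, 22, 100, 300):
    print(f"k = {k:3d}   gap = {taylor_gap(k):.6f}")
```

For the values we need (k up to 22) the gap stays tiny, but it grows quickly as x moves away from 0, which is why the shortcut only suits small groups relative to 365.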

Using our handy shortcut we can rewrite the big equation to:

$\displaystyle{p(different) \approx 1 \cdot e^{-1/365} \cdot e^{-2/365} \cdots e^{-22/365}}$

$\displaystyle{p(different) \approx e^{(-1 - 2 - 3 - \cdots - 22)/365}}$

$\displaystyle{p(different) \approx e^{-(1 + 2 + \cdots + 22)/365}}$

But we remember that adding the numbers 1 to n gives n(n + 1)/2. Don’t confuse this with n(n-1)/2, which is C(n,2) or the number of pairs of n items. They look almost the same!

Adding 1 to 22 is (22 * 23)/2 so we get:

$\displaystyle{p(different) \approx e^{-((23 \cdot 22) /(2 \cdot 365))} = .499998}$

Phew. This approximation is very close and good enough for government work, as they say. If you simplify the formula a bit (treating n(n-1) as roughly n²) and swap in n for 23 you get:

$\displaystyle{p(different) \approx e^{-(n^2 / (2 \cdot 365))}}$

and

$\displaystyle{p(match) = 1 - p(different) \approx 1 - e^{-(n^2 / (2 \cdot 365))}}$
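To see how the three versions stack up, here is a sketch that puts the exact product, the pair-counting estimate and the simplified e^(-n²/(2·365)) formula side by side. It assumes 365 equally likely birthdays; the function names are mine.

```python
from math import comb, exp, prod

def exact(n, T=365):
    """1 minus the exact chance that all n picks differ."""
    return 1 - prod(1 - k / T for k in range(n))

def pairs(n, T=365):
    """Pair-counting estimate with independent comparisons."""
    return 1 - ((T - 1) / T) ** comb(n, 2)

def exp_n2(n, T=365):
    """Simplified exponential formula using n^2 in place of n(n-1)."""
    return 1 - exp(-n * n / (2 * T))

print(" n   exact   pairs   exp_n2")
for n in (10, 23, 41, 75):
    print(f"{n:2d}  {exact(n):.4f}  {pairs(n):.4f}  {exp_n2(n):.4f}")
```

All three agree to within a percent or two around n = 23, which is the sense in which the shortcuts are "good enough".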

Appendix B: The General Birthday Formula

Let’s generalize the formula to picking n people from T total items (instead of 365):

$\displaystyle{p(different) \approx e^{-(n^2 / (2 \cdot T))}}$

If we choose a probability (like 50% chance of a match) and solve for n:

$\displaystyle{p(different) \approx e^{-(n^2 / (2 \cdot T))}}$

$\displaystyle{1 - p(match) \approx e^{-(n^2 / (2 \cdot T))}}$

$\displaystyle{1 - .5 \approx e^{-(n^2 / (2 \cdot T))}}$

$\displaystyle{\ln(.5) \approx -\frac{n^2}{2 \cdot T}}$

$\displaystyle{-2\ln(.5)\cdot T \approx n^2}$

$\displaystyle{n \approx 1.177 \sqrt{T}}$

Voila! If you take sqrt(T) items (about 18% more if you want to be picky) then you have about a 50-50 chance of getting a match. If you plug in other numbers you can solve for other probabilities:

$\displaystyle{n \approx \sqrt{-2\ln(1-m)} \cdot \sqrt{T}}$

Remember that m is the desired chance of a match (it’s easy to get confused, I did it myself). If you want a 90% chance of matching birthdays, plug m=90% and T=365 into the equation and see that you need 41 people.
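The general formula fits in a couple of lines of code. A minimal sketch, where `people_needed` is a hypothetical helper name and the result is only as good as the approximation behind it:

```python
from math import log, sqrt

def people_needed(m, T):
    """Approximate group size n for a chance m of a match among T items."""
    return sqrt(-2 * log(1 - m)) * sqrt(T)

print(f"50% with 365 days: {people_needed(0.5, 365):.1f}")
print(f"90% with 365 days: {people_needed(0.9, 365):.1f}")
print(f"50% with 2^128 items, in units of 2^64: {people_needed(0.5, 2**128) / 2**64:.3f}")
```

The first line comes out a shade under 23, the second right at 41, and the last shows the 1.177 factor in front of sqrt(T) for the GUID case.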

Wikipedia has even more details to satisfy your inner nerd. Go forth and enjoy.

April 26th, 2007|200 Comments

200 Comments on "Understanding the Birthday Paradox"

Herman Hiddema

The math here is actually wrong. The chances of individual pairs are not independent. Your math would work if you take each pair and have them name a random number between 1 and 365.

With this math, taking a group of 365 people still results in a non-zero chance that they all have different birthdays.

kalid

Thanks for the info, you’re right. I did some more digging (good paper here) and birthdays aren’t mutually independent.

If Person 1 = Person 3, and Person 3 = Person 5, there isn’t an independent event that Person 1 = Person 5. The probability of 1 matching 5 has already been determined by the other statements.

From what I was able to gather, this is only a problem if there are existing overlapping pairs. For a small n relative to the number of outcomes (365), it’s unlikely to have multiple matches that affect the probability, so assuming independence may be ok for computing approximations.

Carnival of Mathematics Edition #6 at nOnoscience

[…] Regarding your birthday, whether you are savvy with Hamming’s error correcting code or not, listen to Kalid Azad when he presents Understanding the Birthday Paradox posted at BetterExplained in which he explains the Birthday Paradox from statistics. […]

Anonymous

The last formula is incorrect, it should be:
n ~ sqrt(-2 ln(1-p)) sqrt(T)
(note the 1-p), or else you are finding the probability to miss.

kalid

Thanks for the tip! I fixed up the article to use p(different) and p(match), which is much more clear.

Pseudonym

The “take-away lesson” about GUIDs is wrong. GUIDs are (theoretically) guaranteed to be globally unique, because they include such things as the MAC address of your network card (something which is globally unique until some cheap NIC manufacturer starts recycling them) and the current time.

The catch is that because of the time factor, the current GUID algorithm won’t last forever. We will run out in a couple of centuries.

Kalid

Hi, that’s a good point about MAC addresses. However, if you consider GUIDs as just a giant random number (for the purposes of the exercise), you are looking for how many “items” out of a pool of 2^128 you can distribute before having a 50% chance of collision.

For the birthday paradox, it’s about 23 items (of a pool of 365) before a 50% chance of collision. For GUIDs, it will be roughly 2^64 items before a 50% chance of collision.

There’s a bit more information here:

http://en.wikipedia.org/wiki/UUID

Hope this helps,

-Kalid

Allan

Can the math in the birthday paradox be applied to the Pick3 lottery?

Kalid

Hi Allan, I’m not too familiar with the rules of Pick3, but I’ll take a shot.

The birthday paradox helps find the chance that any two random numbers will “collide” in a set.

In Pick3, you don’t really care if two guesses collide… you want the guess to collide with the winning number. In this case, two losing tickets that both guessed 123 (when the real answer was 999) isn’t helpful.

I may be missing something though!

n(t)

Hey, great blog.
“A coarse first-order Taylor approximation for e^x is: e^x ≈ 1 + x”

That’s only valid if x is far less than 1.

Le blog d'Alex Chauvin

The birthday paradox…

I stumbled by chance (as is often the case on the Internet these days) upon this birthday paradox, which states that in a gathering of 23 people, the probability that two of them were born on the same day is……

Ashton Carr

I am doing a science fair experiment on this and I need help. I also need to know if the math is over my head??!!

kalid

@nt: Thanks for the tip, I updated the article to make that more clear.

@Ashton: Hi Ashton, you might want to ask your math teacher to see if you’ve covered the necessary topics in class. You’ll probably need statistics and combinatorics.

zhao

hello kalid,
i read a few of your articles and think they are freaking awesome.

thanks and keep up the good work.

kalid

Hi Zhao, thanks for the comment! I’ll try to keep cranking out the posts :).

Pigeon Birthdays | If Chaos Were Organized

[…] After reading another math explanation on why that’s true, I know that I understand it now. Sure, I might not be able to repeat (or fully understand) the math equations which generate the percentage, but I can identify the bottom line of understanding — when written in POE (plain ol’ English): […]

demi

Heyy ;; i have no clue how to do this!

abc

I think that the math behind this birthday paradox is wrong..
The chance of two people having same birthdays is 1/365 = 0.0027397

therefore p(n)= 0.0027397 ^C(n,2)
if we take an example of 23 people
we get p(23)= 0.0027397 ^ 253 ~=0
so how is it possible??

kalid

Hi, you’re correct, 1/365 is the chance of 2 people having the same birthday. However, (1/365)^253 would be the chance of all 253 pairs matching, i.e. everyone having the *same* birthday! (Which, as you see, is pretty close to zero).

For this problem, it’s important not to mix up 1/365 (the chance of 1 collision) and 364/365 (the chance of no collision). We first find the chance that somehow, everyone manages to be different:

p(23 people have different birthdays) = (364/365)^253

If there is a 40% chance that everyone is different, there is 1-40% = 60% chance that there was an overlap somewhere. Hope this helps. (Technically, we are assuming independent events but that subtlety is not important for the main point).

abc

hi,
(364/365)^253 means that 253 people have different birthdays

when you check this for 366 people, there is a 100% chance for the birthday paradox.
but when you use this formula we get the answer as 1 – 2.6 * 10^-80 which is less than 1

why is it so??

AND I have never seen two people having the same birthday in my group, which has more than 23 people. this cannot be a coincidence!!!

I still doubt that there is a 50% chance of people having the same birthday

kalid

Hi, when you make the probability like (364/365)^253, you are assuming independent events. What this means is that each comparison is “fresh”, with no memory of the past. It would be like having each pair of people pick two fresh random numbers out of 365 and checking whether they match.

This approximation makes the math easier, and is ok for small values. If you want the actual %, take a look at Appendix A.

Yep, the paradox seems strange, doesn’t it? Take a look at this page and run some experiments on your own to see:

/examples/birthday/birthday.html

As you click “run trial”, you will see the actual match percentage for 23 people approach 50%, which is the predicted one. Hope this helps.
