**23 people**. In a room of just 23 people there’s a 50-50 chance of two people having the same birthday. In a room of 75 there’s a 99.9% chance of two people matching.

Put down the calculator and pitchfork, I don’t speak heresy. The birthday paradox is strange, counter-intuitive, and **completely true**. It’s only a “paradox” because our brains can’t handle the compounding power of exponents. We expect probabilities to be linear and only consider the scenarios we’re involved in (both faulty assumptions, by the way).

Let’s see why the paradox happens and how it works.

## Problem 1: Exponents aren’t intuitive

We’ve taught ourselves mathematics and statistics, but let’s not kid ourselves: it’s not natural.

Here’s an example: What’s the chance of getting 10 heads in a row when flipping coins? The untrained brain might think like this:

“Well, getting one head is a 50% chance. Getting two heads is twice as hard, so a 25% chance. Getting **ten** heads is probably 10 times harder… so about 50%/10 or a 5% chance.”

And there we sit, smug as a bug on a rug. No dice, bub.

**After pounding your head with statistics**, you know not to divide, but use **exponents**. The chance of 10 heads is not .5/10 but .5^{10}, or about .001.

But even after training, we get caught again. At 5% interest we’ll double our money in 14 years, rather than the “expected” 20. Did you naturally infer the Rule of 72 when learning about interest rates? Probably not. Understanding compound exponential growth with our linear brains is hard.
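To put numbers on the gap between the linear guess and the exponent-based answer, here is a small Python sketch. The helper name `doubling_time` is made up for illustration, and it assumes interest compounds once a year.

```python
import math

def doubling_time(rate):
    """Years for money to double at a yearly compounding rate (0.05 means 5%)."""
    return math.log(2) / math.log(1 + rate)

# Ten heads in a row: the linear guess versus the exponent.
linear_guess = 0.5 / 10   # 0.05, the "10 times harder" intuition
actual = 0.5 ** 10        # 0.0009765625, about .001

print(linear_guess, actual)
print(round(doubling_time(0.05), 1))  # about 14.2 years
print(72 / 5)                         # Rule of 72 estimate: 14.4 years
```

The linear guess for the coins is off by a factor of about 50, and the Rule of 72 lands within a few months of the computed doubling time.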

## Problem 2: Humans are a tad bit selfish

Take a look at the news. Notice how much of the negative news is the result of acting without considering others. I’m an optimist and *do* have hope for mankind, but that’s a separate discussion :).

In a room of 23, do you think of the 22 comparisons where **your** birthday is being compared against someone else’s? Probably.

Do you think of the **231** comparisons where someone who is not you is being checked against someone else who is not you? Do you realize there are so many? Probably not.

The fact that we neglect the **10 times as many** comparisons that don’t include us helps us see why the “paradox” can happen.
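A quick Python check of those counts, as a sketch (the variable names are just for illustration):

```python
from math import comb

people = 23
total_pairs = comb(people, 2)                      # every possible pair in the room
pairs_with_you = people - 1                        # you versus each other person
pairs_without_you = total_pairs - pairs_with_you   # pairs you are not part of

print(total_pairs, pairs_with_you, pairs_without_you)  # 253 22 231
print(pairs_without_you / pairs_with_you)              # 10.5
```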

## Ok, fine, humans are awful: Show me the math!

The question: What are the chances that two people share a birthday in a group of 23?

Sure, we could list the pairs and count all the ways they could match. But that’s hard: there could be 1, 2, 3 or even 23 matches!

It’s like asking “What’s the chance of getting one or more heads in 23 coin flips?” There are so many possibilities: heads on the first throw, or the 3rd, or the last, or the 1st and 3rd, the 2nd and 21st, and so on.

How do we solve the coin problem? Flip it around (Get it? Get it?). Rather than counting every way to get heads, **find the chance of getting all tails, our “problem scenario”**.

If there’s a 1% chance of getting all tails (more like .5^23 but work with me here), there’s a 99% chance of having **at least one head**. I don’t know if it’s 1 head, or 2, or 15 or 23: we got heads, and that’s what matters. If we subtract the chance of a problem scenario from 1 we are left with the probability of a good scenario.
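Here is a rough Python sketch of the "flip it around" trick. The two function names are invented for this example; the first counts every outcome by brute force (only practical for a handful of flips), the second uses the all-tails shortcut.

```python
from itertools import product

def p_heads_by_counting(flips):
    """List every sequence of flips and count those with at least one head."""
    outcomes = list(product("HT", repeat=flips))
    with_heads = sum(1 for outcome in outcomes if "H" in outcome)
    return with_heads / len(outcomes)

def p_heads_by_complement(flips):
    """One minus the chance of the single 'problem scenario': all tails."""
    return 1 - 0.5 ** flips

print(p_heads_by_counting(10))      # 1023 of 1024 sequences contain a head
print(p_heads_by_complement(10))    # same value, no listing needed
print(p_heads_by_complement(23))    # 23 flips: all tails is about 1 in 8 million
```

Both routes agree, but the complement needs one multiplication chain instead of a list of 2^23 sequences.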

The same principle applies for birthdays. Instead of finding all the ways we match, **find the chance that everyone is different, the “problem scenario”**. We then take the opposite probability and get the chance of a match. It may be 1 match, or 2, or 20, but somebody matched, which is what we need to find.

## Explanation: Counting Pairs

With 23 people we have 253 pairs:

$$C(23, 2) = \frac{23 \cdot 22}{2} = 253$$

(Brush up on combinations and permutations if you like).

The chance of 2 people having different birthdays is:

$$1 - \frac{1}{365} = \frac{364}{365} \approx .997260$$

Makes sense, right? When comparing one person's birthday to another, in 364 out of 365 scenarios they won't match. Fine.

But making **253 comparisons** and having them *all* be different is like getting heads 253 times in a row -- you had to dodge "tails" each time (let’s assume birthdays are independent). We use exponents to find the probability:

$$\left(\frac{364}{365}\right)^{253} \approx .4995$$

Our chance of getting a single miss is pretty high (99.7260%), but when you take that chance hundreds of times, the odds of keeping up that streak drop. Fast.

The chance we find a match is: 1 – 49.95% = 50.05%, or just over half! If you want to find the probability of a match for any number of people n the formula is:

$$p(n) = 1 - \left(\frac{364}{365}\right)^{C(n,2)} = 1 - \left(\frac{364}{365}\right)^{n(n-1)/2}$$
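The pair-counting estimate from this section can be sketched in a few lines of Python. The function name `p_match_pairs` and the `days` parameter are my own labels, and the code keeps the same simplifying assumption that every pair is an independent comparison.

```python
from math import comb

def p_match_pairs(n, days=365):
    """Estimated chance of a shared birthday, treating each pair as independent."""
    p_pair_differs = (days - 1) / days
    return 1 - p_pair_differs ** comb(n, 2)

for n in (10, 23, 50, 75):
    print(n, round(p_match_pairs(n), 4))
```

For `n = 23` this gives roughly 0.5005, the 50.05% figure above.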

## Interactive Example

I didn’t believe we needed only 23 people. The math works out, but is it real?

You bet. Try the example below: Pick a number of items (365), a number of people (23) and run a few trials. You’ll see the theoretical match and your actual match as you run your trials. Go ahead, click the button (or see the full page).

As you run more and more trials (keep clicking!) the actual probability should approach the theoretical one.
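If you would rather run the experiment yourself, here is a minimal Python simulation in the same spirit. It is a sketch, not the code behind the interactive example: the function names, the default of 10,000 trials, and the fixed seed are all my own choices.

```python
import random

def has_match(people, items, rng):
    """Give each person a random item and report whether any two coincide."""
    picks = [rng.randrange(items) for _ in range(people)]
    return len(set(picks)) < people

def simulate(people=23, items=365, trials=10_000, seed=0):
    """Fraction of trials in which at least two people picked the same item."""
    rng = random.Random(seed)
    hits = sum(has_match(people, items, rng) for _ in range(trials))
    return hits / trials

print(simulate())                       # should land near 0.5
print(simulate(people=13, items=26))    # letters of the alphabet
```

Raising `trials` plays the role of clicking the button more times: the observed fraction settles toward the theoretical value.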

## Examples and Takeaways

Here are a few lessons from the birthday paradox:

- **sqrt(n)** is roughly the number you need to have a 50% chance of a match with n items. sqrt(365) is about 20. This comes into play in cryptography for the birthday attack.
- Even though there are 2^{128} (1e38) GUIDs, we only have 2^{64} (1e19) to use up before a 50% chance of collision. And 50% is really, really high.
- You only need 13 people picking letters of the alphabet to have 95% chance of a match. Try it above (people = 13, items = 26).
- Exponential growth rapidly decreases the chance of picking unique items (aka it increases the chances of a match). Remember: exponents are non-intuitive and humans are selfish!
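A few of these numbers are easy to sanity-check in Python. This is only a sketch: it treats GUIDs as plain random 128-bit numbers and uses the pair-counting estimate for the alphabet case, and the variable names are mine.

```python
from math import comb, isqrt

# GUIDs viewed as random 128-bit numbers.
guids = 2 ** 128
print(f"{guids:.1e}")             # 3.4e+38
print(isqrt(guids) == 2 ** 64)    # True: the square root is 2^64
print(f"{2 ** 64:.1e}")           # 1.8e+19

# 13 people picking from 26 letters, using the pair-counting estimate.
p_all_different = (25 / 26) ** comb(13, 2)
p_match_letters = 1 - p_all_different
print(round(p_match_letters, 2))  # about 0.95
```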

After thinking about it a lot, the birthday paradox finally clicks with me. But I still check out the interactive example just to make sure.

## Appendix A: Repeated Multiplication Explanation (Geeky Math Alert!)

Remember how we assumed the birthday comparisons are independent? Well, they aren’t.

If Person 1 and Person 3 match, and Person 3 and 5 match, we know that 1 and 5 match also. The outcome of 1 and 5 depends on their results with 3, which means the results aren’t an independent 1/365 chance (in our case, it’s a 100% chance of a match).

When counting pairs we did math as if birthdays were like independent coin flips, and multiplied probabilities. This assumption isn’t strictly true but it’s “good enough” for a small number of people (23) compared to the sample size (365). It’s unlikely to have multiple people match and screw up the independence, so it’s a good approximation.

It’s unlikely, but it can happen. Let’s figure out the real chances of each person picking a different number:

- The first person has a 100% chance of a unique number (of course)
- The second has a (1 – 1/365) chance (all but 1 number from the 365)
- The third has a (1 – 2/365) chance (all but 2 numbers)
- The 23rd has a (1 – 22/365) (all but 22 numbers)

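The multiplication looks pretty ugly:

$$p(\text{different}) = 1 \cdot \left(1 - \frac{1}{365}\right) \left(1 - \frac{2}{365}\right) \cdots \left(1 - \frac{22}{365}\right)$$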
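Ugly by hand, but a short loop handles it. Here is a Python sketch of the person-by-person multiplication described in the list above; the name `p_all_different` is just a label I picked.

```python
def p_all_different(people, days=365):
    """Exact chance that every person has a distinct birthday."""
    p = 1.0
    for already_taken in range(people):
        # This person must avoid every birthday taken so far.
        p *= (days - already_taken) / days
    return p

print(round(p_all_different(23), 4))       # 0.4927
print(round(1 - p_all_different(23), 4))   # 0.5073
```

So the real chance of a match with 23 people is about 50.7%, a little higher than the pair-counting estimate.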

But there’s a shortcut we can take. When x is close to 0, a coarse first-order Taylor approximation for e^{x} is:

$$e^x \approx 1 + x$$

so

$$1 - \frac{1}{365} \approx e^{-1/365}$$

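Using our handy shortcut we can rewrite the big equation to:

$$p(\text{different}) \approx 1 \cdot e^{-1/365} \cdot e^{-2/365} \cdots e^{-22/365} = e^{-(1 + 2 + \cdots + 22)/365}$$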

But we remember that adding the numbers 1 to n = n(n + 1)/2. Don’t confuse this with n(n-1)/2, which is C(n,2) or the number of pairs of n items. They look almost the same!

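Adding 1 to 22 is (22 * 23)/2 so we get:

$$p(\text{different}) \approx e^{-\frac{22 \cdot 23}{2 \cdot 365}} = e^{-253/365} \approx .499998$$

$$p(\text{match}) = 1 - p(\text{different}) \approx .500002$$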

Phew. This approximation is very close, plug in your own numbers below:
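To plug in your own numbers, here is a small Python sketch that puts the exponential shortcut next to the exact product. Both function names are made up for this comparison.

```python
import math

def p_match_exp(n, days=365):
    """Match chance using e^x ~ 1 + x on each factor of the product."""
    total = n * (n - 1) // 2          # 1 + 2 + ... + (n - 1)
    return 1 - math.exp(-total / days)

def p_match_exact(n, days=365):
    """Match chance from the exact person-by-person product."""
    return 1 - math.prod((days - k) / days for k in range(n))

for n in (5, 10, 23, 40, 60):
    print(n, round(p_match_exp(n), 4), round(p_match_exact(n), 4))
```

For 23 people the two values differ by less than one percentage point, and the shortcut slightly underestimates the chance of a match.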

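Good enough for government work, as they say. If you simplify the formula a bit (treating n(n-1) as roughly n^2) and swap in *n* for 23 you get:

$$p(\text{different}) \approx e^{-n^2 / (2 \cdot 365)}$$

and

$$p(\text{match}) = 1 - p(\text{different}) \approx 1 - e^{-n^2 / (2 \cdot 365)}$$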

## Appendix B: The General Birthday Formula

Let’s generalize the formula to picking *n* people from *T* total items (instead of 365):

$$p(\text{different}) \approx e^{-n^2 / (2T)}$$

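If we choose a probability (like 50% chance of a match) and solve for *n*:

$$.5 = p(\text{different}) \approx e^{-n^2 / (2T)}$$

$$-2 \ln(.5) \cdot T \approx n^2$$

$$n \approx 1.177 \sqrt{T}$$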

Voila! If you take sqrt(T) items (17% more if you want to be picky) then you have about a 50-50 chance of getting a match. If you plug in other numbers you can solve for other probabilities:

$$n \approx \sqrt{-2 \ln(1 - m)} \cdot \sqrt{T}$$

Remember that m is the *desired chance of a match* (it’s easy to get confused, I did it myself). If you want a 90% chance of matching birthdays, plug m=90% and T=365 into the equation and see that you need 41 people.
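As a sketch, the general formula fits in one Python function. The name `people_needed` is my own; `m` and `T` follow the article’s symbols, and the result inherits the approximations made in Appendix A.

```python
import math

def people_needed(m, T=365):
    """Approximate group size for a chance m of a match among T items."""
    return math.sqrt(-2 * math.log(1 - m)) * math.sqrt(T)

print(round(people_needed(0.5), 1))    # 22.5, so 23 people
print(round(people_needed(0.9), 1))    # 41.0
# GUIDs as random 128-bit numbers: about 1.177 * 2^64 for a 50% chance.
print(round(people_needed(0.5, 2 ** 128) / 2 ** 64, 3))
```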

Wikipedia has even more details to satisfy your inner nerd. Go forth and enjoy.

## 226 Comments on "Understanding the Birthday Paradox"

The math here is actually wrong. The chances of individual pairs are not independent. Your math would work if you take each pair and have them name a random number between 1 and 365.

With this math, taking a group of 365 people still results in a non-zero chance that they all have different birthdays.

+1 exactly what I thought when I was reading this. And another +1 for the pigeonhole counterexample ;)

(however, there’s an appendix talking about that)

If there is a group of 365 people there would be a non-zero chance they have different birthdays. They could each have a birthday on each different day of the year. You would need 366 people (ignoring feb29) for it to be guaranteed that there’s a pair with the same birthday

Heyy ;; i have no clue how to do this!

I think that the math behind this birthday paradox is wrong..

The chance of two people having same birthdays is 1/365 = 0.0027397

therefore p(n)= 0.0027397 ^C(n,2)

if we take an example of 23 people

we get p(23)= 0.0027397 ^ 253 ~=0

so how is it possible??

Hi, when you make the probability like (364/365)^253, you are assuming independent events. What this means is that each comparison is “fresh”, with no memory of the past. It would be like having 2 people pick the same number out of 365, and choosing a different number each time.

This approximation makes the math easier, and is ok for small values. If you want the actual %, take a look at Appendix A.

Yep, the paradox seems strange, doesn’t it? Take a look at this page and run some experiments on your own to see:

http://betterexplained.com/examples/birthday/birthday.html

As you click “run trial”, you will see the actual match percentage for 23 people approach 50%, which is the predicted one. Hope this helps.

Thanks Kalid. So is there a way to solve this without using the ‘negative’.. that is not by calculating the probability of someone else in the group not having the same bday? Do it directly instead?

@sonny: Great question — I don’t think my probability knowledge is strong enough :). The issue is you need to enumerate every possible type of collision: 1 with 3, 1 and 2 with 3, 1 and 3 and 14… all of which are “problem scenarios”. It’s a bit like writing a spellcheck where you keep track of the possible typos vs. having the correct word and seeing if what you wrote is different from that :).

Can the math in the birthday paradox be applied to the Pick3 lottery?

Hi Allan, I’m not too familiar with the rules of Pick3, but I’ll take a shot.

The birthday paradox helps find the chance that any two random numbers will “collide” in a set.

In Pick3, you don’t really care if two guesses collide… you want the guess to collide with the winning number. In this case, two losing tickets that both guessed 123 (when the real answer was 999) isn’t helpful.

I may be missing something though!

Something doesn’t add up here. The first calculator shows that the birthday example with 365 persons would result in a 100% match, meaning at least 2 persons should have the same birthday. But it’s possible that all 365 persons have different birthdays (the first person born on January 1, the second on January 2 and the last on December 31).

If you ever find that, you owe them all a steak dinner.

Sir, I read it before the heading “Interactive Example” and hats off to you. You have explained it so nicely, that it can actually feel what is happening in this “paradox”!

Thanks a lot sir!

The probability of at least two people sharing a birthday in a group of 23 people is about 50.72972343239854072%, not 50.05%. Kalid’s method faultily assumes that the probability of any pair sharing a birthday is independent of the probability of another pair sharing a birthday, which is not the case because the pairs contain some of the same people. The exact probability of at least 2 people sharing a birthday out of a group of x people can be calculated by the formula 1 - 365Px/365^x, or 1 - 365!/((365-x)! 365^x).
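The permutation formula, and the pigeonhole point raised earlier in the thread, can both be checked with exact fractions. This Python sketch takes the chance of a match as one minus 365Px/365^x; the function name is my own and it needs Python 3.8 or later for `math.perm`.

```python
from fractions import Fraction
from math import perm

def p_match_exact(x, days=365):
    """Exact chance that at least two of x people share a birthday."""
    p_all_different = Fraction(perm(days, x), days ** x)
    return 1 - p_all_different

print(float(p_match_exact(23)))    # about 0.5073
print(float(p_match_exact(22)))    # about 0.4757
print(p_match_exact(365) < 1)      # True: 365 distinct birthdays are still possible
print(p_match_exact(366) == 1)     # True: with 366 people a match is guaranteed
```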

You really want to blow someone’s mind?

with a true random selection of 230 people, merely ten times the birthday paradox, there’s almost a 50% chance of not only having two people with the same birthday – but two people with the same birthDATE. (with a 100 year pool.)

Mathematically, it says that number is 191.11… (365.2425*100 = 36524.25 sqrt(36524.25) = 191.113186…)

However realistically, it doesn’t close in on 50% until you get above 220s

[365.2425 is the actual days per year to take leap years into consideration.]

And if the people are randomly selected from a certain pool – such as a college population – the chances increase greatly, obviously…

Isn’t math fun?
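The birthdate numbers in this comment can be roughly checked with the approximation from Appendix A. This Python sketch assumes birth dates are spread evenly over the 100-year pool, which real populations are not; the names are illustrative.

```python
import math

days = 365.2425 * 100   # the commenter's 100-year pool of possible birthdates

def p_match(n, T):
    """Approximate match chance, 1 - e^(-n^2 / 2T)."""
    return 1 - math.exp(-n * n / (2 * T))

print(round(math.sqrt(days), 2))      # 191.11
print(round(p_match(191, days), 2))   # about 0.39
print(round(p_match(230, days), 2))   # about 0.52
```

Under these assumptions sqrt(T) people give only about a 39% chance, and the 50% mark falls in the 220s, which lines up with the "17% more" factor from Appendix B.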

How come the last 4 digits in my ssn is the year I was born and the day I was born on??

Oh great. My brain just splattered. Thanks.

Thanks for the info, you’re right. I did some more digging (good paper here) and birthdays aren’t mutually independent.

If Person 1 = Person 3, and Person 3 = Person 5, there isn’t an independent event that Person 1 = Person 5. The probability of 1 matching 5 has already been determined by the other statements.

From what I was able to gather, this is only a problem if there are existing overlapping pairs. For a small n relative to the number of outcomes (365), it’s unlikely to have multiple matches that affect the probability, so assuming independence may be ok for computing approximations.


The last formula is incorrect, it should be:

n ~ sqrt(-2 ln(1-p)) sqrt(T)

^^^

or else you are finding the probability to miss.

Thanks for the tip! I fixed up the article to use p(different) and p(match), which is much more clear.

The “take-away lesson” about GUIDs is wrong. GUIDs are (theoretically) guaranteed to be globally unique, because they include such things as the MAC address of your network card (something which is globally unique until some cheap NIC manufacturer starts recycling them) and the current time.

The catch is that because of the time factor, the current GUID algorithm won’t last forever. We will run out in a couple of centuries.

Hi, that’s a good point about MAC addresses. However, if you consider GUIDs as just a giant random number (for the purposes of the exercise), you are looking for how many “items” out of a pool of 2^128 you can distribute before having a 50% chance of collision.

For the birthday paradox, it’s about 23 items (of a pool of 365) before a 50% chance of collision. For GUIDs, it will be roughly 2^64 items before a 50% chance of collision.

There’s a bit more information here:

http://en.wikipedia.org/wiki/UUID

Hope this helps,

-Kalid

Hey, great blog.

“A coarse first-order Taylor approximation for e^x is: e^x ≈ 1 + x”

that’s just valid if x […]

@nt: Thanks for the tip, I updated the article to make that more clear.

@Ashton: Hi Ashton, you might want to ask your math teacher to see if you’ve covered the necessary topics in class. You’ll probably need statistics and combinatorics.

Hi Zhao, thanks for the comment! I’ll try to keep cranking out the posts :).


Hi, you’re correct that 1/365 is the chance of 2 people having the same birthday. However, (1/365)^253 would be the chance of all 253 comparisons turning out to be a *match*! (Which, as you see, is pretty close to zero).

For this problem, it’s important not to mix up 1/365 (the chance of 1 collision) and 364/365 (the chance of no collision). We first find the chance that somehow, everyone manages to be different:

p(23 people have different birthdays) = (364/365)^253

If there is a 40% chance that everyone is different, there is 1-40% = 60% chance that there was an overlap somewhere. Hope this helps. (Technically, we are assuming independent events but that subtlety is not important for the main point).