Bayes’ theorem was the subject of a detailed article. The essay is good, but over 15,000 words long — here’s the condensed version for Bayesian newcomers like myself:

- **Tests are not the event.** We have a cancer *test*, separate from the event of actually having cancer. We have a *test* for spam, separate from the event of actually having a spam message.
- **Tests are flawed.** Tests detect things that don’t exist (false positive), and miss things that do exist (false negative).
- **Tests give us test probabilities, not the real probabilities.** People often consider the test results directly, without considering the errors in the tests.
- **False positives skew results.** Suppose you are searching for something really rare (1 in a million). Even with a good test, it’s likely that a positive result is really a *false positive* on somebody in the 999,999.
- **People prefer natural numbers.** Saying “100 in 10,000” rather than “1%” helps people work through the numbers with fewer errors, especially with multiple percentages (“Of those 100, 80 will test positive” rather than “80% of the 1% will test positive”).
- **Even science is a test.** At a philosophical level, scientific experiments can be considered “potentially flawed tests” and need to be treated accordingly. There is a *test* for a chemical, or a phenomenon, and there is the *event* of the phenomenon itself. Our tests and measuring equipment have some inherent rate of error.

**Bayes’ theorem finds the actual probability of an event from the results of your tests.** For example, you can:

- **Correct for measurement errors.** If you know the real probabilities and the chance of a false positive and false negative, you can correct for measurement errors.
- **Relate the actual probability to the measured test probability.** Bayes’ theorem lets you relate Pr(A|X), the chance that an event A happened given the indicator X, and Pr(X|A), the chance the indicator X happened given that event A occurred. Given mammogram test results and known error rates, you can predict the actual chance of having cancer.

## Anatomy of a Test

The article describes a cancer testing scenario:

- 1% of women have breast cancer (and therefore 99% do not).
- 80% of mammograms detect breast cancer when it is there (and therefore 20% miss it).
- 9.6% of mammograms detect breast cancer when it’s **not** there (and therefore 90.4% correctly return a negative result).

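Put in a table, the probabilities look like this:

| | Cancer (1%) | No cancer (99%) |
|---|---|---|
| Test positive | 80% | 9.6% |
| Test negative | 20% | 90.4% |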

How do we read it?

- 1% of people have cancer
- If you **already have cancer**, you are in the first column. There’s an 80% chance you will test positive. There’s a 20% chance you will test negative.
- If you **don’t have cancer**, you are in the second column. There’s a 9.6% chance you will test positive, and a 90.4% chance you will test negative.

## How Accurate Is The Test?

Now suppose you get a positive test result. What are the chances you have cancer? 80%? 99%? 1%?

Here’s how I think about it:

- Ok, we got a positive result. It means we’re somewhere in the top row of our table. Let’s not assume anything — it could be a true positive or a false positive.
- The chances of a *true positive* = chance you have cancer × chance test caught it = 1% × 80% = .008
- The chances of a *false positive* = chance you don’t have cancer × chance test caught it anyway = 99% × 9.6% = 0.09504

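The table looks like this:

| | Cancer (1%) | No cancer (99%) |
|---|---|---|
| Test positive | True positive: 1% × 80% = .008 | False positive: 99% × 9.6% = .09504 |
| Test negative | False negative: 1% × 20% = .002 | True negative: 99% × 90.4% = .89496 |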

And what was the question again? Oh yes: what’s the chance we really have cancer if we get a positive result? The chance of an event is the number of ways it could happen divided by the number of all possible outcomes:

`Probability = desired event / all possibilities`

The chance of getting a real, positive result is .008. The chance of getting any type of positive result is the chance of a true positive plus the chance of a false positive (.008 + 0.09504 = .10304).

So, our chance of cancer is .008/.10304 = 0.0776, or about 7.8%.
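Here is the same arithmetic as a minimal Python sketch. The variable names are made up for illustration; the numbers are the ones from the scenario above.

```python
# Inputs from the mammogram scenario.
p_cancer = 0.01              # 1% of women have breast cancer
p_pos_given_cancer = 0.80    # test catches cancer when it is there
p_pos_given_healthy = 0.096  # test fires when cancer is not there

# Top row of the table: the two ways to get a positive result.
true_positive = p_cancer * p_pos_given_cancer           # 0.008
false_positive = (1 - p_cancer) * p_pos_given_healthy   # 0.09504
any_positive = true_positive + false_positive           # 0.10304

# Desired event / all possibilities.
p_cancer_given_pos = true_positive / any_positive
print(f"{p_cancer_given_pos:.4f}")  # prints 0.0776
```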

Interesting: a positive mammogram only means you have a 7.8% chance of cancer, rather than 80% (the supposed accuracy of the test). It might seem strange at first, but it makes sense: the test gives a false positive for roughly 10% (9.6%) of the people who don’t have cancer, and they make up 99% of the population, so there will be a **ton** of false positives in any given population. There will be so many false positives, in fact, that **most** of the positive test results will be wrong.

Let’s test our intuition by drawing a conclusion from simply eyeballing the table. If you take 100 people, only 1 person will have cancer (1%), and they’re likely to test positive (80% chance). Of the 99 remaining people, about 10% will test positive, so we’ll get roughly 10 false positives. Considering all the positive tests, just 1 in 11 is correct, so there’s roughly a 1/11 chance of having cancer given a positive test. The real number is 7.8% (closer to 1/13, computed above), but we found a reasonable estimate without a calculator.
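To see where the gap between 1/11 and 7.8% comes from, here is a small sketch that does the eyeball estimate with whole people and then repeats it without rounding. The variable names are illustrative only.

```python
people = 100
sick = people * 0.01                         # 1 person has cancer
true_pos_exact = sick * 0.80                 # 0.8 people
false_pos_exact = (people - sick) * 0.096    # 9.504 people

# Eyeball version: round both counts to whole people (1 and 10).
true_pos_rough = round(true_pos_exact)
false_pos_rough = round(false_pos_exact)

rough = true_pos_rough / (true_pos_rough + false_pos_rough)   # 1/11
exact = true_pos_exact / (true_pos_exact + false_pos_exact)   # 0.8/10.304

print(f"eyeball: {rough:.3f}, exact: {exact:.3f}")
```

Rounding 0.8 people up to 1 is what inflates the estimate: the false positives barely move (9.5 versus 10), but the true positives grow by a quarter.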

## Bayes’ Theorem

We can turn the process above into an equation, which is Bayes’ Theorem. It lets you take the test results and correct for the “skew” introduced by false positives. You get the real chance of having the event. Here’s the equation:

$$\Pr(A|X) = \frac{\Pr(X|A)\,\Pr(A)}{\Pr(X|A)\,\Pr(A) + \Pr(X|\text{not } A)\,\Pr(\text{not } A)}$$

And here’s the decoder key to read it:

- Pr(A|X) = Chance of having cancer (A) given a positive test (X). This is what we want to know: How likely is it to have cancer with a positive result? In our case it was 7.8%.
- Pr(X|A) = Chance of a positive test (X) given that you had cancer (A). This is the chance of a true positive, 80% in our case.
- Pr(A) = Chance of having cancer (1%).
- Pr(not A) = Chance of not having cancer (99%).
- Pr(X|not A) = Chance of a positive test (X) given that you didn’t have cancer (not A). This is a false positive, 9.6% in our case.
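The decoder key maps directly onto a small function. This is a sketch, not a library API: `bayes_posterior` and its parameter names are hypothetical, chosen to mirror the terms above.

```python
def bayes_posterior(prior, p_pos_given_event, p_pos_given_no_event):
    """Return Pr(A|X) from Pr(A), Pr(X|A) and Pr(X|not A)."""
    true_pos = p_pos_given_event * prior             # Pr(X|A) * Pr(A)
    false_pos = p_pos_given_no_event * (1 - prior)   # Pr(X|not A) * Pr(not A)
    return true_pos / (true_pos + false_pos)


# The mammogram scenario: 1% prior, 80% true positive, 9.6% false positive.
mammogram = bayes_posterior(0.01, 0.80, 0.096)
print(round(mammogram, 4))  # prints 0.0776

# Same test, but hunting for something that is 1 in a million.
rare = bayes_posterior(1e-6, 0.80, 0.096)
print(rare)
```

Changing only the prior shows how strongly rarity drives the answer: with a 1-in-a-million event, a positive result from the same test is almost always a false positive.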


It all comes down to the chance of a **true positive result** divided by the **chance of any positive result**. We can simplify the equation to:

$$\Pr(A|X) = \frac{\Pr(X|A)\,\Pr(A)}{\Pr(X)}$$

Pr(X) is a normalizing constant and helps scale our equation. Without it, we might think that a positive test result gives us an 80% chance of having cancer.

Pr(X) tells us the chance of getting *any* positive result, whether it’s a real positive in the cancer population (1%) or a false positive in the non-cancer population (99%). It’s a bit like a weighted average, and helps us compare against the overall chance of a positive result.

In our case, Pr(X) gets really large because of the potential for false positives. Thank you, normalizing constant, for setting us straight! This is the part many of us may neglect, which makes the result of 7.8% counter-intuitive.
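Written out with the mammogram numbers, the normalizing constant is the weighted average of the two ways to test positive:

$$\Pr(X) = \Pr(X|A)\,\Pr(A) + \Pr(X|\text{not } A)\,\Pr(\text{not } A) = (0.80)(0.01) + (0.096)(0.99) = 0.008 + 0.09504 = 0.10304$$

$$\Pr(A|X) = \frac{0.008}{0.10304} \approx 0.0776$$

The false-positive term (0.09504) is about 12 times the true-positive term (0.008), which is why dividing by Pr(X) pulls the answer down to 7.8%.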

## Intuitive Understanding: Shine The Light

The article mentions an intuitive understanding about shining a light through your real population and getting a test population. The analogy makes sense, but it takes a few thousand words to get there :).

Consider a real population. You run a test, which “shines light” through that real population and creates some test results. If the light is completely accurate, the test probabilities and real probabilities match up. Everyone who tests positive is actually “positive”. Everyone who tests negative is actually “negative”.

But this is the real world. Tests go wrong. Sometimes the people who have cancer don’t show up in the tests, and the other way around.

Bayes’ Theorem lets us look at the skewed test results and correct for errors, recreating the original population and finding the real chance of a true positive result.

## Bayesian Spam Filtering

One clever application of Bayes’ Theorem is in spam filtering. We have

- Event A: The message is spam.
- Test X: The message contains certain words (X)

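Plugged into a more readable formula (from Wikipedia):

$$\Pr(\text{spam}|\text{words}) = \frac{\Pr(\text{words}|\text{spam})\,\Pr(\text{spam})}{\Pr(\text{words})}$$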

Bayesian filtering allows us to predict the chance a message is really spam given the “test results” (the presence of certain words). Clearly, words like “viagra” have a higher chance of appearing in spam messages than in normal ones.

Spam filtering based on a blacklist is flawed: it’s too restrictive and produces too many false positives. Bayesian filtering gives us a middle ground: we use *probabilities*. As we analyze the words in a message, we can compute the chance it is spam (rather than making a yes/no decision). If a message has a 99.9% chance of being spam, it probably is. As the filter gets trained with more and more messages, it updates the probabilities that certain words lead to spam messages. Advanced Bayesian filters can examine multiple words in a row, as another data point.
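As a rough sketch of the single-word case, here is how a filter could turn training counts into Pr(spam|word). All of the counts below are hypothetical, invented purely for illustration; a real filter would learn them from labeled mail.

```python
# Hypothetical training counts (not real data).
spam_total = 400        # messages labeled spam
ham_total = 600         # messages labeled normal
spam_with_word = 120    # spam messages containing the word
ham_with_word = 3       # normal messages containing the word

p_spam = spam_total / (spam_total + ham_total)     # Pr(spam) = 0.4
p_word_given_spam = spam_with_word / spam_total    # Pr(word|spam) = 0.3
p_word_given_ham = ham_with_word / ham_total       # Pr(word|not spam) = 0.005

# Pr(word): chance of seeing the word in any message.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes: Pr(spam|word) = Pr(word|spam) * Pr(spam) / Pr(word)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(f"{p_spam_given_word:.3f}")
```

Training the filter amounts to updating the four counts as new labeled messages arrive and recomputing the ratio.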


Have fun!

you have a typo, in …

9.6% of mammograms miss breast cancer when it is there (and therefore 90.4% say it is there when it isn’t).

… you meant to say something like:

9.6% of mammograms incorrectly indicate breast cancer when it isn’t there, and the other 90.4% correctly say it is not there when, well, it is not there.

Thanks Gavrilo — I just fixed it.

Hey, here’s an interesting bayes problem i came across first in a book (The curious incident of the dog in the night time).

Suppose you are in a game show. You are given the choice of three doors – one of which conceals a valuable prize and the others conceal a goat.

After you make a choice, the host opens one of the other doors (one without a prize).

He then gives you the option of staying with the initial choice of door or switching to the other door. The door finally chosen is then opened.

Should you switch, not switch, or does it make no difference what the contestant does?

_____________________________________________

ANSWER

By Bayes’ theorem you can see that if you switch you’d have a 2:1 advantage.

Hi Amal, thanks for dropping by. Yes, I like that question too, it was presented to us as “The Monty Hall” problem when studying computer science.

It’s pretty amazing how counter-intuitive the results can be — switching your choice after you’ve picked “shouldn’t” change your chances, right? I plan on writing about this paradox, too
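For the curious, here is a sketch of the Bayes version of the puzzle. It assumes the standard rules: the host always opens a goat door among the two you didn’t pick, and chooses at random when both hide goats. The door numbering (you pick door 1, the host opens door 3) is arbitrary.

```python
from fractions import Fraction

# Prior: the prize is equally likely to be behind each door.
prior = {door: Fraction(1, 3) for door in (1, 2, 3)}

# You picked door 1. Chance the host opens door 3, given where the prize is:
# prize behind 1 -> host picks door 2 or 3 at random
# prize behind 2 -> host is forced to open door 3
# prize behind 3 -> host never reveals the prize
host_opens_3 = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(0)}

# Chance of seeing the host open door 3 at all (the normalizing constant).
evidence = sum(prior[d] * host_opens_3[d] for d in prior)

posterior = {d: prior[d] * host_opens_3[d] / evidence for d in prior}
print(posterior[1], posterior[2])  # stay vs. switch
```

Staying keeps the original 1/3; switching to door 2 wins with probability 2/3, which is the 2:1 advantage mentioned above.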

Oddly useful! I’ve been reading Bayes explanations for a while, and this one really hit home for me for some reason.

One thing that you might consider adding (something I’ve never seen) is a pie-chart visualization of what’s going on. Basically, you have a pie of 100% of people. 1% of that pie has cancer, so that’s a tiny slice. The test will produce a positive for 80% of that 1% slice + 9.6% of the remaining 99% slice– you can imagine that as a little blue translucent piece of appropriate size that covers most of the 1% slice and a chunk of the 99% slice. From that mental image, it’s obvious what’s going on– there’s a lot more blue on the 99% than on the 1%. Might be too complicated, but hey. Anyways, thanks.

Hey Ed, thanks for the comment. I agree — some type of chart may make the relationship that much clearer. Appreciate the suggestion, I’ll put one together.

Bayes theorem can also be thought of as

$$\frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$

So a large number of false positives reduces the accuracy of the test because the denominator increases.

Thanks Lee! That’s a great way to put it.

About Monty Hall- the Bayes application to this seems very forced. The Monty Hall problem is a simple probability problem, or it can be viewed as a partitioning problem. See:

http://randy.strausses.net/tech/montyhall.htm

Using Bayes for this makes it needlessly complex, not “betterExplained”.

Similarly, the article above is needlessly complex- nuke the first equation and leave the simpler one. You just pulled it out of thin air anyway- it doesn’t help anyone.

The usual diagram, given in HS stats classes, is a rectangle, with A, ~A on the top, B, ~B on the side. Say A is .9 and B is .2. The area of the small quadrant (.02), is the probability of A and B both happening. This area can be also viewed as P(A|B)*P(B) or P(B|A)*P(A). You have to explain why, but it’s pretty evident from the diagram. Then just equate these two and divide by P(B) and you have the simpler equation.

Hi Randy, Bayes may be overkill for the Monty Hall problem, but it’s interesting to see that it can apply there as well.

Yes, the diagram you mention may be a helpful addition to the discussion above, appreciate the feedback.

Just wanna say thank you for writing this. I know about the original article and I tried reading it but somewhere along the way I got lost and couldn’t follow it.

Hi numerodix, you’re welcome — I found the original article interesting but a bit long as well, so I decided to summarize it here.

Hello, I just came upon this site and I’m finding it beautiful. I think I spotted an error in this article, though.

When you say:

“”Of those 100, 80 will test positive” rather than “80% of the 1% will test positive”).”,

you probably wanted to say: “rather than ‘80% of the 100% will test positive'”.

Hi Matteo, thanks for the comment. The statement actually refers to the original 1%, so it’s giving a way of giving compound percentages (80% of 1% vs. 80 out of 10,000).

wow thank you so much for this, you really did a good job explaining it, i have my AP statistics exam today at noon so this might save me

Randy, back on Nov 7 2007, suggested using overlapping rectangles – Venn diagrams – to help clarify the Rev. Bayes. In their book “Chances Are…” (Viking Penguin, 2006), Kaplan & Kaplan did so on pp. 184 ff. Indeed it does help.

you. are. the. best.

oh my god this is the dogs bollocks for my molecular phylogenetics revision!

@Anonymous: Thanks!

@John: Appreciate the reference. Another explanation with a venn diagram: http://blog.oscarbonilla.com/2009/05/visualizing-bayes-theorem/

@Anonymous: Thank you!

@Patty: Glad it helps

This is one of the best explanations I’ve found. Perhaps we can see if I really understand it by trying a real world problem I’m wrestling with.

Here’s the data:

– The odds of a chest pain (CP) being caused by a heart attack is 40%.

– The odds of a CP being caused by other factors (anxiety, depression, etc.) is 60%.

– The odds of a heart attack occurring to a female above age 50 is 80%.

– The odds of a heart attack occurring to a female under age 50 is 20%.

I am presented with a 24 year old female who says she is having chest pain. What is the probability that her chest pain is caused by a heart attack? Is it 0.4 x 0.2 = 0.08?

Also, 78% of patients having heart attacks present with diaphoresis (sweating), so 22% of patients having heart attacks don’t sweat. This female is not sweating, so are the odds of her having a heart attack 0.22 x 0.08 = 0.0176?

Thank you!

Thanks for writing this!! Even my stats prof was making this too difficult for everyone, but you have simplified it for me. I now have an understanding of the Bayes formula (enough to write my midterm this morning ).

thanks! finally got the concept behind bayes rule

@Ayush: Glad it helped!

Could you work out an example of an email with *two* words, say ‘Viagra’ and ‘hello’?
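One common way to handle two words is the “naive” approach: assume the words appear independently of each other within spam and within normal mail, then multiply their probabilities. The sketch below does that; every probability in it is hypothetical and invented for illustration, and `spam_probability` is not a function from any library.

```python
# Hypothetical probabilities (not real data).
p_spam = 0.4
p_word_given_spam = {"viagra": 0.30, "hello": 0.10}
p_word_given_ham = {"viagra": 0.005, "hello": 0.40}


def spam_probability(words):
    """Pr(spam | words), treating words as independent given the class."""
    spam_score = p_spam        # Pr(spam) * product of Pr(word|spam)
    ham_score = 1 - p_spam     # Pr(not spam) * product of Pr(word|not spam)
    for word in words:
        spam_score *= p_word_given_spam[word]
        ham_score *= p_word_given_ham[word]
    return spam_score / (spam_score + ham_score)


only_viagra = spam_probability(["viagra"])
both = spam_probability(["viagra", "hello"])
print(f"{only_viagra:.3f} -> {both:.3f}")
```

With these made-up numbers, “hello” is more common in normal mail, so adding it lowers the spam probability compared with “viagra” alone.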

I didnt get that, bayes theorem is still a tilted pot for me, but thanks for helping!

what would happen if we have to consider other prior probability..lets say, the doctor looked at the symptoms of the patient and guessed that he has 60 percent chance of having cancer. Doctor sends him for the test and test showed positive result. How would we incorporate that 60 percent odd of having cancer based on the patient’s symptoms to the Bayesian equation.
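One way to fold the doctor’s estimate in is to use it as Pr(A) in place of the 1% population rate. This assumes the mammogram’s error rates (80% and 9.6%) are the same for patients with symptoms, which the article does not state:

$$\Pr(A|X) = \frac{(0.80)(0.60)}{(0.80)(0.60) + (0.096)(0.40)} = \frac{0.48}{0.5184} \approx 0.926$$

Under that assumption, a positive test moves the doctor’s 60% guess up to about 93%.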

hi

thank you for this article. The first time I came across Bayes’ Theorem in a business statistics book it was not clear at all. Now it makes more sense to me and it’s pretty clear.

@France: You’re welcome, glad it helped. I understand it better now too, but there’s still more to go before it’s completely intuitive for me :). I’d like to do a follow-up to this focusing on using the probabilities it predicts.

On the cancer example, it’s interesting to see that a negative test is really significant. That is, if the test says you don’t have cancer, then probability of not having cancer is 99.78% ! So, the value of mammogram is that the healthcare $ can be employed in further investigating the positive (+ and -) cases only.

Thank you for taking the time to write this – it has really helped me get my head around the concept!

@Dan Weisberg: 14% of chest pains in women under 50 are therefore caused by heart attacks. It’s (0.2 × 0.4)/((0.2 × 0.4) + (0.8 × 0.6)) = 0.14

@Dan Weisberg: for some odd reason, my posts seem to disappear and reappear… anywho, the other part to your question (as earlier posted) is 4.5% irrespective of when you choose to include the diaphoresis variable.

wow, thank you so much for this, made the concept so much clearer to me

@Bips: Glad it helped!

This post is filled with jargon, and I’m surprised people can tolerate it.

Nicely explained; this really helped me study for my stat final. Thanks!

Thanks so much for this simple breakdown!

@Concerned: You should have seen the original :).

@Pooja, @Alec: Thanks!

“A Bayesian is someone who, vaguely expecting a horse, and glimpsing the tail of a donkey, concludes he has probably seen a mule.” – John Hussman

Not really helpful, but rather funny.

Extremely helpful article. The lecture by the stats prof was utterly bewildering….. thanks!

@MH: Awesome, glad it helped.

Your explanation using the cancer example was superb.

@Joseph: Thanks!

Do you know of “The theory that would not die,” Sharon Bertsch McGrayne’s history of the Bayes controversy? It is history at its best including breaking the Enigma Code during the Second World War, plus lots of other applications. The book was published by Yale University Press in 2011.

Woman’s percent chance of having cancer if tested negative for breast cancer.

P(False Neg)/(P(False Neg) + P(Accurate Negative)) => .01(.2)/(.01(.2) + .99(.904)) =.0022 = .22%

How’s my math?

Very helpful! I appreciate the intuitive explanation. Thank you!

Your analysis of the Monty Hall problem *assumes* that the host’s intentions and attitudes are irrelevant to his *choice* of whether to reveal the secret of one non-winning selection. If the host wishes you to win, he will only open a door when you chose the wrong one first, but if he wishes you to lose, then he will open a door only when you have chosen the right one first. Thus unless you know that his wishes don’t matter then Bayes’ Theorem will not help. If you were to know his wishes do matter, then your rational choice depends solely on your estimate of whether he wishes you to win (switch) or lose (don’t switch).

@Richard: It might need to be clarified, but the rules are the host will always reveal a goat in one of the two doors you didn’t pick. Even if you’ve already picked the car, he will still reveal a goat.

Thanks for writing that! Really cleared it up…I have also since read the explanation with a small binomial tree, which was pretty helpful, but this really hit home. Thanks for the clear explanation!

@Geoff: Glad it helped! I’m planning on doing a follow-up to Bayes since it’s used so much in spam filtering and other machine learning.

bayes theorem is incredible

Hello Kalid, I am glad I found this helpful site and also your older one at princeton.edu. About your explanation of Bayes’ theorem, however, I have two minor caveats.

You wrote: “Interesting — a positive mammogram only means you have a 7.8% chance of cancer, rather than 80% (the supposed accuracy of the test).”

Shouldn’t we rather take 90.4% as the supposed accuracy of the test? The sensitivity of the test is 80%; it tells us that 20% of the negative results are false; but if we’ve got a positive mammogram, that information about negative results doesn’t interest us in the first place. Much more interesting is the specificity of the test: it’s 90.4% and therefore tells us that 9.6% of the positive results will be false. If our test result is positive, we would wish the specificity of the test were still much lower, because then we could have a much better founded hope of being healthy despite the positive result.

You wrote: “It all comes down to the chance of a true positive result divided by the chance of any positive result. […] Pr(X) is a normalizing constant and helps scale our equation. Without it, we might think that a positive test result gives us an 80% chance of having cancer. Pr(X) tells us the chance of getting any positive result, whether it’s a real positive in the cancer population (1%) or a false positive in the non-cancer population (99%). It’s a bit like a weighted average, and helps us compare against the overall chance of a positive result. In our case, Pr(X) gets really large because of the potential for false positives. Thank you, normalizing constant, for setting us straight!“

Instead of 80% I would again prefer to read 90.4% here. But even after this modification I doubt whether your explanation of the “normalizing” could be called sufficient, because in my interpretation it’s not only our using Pr(X) as denominator that helps us “scaling the equation”. We’ve got a positive test result, and that’s why we are inquiring after Pr(A|X). As soon as we take Pr(X) into consideration, we must not compare it with the specificity of the test nor with the unqualified (i.e., unconditional) sensitivity; rather, we have to compare it with the conditional probability of the following case: that someone has breast cancer (1%) and that the test detects it (80%). So not only the denominator Pr(X), but also the factor Pr(A) in the numerator (i.e., the a priori probability of having breast cancer) is a “normalizing” element, isn’t it?

Here are two contradictory statements in your article regarding False positives. I quote:

Pr(X|~A) = Chance of a positive test (X) given that you didn’t have cancer (~A). This is a false positive, 9.6% in our case.

The chances of a false positive = chance you don’t have cancer * chance test caught it anyway = 99% * 9.6% = 0.09504

These two interpretations of false positives are often mixed in various texts.

One is Pr(X|~A) and the other is Pr(X∩~A)

Hello guys;

could you help me to solve this problem as soon as possible?

Transplant operations for hearts have the risk that the body may reject the organ. A new test has been developed to detect early warning signs that the body may be rejecting the heart. However, the test is not perfect. When the test is conducted on someone whose heart will be rejected, approximately two out of ten tests will be negative (the test is wrong). When the test is conducted on a person whose heart will not be rejected, 10% will show a positive test result (another incorrect result). Doctors know that in about 50% of heart transplants the body tries to reject the organ.

- Suppose the test was performed on my mother and the test is positive (indicating early warning signs of rejection). What is the probability that the body is attempting to reject the heart?

- Suppose the test was performed on my mother and the test is negative (indicating no signs of rejection). What is the probability that the body is attempting to reject the heart?
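The article’s method applies directly here. The sketch below reads “two out of ten tests will be negative” as an 80% detection rate and uses the 50% rejection rate as the prior; the function and variable names are illustrative only.

```python
def posterior(prior, p_result_given_event, p_result_given_no_event):
    """Pr(event | test result) via Bayes' theorem."""
    with_event = prior * p_result_given_event
    without_event = (1 - prior) * p_result_given_no_event
    return with_event / (with_event + without_event)


prior_rejection = 0.50          # about half of transplants see rejection
p_pos_if_rejecting = 0.80       # 2 in 10 rejections are missed
p_pos_if_not_rejecting = 0.10   # 10% false positives

# Positive test: compare true positives against all positives.
after_positive = posterior(
    prior_rejection, p_pos_if_rejecting, p_pos_if_not_rejecting
)

# Negative test: same formula, using the chances of a negative result.
after_negative = posterior(
    prior_rejection, 1 - p_pos_if_rejecting, 1 - p_pos_if_not_rejecting
)

print(f"{after_positive:.3f} {after_negative:.3f}")
```

Under these readings, a positive test points to rejection about 89% of the time and a negative test still leaves roughly an 18% chance.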

MMMMHHH… BEEN HELPFULL

Wow, thank you so much for this. I looked at several examples and tutorials before arriving here and your use of tables really, really helped wrap my head around it. Thanks!!

Excellent!

Hey guys great work. Thank you

You might want to look at this if you want to understand Bayes’ theorem in less than 2.5 minutes: http://www.youtube.com/watch?v=D8VZqxcu0I0

I think a Venn Pie chart (with overlapping sectors) could really help to make an intuitive explanation:

http://oracleaide.wordpress.com/2012/12/26/a-venn-pie/

a graphic solver for bayes’ theorem… again – using venn pie chart:

http://dl.dropbox.com/u/133074120/venn_pie_solver.html

I can not even explain the feelings of gratitude and affection I now have towards you. Thank you so much.

Hi Ada, really glad it helped!

This is a wonderful site, and I try and visit your site whenever possible…Excellent work you are doing here! Keep up the awesome work!

This is just so bloody good!

What a fantastic bit of insight, and I used it to explain the nature of testing to a room full of people the other day.

Sometime, you’ll stop writing these articles and I (and many, many others) will be stuffed.

Keep it up Kalid

Hi Steve, glad it’s clicking! Love that you were able to help others learn. Appreciate the encouragement, hope to keep going as long as I can ;).

I agree with the way that this article was presented. Sometimes people want to see where the subject is going before they invest the time in understanding the math. I have put together a fun series of videos on YouTube entitled “Bayes’ Theorem for Everyone”. They are non-mathematical and easy to understand. And I explain why Bayes’ Theorem is important in almost every field. Bayes’ sets the limit for how much we can learn from observations, and how confident we should be about our opinions. And it points out the quickest way to the answers, identifying irrationality and sloppy thinking along the way. All this with a mathematical precision and foundation. Please check out the first one:

http://www.youtube.com/watch?v=XR1zovKxilw

Hi. Nicely condensed article. However, unless I’m interpreting your conclusion incorrectly, you stated that ‘a positive test will result in only 1/11 people having cancer (7.8% to be exact)’. My math tells me 1/11 is 9.1%! Since we know 7.8% is correct then the probability is actually 1/13. If we normalise 80 true positives out of 10,000 to be 1 true positive out of 125 then out of the 1030/8 = 12.875 (or 13, not 11) with positive mammograms, 1 will have cancer.

Hi Dave, thanks for the comment. I should have clarified that part — it should be more of an “If you eyeball the probability table (without a calculator), what’s your conclusion?” (My goal for eyeballing is to test our intuition without explicitly computing the formula, just like we might “eyeball” the square root of 20 as 4.5.)

Eyeballing the table, if we have 100 people, about 1 will have cancer and be detected (technically “0.8”), and about 10 will not have cancer and be detected (99 people * 9.6% of false positive). So we have roughly 1 real cancer event, and 10 false ones, for an estimate of 1/11 chance of the event being meaningful. (The actual amount is 7.8% as you said, because the numbers aren’t 1 and 10 exactly).

Really appreciate the comment, that part wasn’t clear and I’ll clean it up.

I like the way you think about math, and your approach — really useful!

Can you play teacher correcting my work for a moment? I took your approach and applied it to 2 other Bayesian examples from other sites — the NYT article that the “big” essay references, and another from techtarget:

http://whatis.techtarget.com/definition/Bayesian-logic

The application against the NYT example worked fine (the possibility of an unfair coin, given the 3 coin tosses). I assume the author slightly rounded the result for simplicity: the revised posterior probability goes from 1 in 3 to 4 in 5 (or 80%), whereas I got 79.758%.

However, in the techtarget example, I don’t see how the author gets the 50% probability. Here’s the salient part of the problem restated:

——————————-

“[S]uppose that we have a covered basket that contains three balls, each of which may be green or red. In a blind test, we reach in and pull out a red ball. We return the ball to the basket and try again, again pulling out a red ball. Once more, we return the ball to the basket and pull a ball out – red again. We form a hypothesis that all the balls are all, in fact, red. Bayes’ Theorem can be used to calculate the probability (p) that all the balls are red (an event labeled as “A”) given (symbolized as “|”) that all the selections have been red (an event labeled as “B”):

p(A|B) = p{A + B}/p{B}

Of all the possible combinations (RRR, RRG, RGG, GGG), the chance that all the balls are red is 1/4; in 1/8 of all possible outcomes, all the balls are red AND all the selections are red. Bayes’ Theorem calculates the probability that all the balls in the basket are red, given that all the selections have been red as .5…”

——————————-

I calculate as follows:

p(A) = 0.25 — i.e., RRR, all red balls in hat

p(~A) = 0.75 — chance hat contains RRG, RGG or GGG

p(B|A) = 1.0 — chance that the 3 selections are red, given the hat contains RRR

p(B|~A) = 0.125 — chance of picking 3 reds, given the hat contains RRG, RGG, or GGG.

p(B|~A) derived as:

– Pick R on first ball = 0.5 (R or G); pick RR after 2 balls = 0.25 (RR, RG, GR or GG); pick RRR after 3 balls = 0.125 (RRR, RRG, RGG, GGG, GRR, GGR, RGR, GRG).

— RRG: 0.67 * 0.125 = 0.08375

— RGG: 0.33 * 0.125 = 0.04125

— GGG: 0

— RRG + RGG + GGG = 0.08375 + 0.04125 + 0 = 0.125

Execute Bayes:

— Numerator: p(B|A) * p(A) = 1 * 0.25 = 0.25

— Denominator: p(B|A) * p(A) + p(B|~A) * p(~A) = (1 * 0.25) + (0.125 * 0.75) = 0.25 + 0.09375 = 0.34375

— Result: 0.25 / 0.34375 = 0.72727

The author gets 0.5. Using an abbreviated form of Bayes in her text, she has it that p(A + B) = 0.125 — “in 1/8 of all possible outcomes, all the balls are red AND all the selections are red. ” I struggle with that statement, but going with it, I take that as meaning all TRUE positives. And, p(B) = the event where all 3 selections are red, whether the hypothesis of the hat containing RRR is true or not. If her dividend is 0.125, in order to obtain 0.5, she would need a divisor of 0.25 (0.125 / 0.25 = 0.5).

This would imply p(B|~A) * p(~A) = 0, which is not possible (i.e., the chance of picking 3 red balls where the possible population does indeed contain the possibility of red balls existing in some proportion — RRG or RGG).

Help!

Thank you! (Really — Thank you!!)

Very well written . Very helpful

@John: Interesting question. Before going further though, I think the techtarget author made some mistakes. If the balls are randomly distributed (between Red and Green) then the 8 possible outcomes are

RRR RRG RGR RGG GRR GRG GGR GGG

Sure, RRG may look the same as RGR and GRR, but that combo (2 reds and 1 green) has a 3/8 chance of happening. Similar for GGR. So the real odds are:

All reds = 1/8

2 reds, 1 green = 3/8

2 greens, 1 red = 3/8

All greens = 1/8

From there, I’d probably have to work out a quick table to see the probabilities. The idea is to figure out how many false positives you get, how many true positives you get, and find the chance of the event as (true positives) / (false positives + true positives).
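Here is a sketch of that table in code, using exact fractions. It assumes the three draws are made with replacement (as the problem describes) and compares two possible priors: each ball independently red or green, and the four compositions being equally likely. The function name and structure are illustrative only.

```python
from fractions import Fraction
from math import comb

N_BALLS = 3
N_DRAWS = 3


def prob_all_red(prior):
    """Pr(basket holds 3 reds | 3 red draws with replacement).

    `prior` maps the number of red balls in the basket (0..3) to its
    prior probability.
    """
    # Chance of drawing red three times from a basket with r red balls.
    likelihood = {r: Fraction(r, N_BALLS) ** N_DRAWS for r in prior}
    evidence = sum(prior[r] * likelihood[r] for r in prior)
    return prior[N_BALLS] * likelihood[N_BALLS] / evidence


# Prior A: each ball is independently red or green (1/8, 3/8, 3/8, 1/8).
coin_flip_prior = {
    r: Fraction(comb(N_BALLS, r), 2 ** N_BALLS) for r in range(N_BALLS + 1)
}
# Prior B: the four compositions are equally likely (1/4 each).
uniform_prior = {r: Fraction(1, 4) for r in range(N_BALLS + 1)}

posterior_coin_flip = prob_all_red(coin_flip_prior)
posterior_uniform = prob_all_red(uniform_prior)
print(posterior_coin_flip, posterior_uniform)
```

Under these assumptions the 1/8, 3/8, 3/8, 1/8 prior gives exactly 1/2, while the equal-weights prior gives 3/4, so the answer depends on which prior you start from.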

@Anon: Thanks.

Thank you, makes sense now!

The data you used with the mammogram example, were the probabilities in it true for tests actually in use?

That is an awesome explanation. thankS!

You have a mistake there. 1% * 80% is not = 0.08, BUT it is equal to 0.008

Check it out here on Wolfram Alpha: http://www.wolframalpha.com/input/?i=1%25+*+80%25

Hi Dave, I searched but don’t see a reference to “0.08” in the article — more than happy to correct if there’s a calculation error.

Hi,

In Youtube, I found a good lecture on this theorem.

http://www.youtube.com/user/InsofeVideos/videos?view=46&shelf_id=9&tag_id=UCJ1R2JZ_ecOBxyCiURdmgAQ.3.cpee&sort=dd

Check it out.

Your explanation and example was really helpful!!! You should urge your readers to proactively use the calculator you have provided to explore what happens as “false alarms” go up and “missed detections” go down. One can think of kinds of tests where some threshold number can be varied to tailor the Pr(X|A) ‘s, say, if you don’t mind false alarms as a cost of being sure that missed detections are very low…

Thanks Ron, glad you liked it! Great point, the idea behind Bayes is to account for the false alarm / missed detection tradeoff that a test will have. (False alarm / missed detection is a great way to explain false positive and false negative also.)

Kalid,

Suppose that someone approaches you with a friend who she says has a photographic memory, i.e., eidetic imagery (EI). The reason she comes to you is that you have an infallible test of EI, but it has a .05 rate of false positives: that is, if a person has EI, the test will come up positive 100% of the time, but if the person does not have EI, the test will come up positive 5% of the time. Suppose also that only one person in 10,000 has EI. So you test this person’s friend, and the test is positive for EI. What is the probability that the friend actually has EI?

Hi Todd, for this one, I’m going to let you work it out :).

Suppose you have 10,000 people in an auditorium.

1) How many actually have EI? (call this R, for real)

2) How many test positive for having EI, but do not have this? (call this F, for fake)

The probability of actually having it would be R / (R + F) [the number of real people in the total population who tested positive]
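For anyone who wants to check their work, the auditorium count can be sketched in a few lines. It uses only the numbers from the question (1 in 10,000 has EI, the test never misses, 5% false positives); the variable names are illustrative.

```python
population = 10_000
prevalence = 1 / 10_000        # one person in 10,000 has EI
hit_rate = 1.0                 # the test is always positive for real EI
false_positive_rate = 0.05     # 5% of people without EI still test positive

R = population * prevalence * hit_rate                            # real positives
F = (population - population * prevalence) * false_positive_rate  # fake positives

probability = R / (R + F)
print(f"R = {R:.2f}, F = {F:.2f}")
print(f"chance the friend really has EI: {probability:.4f}")
```

R and F are head counts out of the 10,000 people, not rates, so R is about 1 person and F is about 500 people, which puts the answer near 0.2%.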

Superb article, very well explained. It’s been a couple of years since I last studied Bayes’ theorem, and this was a very useful refresher.

so .001 for R

and .05 for F

so .001/.001+.05= .0196

is this right?

Excellent article Kalid, reminded me succinctly what Bayes is all about. Are you being hoodwinked into doing people’s homework though?

Great work.

Kalid, you’re fantastic. I found this site while reading how they recovered the Air France plane. I wonder, Kalid, do you bet on sports? I’m 70 years old this October, and like thousands of others I try to get my day in by going into bookmakers’ shops. Your insight would help fellow punters, as the bookmakers have it all their own way: they have 18 screens showing racing from greyhound tracks, horse racing from around the world, now also virtual horse racing, football from around the world and, worst of all, gaming machines. Just to end: of the 18 screens, only one gives you the results. Thanks for your insight.

Kalid, thank you for this well-written article. I am presenting a poster on a medical algorithm that calculated ratios based on Bayes’ Theorem. If I am asked how the ratios were calculated, I’ll be more prepared. Thanks again!

Thanks Kamille, glad it helped!

Thanks Joseph, I appreciate it! Ah, I don’t know much about sports betting, but I do know that odds are a more natural way to express probabilities than a percentage :).

Great explanation! I took a class on Game Theory in college in which we studied Bayes’ Theorem, and it just came up again in a book I was reading. I googled for a quick reminder about the theorem and this was perfect. Thanks, such cool stuff!

Thanks Lizzy, glad you enjoyed it!

For me this is the clearest explanation of Bayes’ theorem I’ve read.

I have one question, how can the effectiveness of any test ever be known? Surely that requires another test and that test itself may have false negatives and positives. Like some recursive horror.

Finally, the Monty Hall problem makes sense! I couldn’t get past the apparent 50/50 split between the two remaining doors until just now reading Randy’s link (way up there). It was just starting to sink in, and then I came up with a visual that nailed it for me. Imagine the prize is behind door number 1, your selection, for 100 trials. If Monty randomly chooses which of the other two doors to open, after 100 trials he will have selected door number 2 about 50% of the time. If the prize is behind door number 3, however, after 100 trials he will have selected door number 2 100% of the time. Chances are better if Monty selects door number 2 that the prize is behind door number 3. :-)) comprehension is bliss
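That visual lines up with a direct Bayes calculation. The sketch below assumes the standard rules (Monty never opens your door or the prize door, and picks at random when he has a choice) and uses exact fractions; the dictionary names are illustrative.

```python
from fractions import Fraction

# You picked door 1; Monty then opens door 2 and shows no prize.
# Before he acts, the prize is equally likely to be behind any door.
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Chance that Monty opens door 2, given where the prize really is.
opens_door_2 = {
    1: Fraction(1, 2),  # prize behind your door: he picks door 2 or 3 at random
    2: Fraction(0),     # he never reveals the prize
    3: Fraction(1),     # door 2 is the only door he is allowed to open
}

# Bayes' theorem: weight each prior by how well it explains what Monty did.
evidence = sum(prior[door] * opens_door_2[door] for door in prior)
posterior = {door: prior[door] * opens_door_2[door] / evidence for door in prior}

for door, p in posterior.items():
    print(f"prize behind door {door}: {p}")
```

Door 3 ends up with 2/3 and your original door with 1/3, matching the 100% versus 50% intuition in the comment.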

Kalid,

Very very useful and helpful. “Humanities” people, often both a lesser and greater sort, just seem to think differently. Some of us, some of the time. You are doing the “new math.” For real.

Thank you very much,

Gerald

(5th rate philosopher)

Thank you! This was very confusing in the textbook, but your explanation made perfect sense. You may have saved my sanity!

Great article! I referenced and linked to it in my crossvalidated question -> http://stats.stackexchange.com/q/119952/25734 about Bayes’ theorem and the search for a fisherman lost at sea

@Dan: Great question :). I’m not sure how we truly validate the effectiveness, maybe with a very rigorous / secondary investigation.

@Gerald: Thank you!

@Mike: Cool application! Glad it was helpful :).

Wonderful article–brings back the problem and the people involved from years ago.

I’m grateful to you for this lucid explanation of Bayes. Thank you very much!

Things in nature are more probabilistic than deterministic… wow, sir. And in “The Final Problem”, the Sherlock Holmes story, game theory is used… it’s just amazing. Please put forward more articles. Thank you.

Hi, just letting you know that the link to the Yudkowsky essay is broken! Ah, link rot. Great explanation though, thanks.

Just fixed, thanks! It’s now at http://www.yudkowsky.net/rational/bayes

I think it’d be better to simplify to P(A|B) = P(B|A)*P(A) / P(B) *before* the calculations, but maybe that’s just me. Also, there’s a great diagram on this at:

http://en.wikipedia.org/wiki/Bayes%27_theorem#mediaviewer/File:Bayes_theorem_visualisation.svg
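For reference, the compact form and the expanded denominator that the table-style calculation works out can be written side by side. This is a sketch using the comment's A and B; writing "not A" as \lnot A is a notation choice made here, not something from the article.

```latex
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)},
\qquad
P(B) = P(B \mid A)\,P(A) + P(B \mid \lnot A)\,P(\lnot A)
```

The first term of P(B) is the true positives and the second is the false positives, so the fraction is (true positives) / (true positives + false positives).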

Not sure I quite follow the mammography example. Aren’t we just doing a pirouette between different assumed qualifications:

a) 1%: The chances one has cancer given no qualifications

b) 93.4%: The chances a positive mammogram will correlate with a biopsy confirmation of cancer (inverse of false positive)—that is, with the qualification of a positive test result

c) 7.8%: The chances a given positive mammogram is a true positive out of all positive mammograms—that is, with the qualification of the positive test result while somehow magically impeded from knowing about percentage b

While the distinction between b and c is a subtle one, they mean different things. c would only be valid (apropos the odds one has breast cancer) if one could somehow know c in the absence of knowing b. In the same way, a is valid before one gets a test result, but once one has received a positive mammogram a obviously no longer applies.

In other contexts, c is a valid stat. In absolute terms there will be more false positives than false negatives since most people don’t have cancer. But as far as the patient is concerned it’s b that matters. It’s not really a paradox, it’s just that the two figures, 93.4% and 7.8%, belong in different categories.

Many thanks! I’m working my way through Sharon Bertsch McGrayne’s “Theory that would not die” just now, and have to remind myself occasionally of the elements of the equation. Your article is much clearer than all of the Youtube explanations I’ve seen. Probably there’s room for application of the Rule to them too — “What’s the probability of understanding a Youtube explanation of Bayes’ rule if…?”

Thanks Gilbert, really glad it clicked! Hah, I like the meta application :). I find a lot of explanations online seem to focus on the mechanics of the formula but not the “why”. What is each term trying to represent? Glad it worked for you.

Very nice and clear explanation indeed. But suggest a slight change in notation to be consistent with that used in probability. You have Pr(~A) = Chance of not having cancer (99%). But in probability the “~” means “distributed as”, as in normal distribution, etc. Why not just use “no cancer” instead of ~A.

Thank you, just fixed! (I was using the programming symbol for complement, ~, I should have checked its meaning in statistics!)

Hi, how would the chances of a true cancer case increase if the patient got two (three, four, …) positive mammogram results (assuming the tests are independent)?
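One way to explore this question is to apply Bayes’ theorem repeatedly, feeding each result back in as the new starting chance. The sketch below assumes the tests really are independent, as the question states, and uses a 1% starting chance, an 80% hit rate and a 9.6% false positive rate; the last figure and the function name `update` are assumptions made for illustration.

```python
def update(prior, hit_rate, false_alarm_rate):
    """Chance of cancer after one more positive result, starting from `prior`."""
    true_pos = prior * hit_rate
    false_pos = (1.0 - prior) * false_alarm_rate
    return true_pos / (true_pos + false_pos)


chance = 0.01   # chance of cancer before any test
history = []
for test_number in range(1, 5):
    # Each positive result uses the previous answer as its starting point.
    chance = update(chance, 0.80, 0.096)
    history.append(chance)
    print(f"after {test_number} positive test(s): {chance:.1%}")
```

Under these assumptions the chance climbs from roughly 8% after one positive to roughly 41%, 85% and 98% after two, three and four. If repeat results on the same patient tend to share the same errors, the independence assumption fails and the real increase would be smaller.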