# Understanding Bayes Theorem With Ratios

My first intuition about Bayes Theorem was “take evidence and account for false positives”. Does a lab result mean you’re sick? Well, how rare is the disease, and how often do healthy people test positive? Misleading signals must be considered.

This helped me muddle through practice problems, but I couldn’t think with Bayes. The big obstacles:

Percentages are hard to reason with. Odds compare the relative frequency of scenarios (A:B) while percentages use a part-to-whole “global scenario” [A/(A+B)]. A coin has equal odds (1:1) or a 50% chance of heads. Great. What happens when heads are 18x more likely? Well, the odds are 18:1, can you rattle off the decimal percentage? (I’ll wait…) Odds require less computation, so let’s start with them.
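
If you do want the percentage, the conversion is the part-to-whole formula above. A minimal sketch (the function name `odds_to_probability` is just an illustrative choice):

```python
def odds_to_probability(a, b):
    """Convert A:B odds into the part-to-whole chance of A, i.e. A / (A + B)."""
    return a / (a + b)

print(f"{odds_to_probability(1, 1):.1%}")   # 50.0%
print(f"{odds_to_probability(18, 1):.1%}")  # 94.7%
```

The 18:1 case works out to 18/19, which is not a number most of us carry around in our heads.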

Equations miss the big picture. Here’s Bayes Theorem, as typically presented:

$\displaystyle{\Pr(\mathrm{A}|\mathrm{X}) = \frac{\Pr(\mathrm{X}|\mathrm{A})\Pr(\mathrm{A})}{\Pr(\mathrm{X}|\mathrm{A})\Pr(\mathrm{A}) + \Pr(\mathrm{X}|\sim\mathrm{A})\Pr(\sim\mathrm{A})}}$

The same idea, expressed with odds:

original odds * evidence adjustment = new odds
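
The odds version isn’t a different theorem. Write the usual formula once for A and once for ∼A, divide one by the other, and the shared denominator cancels. A sketch of that step, using the same symbols as the formula above:

$\displaystyle{\frac{\Pr(\mathrm{A}|\mathrm{X})}{\Pr(\sim\mathrm{A}|\mathrm{X})} = \frac{\Pr(\mathrm{A})}{\Pr(\sim\mathrm{A})} \cdot \frac{\Pr(\mathrm{X}|\mathrm{A})}{\Pr(\mathrm{X}|\sim\mathrm{A})}}$

The left side is the new odds, the first fraction on the right is the original odds, and the second is the evidence adjustment.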


Bayes is about starting with a guess (1:3 odds for rain:sunshine), taking evidence (it’s July in the Sahara, sunshine 1000x more likely), and updating your guess (1:3000 chance of rain:sunshine). The “evidence adjustment” is how much better, or worse, we feel about our odds now that we have extra information (if it was December in Seattle, you might say rain was 1000x as likely).

## Caveman Statistician Og

Og just finished his CaveD program, and runs statistical research for his tribe:

• He saw 50 deer and 5 bears overall (50:5 odds)
• At night, he saw 10 deer and 4 bears (10:4 odds)

What can he deduce? Well,

original odds * evidence adjustment = new odds


or

evidence adjustment = new odds / original odds


At night, he realizes deer are 1/4 as likely as they were previously:

10:4 / 50:5 = 2.5 / 10 = 1/4


(Put another way, bears are 4x as likely at night)

Let’s cover ratios a bit. A:B describes how much A we get for every B (imagine miles per gallon as the ratio miles:gallon). Compare values with division: going from 25:1 to 50:1 means you doubled your efficiency (50/25 = 2). Similarly, we just discovered how our “deers per bear” amount changed.

Og happily continues his research:

• By the river, bears are 20x more likely (he saw 2 deer and 4 bears, so 2:4 / 50:5 = 1:20)
• In winter, deer are 3x as likely (30 deer and 1 bear, 30:1 / 50:5 = 3:1)

He takes a scenario, compares it to the baseline, and computes the evidence adjustment.
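
Og’s procedure can be sketched in a few lines. This assumes sightings are stored as (deer, bears) pairs and that at least one bear was seen (zero bears would divide by zero); the function name is made up for illustration:

```python
from fractions import Fraction

def evidence_adjustment(scenario, baseline):
    """How much the deer:bear ratio changes in a scenario versus the baseline.

    Each argument is a (deer, bears) pair of sighting counts.
    """
    scenario_ratio = Fraction(scenario[0], scenario[1])
    baseline_ratio = Fraction(baseline[0], baseline[1])
    return scenario_ratio / baseline_ratio

baseline = (50, 5)
print(evidence_adjustment((10, 4), baseline))  # 1/4  (night: bears 4x as likely)
print(evidence_adjustment((2, 4), baseline))   # 1/20 (river: bears 20x as likely)
print(evidence_adjustment((30, 1), baseline))  # 3    (winter: deer 3x as likely)
```

Using `Fraction` keeps the results as exact ratios instead of decimals.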

Caveman Clarence subscribes to Og’s journal, and wants to apply the findings to his forest (where deer:bears are 25:1). Suppose Clarence hears an animal approaching:

• His general estimate is 25:1 odds of deer:bear
• It’s at night, with bears 4x as likely => 25:4
• It’s by the river, with bears 20x as likely => 25:80
• It’s in the winter, with deer 3x more likely => 75:80

Clarence guesses “bear” with near-even odds (75:80) and tiptoes out of there.
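
Clarence’s reasoning is just repeated multiplication on each side of the colon. A rough sketch (`update_odds` is a hypothetical helper, not anything from Og’s journal):

```python
def update_odds(prior, adjustments):
    """Multiply A:B odds by each A:B adjustment, side by side."""
    a, b = prior
    for adj_a, adj_b in adjustments:
        a, b = a * adj_a, b * adj_b
    return a, b

# deer:bear odds. Clarence's prior, then the night, river and winter factors.
deer, bear = update_odds((25, 1), [(1, 4), (1, 20), (3, 1)])
print(f"{deer}:{bear}")                               # 75:80
print(f"chance of bear: {bear / (deer + bear):.1%}")  # chance of bear: 51.6%
```

The order of the adjustments doesn’t matter, since multiplication commutes.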

That’s Bayes. In fancy language:

• Collect evidence, and determine how much it changes the odds
• Compute the posterior probability, the odds after updating

## Bayesian Spam Filter

Let’s build a spam filter based on Og’s Bayesian Bear Detector.

First, grab a collection of regular and spam email. Record how often a word appears in each:

| word | spam | normal |
|---|---|---|
| hello | 3 | 3 |
| darling | 1 | 5 |
| buy | 3 | 2 |
| viagra | 3 | 0 |
| ... | ... | ... |


(“hello” appears equally, but “buy” skews toward spam)

We compute odds just like before. Let’s assume incoming email has 9:1 chance of spam, and we see “hello darling”:

• A generic message has 9:1 odds of spam:regular
• Adjust for “hello” => keep the 9:1 odds (“hello” is equally-likely in both sets)
• Adjust for “darling” => 9:5 odds (“darling” appears 5x as often in normal emails)
• Final chances => 9:5 odds of spam

We’re leaning towards spam (9:5 odds). However, it’s less spammy than our starting odds (9:1), so we let it through.
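
Here is the “hello darling” calculation as a sketch. It assumes the raw counts can be used directly as the evidence ratio (i.e., the spam and normal collections are about the same size), and that words we have never seen leave the odds alone; both are simplifications of mine:

```python
from math import gcd

# Word counts from the table above: word -> (spam count, normal count).
counts = {"hello": (3, 3), "darling": (1, 5)}

def spam_odds(message, prior=(9, 1)):
    """Return reduced spam:normal odds after adjusting for each known word."""
    spam, normal = prior
    for word in message.lower().split():
        if word in counts:  # unseen words leave the odds unchanged
            s, n = counts[word]
            spam, normal = spam * s, normal * n
    g = gcd(spam, normal)
    return spam // g, normal // g

print(spam_odds("hello darling"))  # (9, 5)
```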

Now consider a message like “buy viagra”:

• Prior belief: 9:1 chance of spam
• Adjust for “buy”: 27:2 (“buy” appears 3:2 in favor of spam)
• Adjust for “viagra”: …uh oh!

“Viagra” never appeared in a normal message. Is it a guarantee of spam?

Probably not: we should intelligently adjust for new evidence. Let’s assume there’s a regular email, somewhere, with that word, and make the “viagra” odds 3:1. Our chances become 27:2 * 3:1 = 81:2.

Now we’re getting somewhere! Our initial 9:1 guess shifts to 81:2. Now is it spam?

Well, how horrible is a false positive?

81:2 odds imply for every 81 spam messages like this, we’ll incorrectly block 2 normal emails. That ratio might be too painful. With more evidence (more words or other characteristics), we might wait for 1000:1 odds before calling a message spam.
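
The “assume the word shows up somewhere” fix and the spam cutoff can be sketched together. The 3:2 factor for “buy” is inferred from the 27:2 intermediate odds; treating a zero count as 1 and the `threshold` parameter are illustrative choices, not a standard recipe:

```python
from math import gcd

counts = {"buy": (3, 2), "viagra": (3, 0)}  # word -> (spam count, normal count)

def spam_odds(message, counts, prior=(9, 1)):
    """Return reduced spam:normal odds, treating any zero count as 1."""
    spam, normal = prior
    for word in message.lower().split():
        s, n = counts.get(word, (1, 1))  # unseen words leave the odds unchanged
        spam, normal = spam * max(s, 1), normal * max(n, 1)
    g = gcd(spam, normal)
    return spam // g, normal // g

def is_spam(odds, threshold=1000):
    """Call it spam only if the odds are at least threshold:1."""
    spam, normal = odds
    return spam >= threshold * normal

odds = spam_odds("buy viagra", counts)
print(odds)           # (81, 2)
print(is_spam(odds))  # False
```

Lowering the threshold blocks more spam at the cost of more false positives; where to set it depends on how painful a lost email is.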

## Exploring Bayes Theorem

We can check our intuition by seeing if we naturally ask leading questions:

• Is evidence truly independent? Are there links between animal behavior at night and in the winter, or words that appear together? Sure. We “naively” assume evidence is independent (and yet, in our bumbling, create effective filters anyway).

• How much evidence is enough? Is seeing 2 deer & 1 bear the same 2:1 evidence adjustment as 200 deer and 100 bears?

• How accurate were the starting odds in the first place? Prior beliefs change everything. (“A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.”)

• Do absolute probabilities matter? We usually need the most-likely theory (“Deer or bear?”), not the global chance of this scenario (“What’s the probability of deers at night in the winter by the river vs. bears at night in the winter by the river?”). Many Bayesian calculations ignore the global probabilities, which cancel when dividing, and essentially use an odds-centric approach.

• Can our filter be tricked? A spam message might add chunks of normal text to appear innocuous and “poison” the filter. You’ve probably seen this yourself.

• What evidence should we use? Let the data speak. Email might have dozens of characteristics (time of day, message headers, country of origin, HTML tags…). Give every characteristic a likelihood factor and let Bayes sort ‘em out.

## Thinking With Ratios and Percentages

The ratio and percentage approaches ask slightly different questions:

Ratios: Given the odds of each outcome, how does evidence adjust them?

The evidence adjustment just skews the initial odds, piece-by-piece.

Percentages: What is the chance of an outcome after supporting evidence is found?

In the percentage case,

• “% Bears” is the overall chance of a bear appearing anywhere
• “% Bears Going to River” is how likely a bear is to trigger the “river” data point
• “% Bear at River” is the combined chance of having a bear, and it going to the river. In stats terms, P(event and evidence) = P(event) * P(event implies evidence) = P(event) * P(evidence|event). I see conditional probabilities as “Chances that X implies Y” not the twisted “Chances of Y, given X happened”.

Let’s redo the original cancer example:

• 1% of the population has cancer
• 9.6% of healthy people test positive, 80% of people with cancer do

If you see a positive result, what’s the chance of cancer?

Ratio Approach:

• Cancer:Healthy ratio is 1:99
• Evidence adjustment: 80/100 : 9.6/100 = 80:9.6 (80% of sick people are “at the river”, and 9.6% of healthy people are).
• Final odds: 1:99 * 80:9.6 = 80:950.4 (roughly 1:12 odds of cancer, ~7.8% chance)

The intuition: the initial 1:99 odds are pretty skewed. Even with an 8.3x (80:9.6) boost from a positive test result, cancer remains unlikely.

Percentage Approach:

• Cancer chance is 1%
• Chance of true positive = 1% * 80% = 0.008
• Chance of false positive = 99% * 9.6% = 0.09504
• Chance of having cancer = 0.008 / (0.008 + 0.09504) ≈ 7.8%

When written with percentages, we start from absolute chances. There’s a global 0.8% chance of finding a sick patient with a positive result, and a global 9.504% chance of a healthy patient with a positive result. We then compute the chance these global percentages indicate something useful.
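
Both routes can be checked side by side. A small sketch, with function names of my own choosing:

```python
def posterior_by_ratio(prior_odds, adjustment):
    """Multiply A:B odds by an A:B adjustment, then convert to a chance of A."""
    a = prior_odds[0] * adjustment[0]
    b = prior_odds[1] * adjustment[1]
    return a, b, a / (a + b)

def posterior_by_percentage(p_cancer, p_pos_if_sick, p_pos_if_healthy):
    """Chance of cancer given a positive test, from global percentages."""
    true_pos = p_cancer * p_pos_if_sick
    false_pos = (1 - p_cancer) * p_pos_if_healthy
    return true_pos / (true_pos + false_pos)

a, b, chance = posterior_by_ratio((1, 99), (80, 9.6))
print(f"{a}:{b:.1f} -> {chance:.2%}")                       # 80:950.4 -> 7.76%
print(f"{posterior_by_percentage(0.01, 0.80, 0.096):.2%}")  # 7.76%
```

The two functions agree because the ratio version simply skips dividing by the shared total until the very end.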

Let the approaches be complements: percentages for a bird’s-eye view, and ratios for seeing how individual odds are adjusted. We’ll save the myriad other interpretations for another day.

Happy math.

Kalid Azad loves sharing Aha! moments. BetterExplained is dedicated to learning with intuition, not memorization, and is honored to serve 250k readers monthly.


1. Ralph Schneider says:

I have long had issues with “probabilities”, insofar as they are meaningless, yet people throughout society base their judgements on them – often extremely damagingly. Let me explain.

Let’s say a lottery has one million tickets. Each ticket has a one-in-a-million chance of winning, right? WRONG! ONE ticket (the winner) has 100% chance and the rest have 0%! How can all the tickets have the same chance if only one will win? That is illogical and absurd.

“Aha!” you may say, “But they each have an equal chance of winning BEFORE the winner is drawn”. But how can they each have an equal chance of winning if only one wins and the others lose – whether before or after the draw? They cannot and do not have an equal chance! Even young kids can understand that!

While one ticket in a million will win, that does NOT translate into each ticket having an equal chance!

Although most sporting events have favourites to win, quantifying their chances into odds is ridiculous, and nowhere more so than in horse racing. In the same race, one horse may be 20:1 to win, another horse 3:2, another 2:1, another 7:5 etc. – how absurd and meaningless is that, given that only one will win and the others won’t?!

Statements like “Only three other companies are bidding for the contract, so we have a one in four chance of winning it”, or “You have a 60% chance of doubling your money” are commonplace – and meaningless. Even worse, they can and do give false confidence leading to ruin, since people are misled into basing their judgements on them.

And doctors and health lobbyists should be prohibited from making such outrageous statements as “You have a one in five chance of surviving” based on data that one in five people survives. You will either survive (100%) or not (0%), so you do not have 20% chance, but either 100% or 0%. The problem is that some people literally worry themselves to death when they hear such pronouncements from doctors (who should know better, and may just be covering themselves in advance).

The error consists of extending the general to the particular (the opposite of generalising). Just because nine out of ten in my community have white skin does not mean that my skin or anyone else’s has a 90% chance of being white – it is either white or it is not!

2. Rich says:

To @ralph,

I think your concerns need to be understood in relationship to the ‘alternative model’. In this case, the alternative model is unaided intuition (expert guess) when dealing with highly complex and uncertain outcomes. All we are saying here is that Bayes can be used to model our uncertainty (put bounds on our uncertainty), not our exactness. And there have been hundreds of studies in the pharmaceutical, soil science, military, energy and other industries that demonstrate that quantitative methods (particularly Bayesian methods) significantly outperform other methods of measurement, particularly expert opinion.

Great article in simplifying a complex subject.

3. As always, great post!

The first is that thinking in terms of odds reinforces the Bayesian style of probability that is the degree to which you’d feel comfortable betting on something given everything that you have observed about it so far.

The second is that using odds like you described makes a software implementation easier: you just keep track of things using simple counters. To avoid rounding errors, you can use logarithms. For example, in your second spam scenario, the logarithm of the final ratio would be

(log(9) + log(3) + log(3)) - (log(1) + log(2) + log(1)) = log(81/2)

The point is that you just keep summing logarithms on both sides of the “:” sign of the odds and then have some cutoff that determines spamminess that accounts for the pain of a false positive (i.e. make it 40:1) and then take the logarithm of that. If it exceeds the threshold then classify as spam.

4. Ralph Schneider says:

Thank you, Kalid, for your fine article, and Rich for your welcome comments and Jeff for your interesting additions, on this refreshingly civil and intelligence-based site. I hasten to say that I was in no way criticizing Kalid’s excellent article in my earlier post. I was criticizing the blunt ways that “odds” and “probabilities” are used every day, often misleading people into sometimes tragic outcomes (particularly in finance/gambling and medicine). I would like to see the term “probability” replaced with “possibility” – although that may seem pedantic and trivial, I think it would more accurately reflect the truth and improve understanding. And I look forward as always to further high-quality articles and posts. Best wishes to all.

5. kalid says:

@Ralph, @Rich: Thanks for the discussion! You bring up an interesting point about the meaning of a probability — there are several interpretations, with different philosophical implications.

One interpretation is “If we repeat the experiment many times, we expect to get this outcome some known fraction of the time (e.g., coin flips)” and another is “Probability reflects our level of certainty about our knowledge” (as Rich notes).

So in the coin-flip case, we can say “I’m 50% certain that it will be heads”, which, in practical terms, means I’d be indifferent to winning 2x my money when betting on heads (better odds, like paying 3x, and I’ll play all day. Worse odds, like paying 1.5x, means I’ll never play).

There are even more philosophical implications around Bayesian vs. Frequentist probabilities that I’m not well-versed in, but want to understand further.

@Jeff: Glad you enjoyed it; awesome insights. Yes, I think ratios/odds keep us in “probability mode” more than raw percentages (which put my brain in “calculating” mode).

And cool note about the programming. That’s exactly it: computers have limited precision in the numbers they can represent, so drastically-shrinking percentages are a no go [in the spam case, especially, where we'll have a multiplication by a small fraction for each word in the message!]. We can use tricks like taking the logarithm of large multiplications (i.e., adding logs) to keep numbers within reasonable bounds.

6. >“Aha!” you may say, “But they each have an equal chance of winning BEFORE the winner is drawn”.

Well, *I* would never say that. “Aha!” I would say instead, “But they each have an equal chance of winning BEFORE I learn the identity of the winner”. This makes it clear that lottery probabilities are facts about my knowledge about the lottery, not facts directly about the lottery itself.

7. My head is still not completely wrapped around Bayes Theorem yet, even though it loomed large in both the inaugural MOOC AI course taught by Thrun and Norvig and Andrew Ng’s machine learning course. As usual I can feel the wrap wrap wrapping at my brain’s door with your latest post.

8. AJ says:

@Ralph
What if we rephrased the “all tickets have equal chance to win” sentence as
“all tickets have equal chance to be the winning ticket” ?

Just wondering, if the apparent lack of meaning of probability you mentioned, is a consequence of skipping nuances while making statements.

I could be completely wrong here. I’m only an ‘amateur enthusiast’ at the maximum, regarding statistics and probability.

Thank you Kalid, for the great article.

9. Ralph Schneider says:

@ AJ
Thanks for the suggestion, but to me “all tickets have (an) equal chance to be the winning ticket” and “all tickets have (an) equal chance to win” have identical meanings (and in the lottery scenario are therefore, I maintain, equally false statements).

10. Ralph Schneider says:

And I now refute what I suggested earlier: ‘I would like to see the term “probability” replaced with “possibility” – although that may seem pedantic and trivial, I think it would more accurately reflect the truth and improve understanding.’ This would be equally inaccurate, since (in our lottery) only one ticket can possibly win, while the rest have zero possibility.

11. Steven Yuan says:

@Ralph
You bring up a good point. How can multiple outcomes have equal chances when only a select few will ever occur? I’m no mathematician, but there are a couple things I’d like to bring up that may prove relevant. Firstly, statistics is rather misleading in that it does not really deal with probabilities, but rather averages, and the ability to make an educated guess. Say I obtain a raffle ticket from a pool of ten tickets. What is my chance of winning the raffle? In a sense, my chance is obviously either 100% or 0% – I can’t “maybe” win the raffle. I either win or I don’t. So, in this sense, there is not much probability involved. Nine tickets have a 0% chance of winning, and one has a 100% chance of winning. Before the ticket is drawn, we can estimate the “chance” of a certain ticket winning, but the end result is the same, and so, in a roundabout sort of way, the tickets already “knew” whether they were going to win or not. But that’s not necessarily what probability is about. I know that one of the ten tickets is going to win. What probability allows me to do is decide whether getting one of these tickets is worth it. Let’s say that I repeat this process over and over; ie. I buy one raffle ticket out of ten total tickets, every time the drawing takes place. In the first drawing, one of the ten tickets wins. Fine. That may have been mine – we don’t know, as this is purely hypothetical. In the second drawing, one of the ten tickets wins (surprise!). If there are ten drawings, then I will have bought ten tickets, and, according to probability, I will have won about one raffle. Will I have actually won exactly one raffle? Probably not. But if there are one hundred drawings, then I will have bought one hundred tickets, and I will have won about ten raffles. The amount of raffles I have won is rarely ever coincident with the projected amount, but as the amount of drawings increases, the projection grows more accurate. This is what probability does: it gives us averages. 
Now, let’s say that the prize for the raffle is a ten dollar chocolate cake. Each ticket costs three dollars. We can use probability to figure out whether it is statistically advantageous for me to enter the drawing. If the drawing takes place one hundred times, I will have won about ten of them, giving me about $100 worth of chocolate cakes (mmm, delicious!). However, that will have meant that I spent $300 on raffle tickets. Obviously, it was not in my favor to enter the raffle – unless, of course, I won more than 30 chocolate cakes. It’s certainly possible that I did win a ridiculous amount of cakes; however, basic probability tells me that I would be better off spending my money on something more reliable. So, wouldn’t the chance of me getting back my money’s worth be 100% (if I won at least 30 cakes) or 0% (if I didn’t)? Yes, in the long run, it would. But since I didn’t have any way of seeing how many cakes I would win, use of statistics was my best bet. It’s not perfect, but it’s necessary.

12. Ian Hatch says:

Great article, thanks Kalid! Bayes can be really useful, and isn’t easy to think about.

I’d like to weigh in on the lottery thing:

While I certainly can’t disagree that probabilities are misused to justify bad choices, I think Ralph’s resentment of probabilities themselves is a little off-base. We’re talking about predictors here – given what we know, what are the outcomes that we can predict? Sometimes, as with Bayes and patient survival, we’re just going on prior observation, and sometimes we can calculate the exact array of possibilities, as with a lottery number. These predictors here are then used in assessment of risk, and as noted in the article, the cost of failure has to be a factor when making a decision.

I think that in society the problem usually comes not from a reliance on the odds, but from trivializing the cost of failure. Risks are taken because the odds are good, but if failure will cause an unacceptable loss (like the loss of a business, for example) then playing it safe is usually a sounder strategy.

If we’re talking about the semantics, then I think “probability” is actually a fantastic name. The possibilities in the case of a fatal disease are “I will survive this disease” and “I will die from this disease”. You can’t really ask “how possible is it that I will die?”, but you can ask “how probable (likely) is it that I will die?” And that information is valuable – if you’re likely to die, then it’s a good idea to get your affairs in order. But the really valuable information is “can I affect my chance of living?”, not “will I die?”

And a nitpick: in most lotteries, every ticket does have an equal chance to win, so long as everyone picks the same numbers. It’s perfectly possible to have more than one winning ticket – it’s the combination of numbers that can win or lose.

13. Ralph Schneider says:

Ian, regarding your ‘nitpick’: you have cited a specialised lottery where you pick your numbers, and say that IF several tickets have the WINNING numbers, they have an equal chance of winning (which is obviously 100%). Although I was not referring to such lotteries which may have several winners, my argument is unchanged, and I have read nothing so far which refutes it, so will not use time and space to re-state it.

But I will point out that the recent Global Financial Crisis resulted from reliance on “probabilities”!

14. David says:

On the meaning of “probability”. Is it helpful to think of a probability as meaning “my best estimate of the likelihood of something happening, based on the information available”?

So sure, only one ticket can win a lottery, but in the absence of knowing which one it is, when I buy one lottery ticket out of the million offered, MY best estimate is that I have a 1 in a million chance of that being the winning ticket.

15. @David: Yes, this is a much more fruitful way of looking at probability than the one proposed by Ralph Schneider. Probability is a two-place function of a fact and an observer, not a one-place function of the fact alone. That is, you don’t say that the probability that the ticket will win is one in a million; you say that the probability that the ticket will win, *given your information*, is one in a million.

16. Ralph Schneider says:

Let me be clear: that one ticket in a million will win is true, but that each ticket has an equal probability of winning is false, regardless of whether someone prefaces it with “based on what I know” or “with the information I’m given”!

17. Ralph Schneider says:

We need to distinguish between mathematical constructs and how we operate in ‘real life’. Obviously, we constantly operate on assumptions that things will behave as they have before – to not do so would paralyse us.

We then revise our assumptions if things do not behave as we expected (although such revisions may be misguided – for example, some may believe that the more they lose, the closer they are to winning, based on ‘the odds’!). But the obvious fact remains – what happens always had 100% chance of happening, whereas what didn’t happen always had 0% chance.

18. gulrez says:

Bayes’ theorem demystified; thanks a billion times

19. Ralph Schneider says:

We know in advance that in our lottery only 1 ticket will win, while 999,999 will not. So it is overwhelming unlikely that any given ticket will win. That’s for sure, and it certainly indicates probability accurately, and can inform one’s choice to take a ticket or not. But to say in advance that each ticket has an equal chance, given we know in advance that only 1 will win, is nonsensical.

20. >it is overwhelming unlikely that any given ticket will win

This is the most sensible thing you’ve said!

21. Ralph Schneider says:

@ Toby
Perhaps you could say something sensible yourself instead of catcalling from the sidelines and coming up with gibberish such as your:

“Well, *I* would never say that. “Aha!” I would say instead, “But they each have an equal chance of winning BEFORE I learn the identity of the winner”. This makes it clear that lottery probabilities are facts about my knowledge about the lottery, not facts directly about the lottery itself”.

Since you have made no intelligent or intelligible contributions, we must conclude you have none to make.

22. Sorry, my comment was rather curt.

It’s the most sensible thing that you’ve said, because it’s the most useful thing that you’ve said. It is (I think) the only thing that you’ve said that allows for one to actually USE probabilities other than 0 or 1, in other words to actually use probability theory in a nontrivial way.

Actually, I should have quoted more: ‘We know in advance that in our lottery only 1 ticket will win, while 999,999 will not. So it is overwhelming unlikely that any given ticket will win. That’s for sure, and it certainly indicates probability accurately, and can inform one’s choice to take a ticket or not.’. You’re not only talking about a probability that is slightly greater than 0 (in the second sentence); you’re also linking it to knowledge (in the first sentence), which is exactly how probability becomes useful (as in the third sentence). Our knowledge in advance leads us to assign, to ANY given ticket (even the one that eventually turns out to win!), the overwhelmingly unlikely probability of 0.000001 that it will win.

This directly contradicts what you wrote next: ‘But to say in advance that each ticket has an equal chance, given we know in advance that only 1 will win, is nonsensical.’. In fact, giving each ticket an equal chance is exactly what you did in the second sentence, where ANY given ticket has an overwhelmingly unlikely chance to win. And that’s exactly the sensible thing to do in advance, that is BEFORE we have any information that distinguishes the tickets. It’s only AFTER we learn which ticket won that we change this probability from 0.000001 to 0 for 999999 of the tickets, and change it from 0.000001 to 1 for 1 of the tickets.

If you’re going to use probability to make decisions, then this is how you do it, with probability depending not only on the event in question but also on the knowledge that you have at the moment.

23. Ralph Schneider says:

That’s OK, Toby – I would rather have been less intemperate in my previous response, but I also wish you’d posted your latest well-reasoned and thoughtfully-written view instead of your “curt” comment, which raised my ire and made me think “why do I bother?”.

You are right, and I have never stated otherwise – BEFORE the lottery draw it SEEMS that each ticket has an equal chance and, even though we know that only one will win, we don’t know which. If the lottery were “fixed”, those who fixed it would know in advance which ticket would win, while we wouldn’t. So to the same ticket they assign 100% chance, we give only 0.0001%. So it is a question of advance knowledge – or not.

As I have previously written, this is how we constantly make decisions, and to do so is generally “sensible”, as you say, based on what we know. But we also KNOW BEFORE the draw that 1 ticket has 100% chance and the rest have 0% – we just don’t know which.

This leads to my main point – about falsely particularising the general. While it may be true that 1 in 5 smokers will develop lung cancer, it is wrong – and certainly not “sensible” – to tell each smoker they each have an equal 1 in 5 probability, simply because we don’t know the outcome. It IS correct and sensible to tell them that 1 in 5 smokers develops lung cancer – but no more. It IS correct to say that 1 ticket in a million will win, but not – based only on our lack of foreknowledge – that each has the same chance. Again, the decisions made by bankers on the basis of “probability” led to the recent Global Financial Crisis – while they may say they acted “sensibly” on what they knew or believed (or chose to believe), others say with hindsight that they acted recklessly. Perhaps that is why these bankers have not been criminally charged – their defence is they merely acted on what they knew, or “sensibly” believed.

24. Ralph, you seem to be saying that the probability is SEEMINGLY 0.000001, when we have imperfect knowledge, but the probability is ACTUALLY either 0 or 1. Then using this language, it’s only the seeming probabilities, not the actual probabilities, that have any use for making decisions when one’s knowledge is incomplete (which is to say, always). And if the actual probabilities of specific events are always either 0 or 1, then we need only boolean logic to study these, not probability theory. So I would say that probability theory is really about the seeming probabilities, which are what we actually want to analyse, and so they are the only probabilities worth the name.

As for the bankers, if they really wanted to make the best decisions that they could with the information that they had, then it’s these probabilities that they should have used. It would not have been possible for them to base decisions on knowledge that they didn’t have, whatever you want to call it. Now, whether they calculated correctly, or whether they really tried to do the best, is another matter!

25. David says:

This is a great website! I recently bought Kalid’s book, which was also fantastic, and then I found his website, which contains a wealth of math topics explained at a very intuitive level for free. Thank you, Kalid, for providing us with such a great resource. I’m looking forward to your next book.

I apologize for going off topic here, but I couldn’t find an email address or more appropriate place to post this, nevertheless, I’d like to request Kalid provide us with an intuitive explanation of Differential Equations. His calculus explanations are just so good that I really would love to read his take on Differential Equations. (Once again, sorry for being off topic.)

26. kalid says:

Hi David, no worries — I really appreciate the comment. I’d love to do some more about Diff Eqs, it’s a topic that’s bothered me for a long time (never formally studied it), but I hope to start soon. Also hoping to work on a few more books :).

27. Ralph Schneider says:

Saw Prof. Brian Greene trying to explain Boltzmann’s entropy theory (i.e. the tendency of the universe to become more disorganised). Apart from my disagreement with the theory [(1) there are innumerable instances in Nature where things become more organised - e.g. formation of crystals, including snowflakes - and (2) the perception of "order" is highly subjective - e.g. to an illiterate person, writing looks like disorganised scribbles], Greene’s explanation was problematical. He ripped his book apart and flung the pages in the air so they landed all over the place, then said “Why didn’t the pages land in an ordered pattern? Because there are so many ways for them to land randomly and far fewer ways for them to land orderly. That’s why there is a tendency in the universe towards disorder”.

As a physicist, he of all people should know that there was only one way for them to land – the way they did! It was not willy-nilly – it was the result of the forces acting on them. Information Theory correctly states that the entropy of a system is proportional to the lack of information about that system.

28. Crystals and subjectivity are a problem for Greene’s naïve ‘entropy = disorder’ idea of entropy, not for Boltzmann’s theory. To be fair to Greene, it’s not his idea originally, but he should know better. Boltzmann’s ideas were also naïve, but only because he was coming up with new concepts and couldn’t possibly know better yet; even so, he was closer to the truth! He said something quite close to ‘the entropy of a system is proportional to the lack of information about that system’ (to quote Ralph above): he said that it was proportional to the logarithm of the number of ways the system could be on a microscopic scale, given our information about it on a macroscopic scale. This is naïve, because it essentially assumes that each microscopic state is equally probable (given information about the macroscopic state), although this turns out to be a rather good approximation for most realistically large systems.

29. It's also important that thermodynamic entropy is not really the same thing as information-theoretic entropy. The thermodynamic entropy is an objective property, not dependent on one’s knowledge; but it is what the information-theoretic entropy *would be* if one’s knowledge of the system were precisely knowledge of its macroscopic properties … which is often approximately the case! In particular, information-theoretic entropy tends to go down with time, as one gains information; but thermodynamic entropy famously only seems to go up.

30. It sounds like Greene was trying to get across Boltzmann's argument as to *why* thermodynamic entropy should go up. That is a bad argument, but it’s an easy trap to fall into. Given what knowledge we have now about a particular isolated system, the vast majority of possible future states have higher entropy than the present state; Greene may have demonstrated this with torn paper and the word ‘disorder’, but it's still a true fact about thermodynamic entropy. Unfortunately, the exact same argument shows that the vast majority of possible PAST states have higher entropy than the present state! So this doesn't explain why (thermodynamic) entropy goes up rather than down.

31. Incidentally, one can define information-dependent probabilities (the ‘seeming’ probabilities from late March above) using information-theoretic entropy. The probability of an event E, given information I, is 1/2 to the power of S(I) − S(I&E), where S(I) is the entropy (in bits) associated with the information I, while S(I&E) is the (smaller) entropy that would remain if one should learn additionally that E obtains.
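
As a rough illustration of the formula in the comment above, here is a minimal Python sketch. It assumes, as in the Boltzmann picture described earlier in the thread, that all states compatible with the information are equally likely, so the entropy in bits is simply the base-2 logarithm of the number of states. The function names and the die example are hypothetical, chosen only for illustration.

```python
import math

def entropy_bits(num_states: int) -> float:
    """Entropy in bits, ASSUMING num_states equally likely states."""
    return math.log2(num_states)

def probability_from_entropy(s_info: float, s_info_and_event: float) -> float:
    """P(E | I) = (1/2) ** (S(I) - S(I&E)), as stated in the comment."""
    return 0.5 ** (s_info - s_info_and_event)

# Hypothetical example: a fair six-sided die.
# I = "a die was rolled" (6 possible states)
# E = "the roll is even" (3 states remain once E is learned)
s_i = entropy_bits(6)
s_ie = entropy_bits(3)
p_even = probability_from_entropy(s_i, s_ie)
print(p_even)  # approximately 0.5
```

Learning that the roll is even removes one bit of uncertainty, and an event that removes one bit has probability one half, which matches the usual count of 3 even faces out of 6.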

32. Keith says:

I personally have an easier time thinking with probabilities than with odds; but I still like this idea of an evidence adjustment.
original probability * evidence adjustment = new probability

It’s easy to see this form with a slight re-write of Bayes Rule:
$\displaystyle{p(A)\cdot \frac{p(X|A)}{p(X|A)\cdot p(A) + p(X|\sim A)\cdot p(\sim A)}=p(A|X)}$
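
To make the "original probability * evidence adjustment" form concrete, here is a small Python sketch. The numbers (1% prior, 80% true-positive rate, 10% false-positive rate) and the function names are hypothetical, not taken from the article. It also checks that the probability form agrees with the odds form used in the article.

```python
def evidence_adjustment(p_a: float, p_x_given_a: float, p_x_given_not_a: float) -> float:
    """The factor p(X|A) / p(X) that multiplies the original probability."""
    p_x = p_x_given_a * p_a + p_x_given_not_a * (1 - p_a)
    return p_x_given_a / p_x

def posterior_probability(p_a: float, p_x_given_a: float, p_x_given_not_a: float) -> float:
    """original probability * evidence adjustment = new probability."""
    return p_a * evidence_adjustment(p_a, p_x_given_a, p_x_given_not_a)

def posterior_via_odds(p_a: float, p_x_given_a: float, p_x_given_not_a: float) -> float:
    """original odds * likelihood ratio = new odds, converted back to a probability."""
    new_odds = (p_a / (1 - p_a)) * (p_x_given_a / p_x_given_not_a)
    return new_odds / (1 + new_odds)

# Hypothetical test: 1% of people are sick, the test catches 80% of the
# sick and wrongly flags 10% of the healthy.
p_new = posterior_probability(0.01, 0.80, 0.10)
p_new_odds = posterior_via_odds(0.01, 0.80, 0.10)
print(round(p_new, 4), round(p_new_odds, 4))
```

One difference worth noticing: in the probability form the adjustment depends on the prior p(A) (it appears in the denominator), whereas in the odds form the adjustment is the likelihood ratio alone.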

33. Kalid says:

Thanks Keith, that’s a great way to put it!

34. joe smith says:

@ Ralph,
The issue here is what you are applying the probability to. Saying “every ticket has x probability of being a winner” is an error of literal expression, not an error of the numbers. That’s how it’s expressed, because it is short and easily understood by the buyer.

A better literal expression of the concept is:

“Every ticket holder picks 6 numbers. Every ticket holder, prior to deciding their numbers has X chance of choosing the numbers that will be correct”

OR

more accurately, “When the drawing happens, there is X percent chance they will draw the numbers you picked on your ticket”

If we assume it is a one number per person ticket type lottery, then there is STILL equal chance per ticket. Yes, only ONE ticket actually can be the winner and all the others cannot, however if there are 100 tickets, and each person gets 1, then EACH PERSON has a 1% chance of drawing the correct ticket. (Once the tickets are drawn, the odds change, because the conditions change but we do not know by how much, until the numbers are announced).

Betting odds on the other hand are NOT ODDS AT ALL. People do not understand this. Betting odds are more accurately BETTING LINES. They are not probabilities of a team winning nor are they meant to be.

Say Dallas and Washington are going to play. Washington is given 60/40 odds over Dallas. (Now, for the record, 60/40 COULD actually be expressed as a pure probability, in that many sports predictors will use computer programs to enter both teams’ attributes, run 1000 or so simulations of the game, and then relate that WAS won 60% of the sims)

HOWEVER, what’s really happening is, the casinos are calculating expected bettors vs expected payouts. If WAS and DAL both had the same betting odds, all the smart money would bet on WAS and the casinos lose money. So, by adjusting the odds to adjust for the “favored team” they entice some people to place money on the perceived loser, and reduce the amount they have to pay out to the winner, should the perceived favorite win.

THEN they adjust again based on betting pools.
Even though WAS is a better team, meaning more “logic bettors” will be on them, DAL has a large fan base, meaning more “I just like them” bettors will come drop money on them, SO the casinos push a few points back to DAL to offset fan bets.

Remember, at the end of the day, Vegas is not setting the odds hoping they pick the winner. They are setting the odds according to how they expect the bets to be placed, so that no matter which side wins, they can use the losers to cover the winners and still collect 20%.

35. Ralph Schneider says:

@ Joe Smith
I think that rephrasing, as you suggest, to “When the drawing happens, there is X percent chance they will draw the numbers you picked on your ticket” is still inaccurate, since you will still have either 0% or 100%, and nothing else. The problem, as I see it, is making an incorrect leap from a true statement such as “one ticket in 1 million will win” to the false statement “EACH ticket has an EQUAL chance of 1 in one million of winning”. I agree that such false statements, regrettably, help sell tickets.

36. Ralph Schneider says:

I find it reprehensible that doctors, for example, can virtually frighten a patient to death with self-fulfilling prophecies, such as falsely telling a smoker that they have a 1 in 5 chance of dying from smoking based on the observation that 1 in 5 smokers seems to do so. They should instead correctly say “1 in 5 smokers dies from smoking – you may be one of them”. Again, each smoker has either a 0% or 100% chance.

37. Ralph Schneider says:

Of course, we can safely assume that many things are certain. For example, the chances of a naked person surviving a 1,000 foot fall onto rocks could intelligently be assumed to be 0%, BASED ON EXPERIENCE, although it’s also true that, since they would either survive or not, their chances were either 100% or 0%. But if miraculously someone did survive, the chances of everyone else surviving would not suddenly increase, although the AVERAGE number (based on experience) would increase. If on average 1 in 5 people survive a 100 foot fall, we cannot say that everyone has an equal 1 in 5 chance of surviving. If the average number of children per family is 1.27, we do not conclude that every family has 1.27 children.

38. Ian Hatch says:

The probability of survival, based on observation, would increase minutely to represent our new knowledge that it is possible under certain circumstances to survive a 1,000 foot fall.

39. Ralph Schneider says:

And we should all thank that fellow who recently crossed Grand Canyon on a tightrope, because at the instant he completed his crossing he immediately increased the probability of all of us to successfully do it too!

40. My probability that I can cross the Grand Canyon on a tightrope depends on two things: my ability to cross the Grand Canyon on a tightrope and my knowledge about crossing the Grand Canyon on a tightrope. When somebody crosses the Grand Canyon on a tightrope, this increases my knowledge, so I can thank them for that; but it doesn’t increase my ability. Either way, however, it increases my probability.

It’s even clearer if somebody attempts to cross the Grand Canyon on a tightrope and fails. This also increases my knowledge, and so I thank them too, even though this time it decreases my probability.

41. Ralph Schneider says:

And there lies the absurdity – even if your ability does not vary, your “probability” of success is said to vary from time to time according to what OTHERS do!

42. Ian Hatch says:

Only when you ignore preparation, and equipment etc. There is a separate probability for “person will succeed in crossing the Grand Canyon” and “Well-prepared person will succeed in crossing the Grand Canyon”, as well as “unprepared person will succeed in crossing the Grand Canyon”.

As an aside, the knowledge gained from another would increase your probability only by putting you into a category with a better probability. The probability based on observation, if you ignored relative knowledge levels, would remain the same.

43. Ralph Schneider says:

The trouble is, such categorizations or particularizations are endless. Did the fact that he believed in the assistance of God increase his “probability”, and/or that his father was also a tightrope walker, and/or that he tied his left shoelace before his right (ad infinitum)? So we have to look at each individual, not meaninglessly ascribe “probabilities” to individuals based on generalized observations of others along with accompanying assumptions.

44. Ian Hatch says:

We can’t always do that. That’s what probabilities are for – to give us some form of visibility when we don’t have all the specifics.

It seems like you’re really confused as to the role of probabilities, and it seems to me that this thread has become something that is going to be unhelpful to people coming here to learn. Perhaps if Kalid is interested, this is a sign that a broader-level article on probability would be useful?

45. >even if your ability does not vary, your “probability” of success is said to vary from time to time according to what OTHERS do!

Even if your ability does not vary, your probability of success will vary according to what knowledge you have (and that can be affected by what others do). This is perfectly reasonable if your probability depends on both your ability and your knowledge of that ability. And the only useful notion of probability (as opposed to the one which is always 0 or 1, only we don’t know which, and which has no practical applications) is one that depends on what information one has, as we established upthread.

46. kalid says:

@Ian, Ralph, Toby: Thanks for the discussion! I think this would be a good topic to clarify, it brings up the (somewhat philosophical) nature of what a probability means. Here’s my take:

A probability represents the uncertainty in the knowledge of the person making the prediction, and is based on a dramatically simplified model. It’s not a statement of fact, or a likelihood an individual element can reach some ‘success state’, unless that is allowed by the agreed-upon model.

For example: when dealing with idealized dice, we create the assumption there’s a 1/6 chance of every number appearing, and in this idealized model we assume we can eventually get every number to show up on a single die if we roll it enough (with near-certainty).

For a model with non-homogeneous elements, like people crossing the Grand Canyon, probabilities represent our a-priori knowledge of “number of desired outcomes / number of attempts”. There are almost certainly further classifications which, in a Bayesian way, act as clues to what leads to a more- or less-successful outcome. In these situations, the probability acts as a demographic guide, similar to an average or median, which essentially says “This number is representative of the attempts made by this population.” If we start adding expert crossers, etc., we aren’t changing the likelihood of success for the existing individuals, but we are changing the properties of the group.

I think Ralph’s point may be that in a heterogeneous population, a probability doesn’t really apply to an individual [they are or they aren't something]. Ian/Toby make the argument that a metric is still useful: even if nobody has the “average” 2.3 kids, it’s useful to know that population A has a higher average than population B :).
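
The "changing the properties of the group" idea can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the per-person success chances (5% for novices, 95% for experts) and the group sizes are made-up numbers, and the model simply treats the group figure as the average of the individual ones.

```python
def group_success_rate(individual_chances: list[float]) -> float:
    """Group-level figure: the average of the individual success chances."""
    return sum(individual_chances) / len(individual_chances)

# Hypothetical population: 90 novices who rarely succeed.
novices = [0.05] * 90
rate_before = group_success_rate(novices)

# Add 10 expert crossers (again, made-up numbers).
experts = [0.95] * 10
rate_after = group_success_rate(novices + experts)

print(f"group rate before experts: {rate_before:.2f}")
print(f"group rate after experts:  {rate_after:.2f}")
# The novices' own numbers are untouched; only the group summary moved.
print(f"a novice's chance is still: {novices[0]:.2f}")
```

The group rate rises when the experts join, yet no existing individual's chance changed, which is the sense in which the probability is a demographic guide rather than a property of each person.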

47. Ralph Schneider says:

A typically excellent explanation, Kalid! As you say, discussion of probability can be “somewhat philosophical”, with ancient arguments of “free will” vs. “determinism”, for example, peeping in.

48. kalid says:

Thanks Ralph, happy you enjoyed it. Yep, there are definitely a few interesting rabbit-holes thinking about math can lead us down :).

49. Ian Hatch says:

Ah, I can breathe again! Great summary!
Thanks kalid for breaking that down in such a clear way, and thanks to Ralph for sparking the discussion!

50. TheDjinni says:

““Aha!” you may say, “But they each have an equal chance of winning BEFORE the winner is drawn”. But how can they each have an equal chance of winning if only one wins and the others lose – whether before or after the draw?”

Because that’s what the definition of chance is? Chance is the word used to describe the expected result of unknown outcomes. “One is the winner and the others have lost” describes a statement about a past event with a known outcome: you can identify the winners and losers, and the random draw that selected them.

Before the draw takes place, there are no winners, and no losers. To speak of “the winner” before the winner has been decided is to speak nonsense; one can only talk about the candidates and the potential outcomes. Furthermore, to speak of “the winner” when you do not know who won is to speak nonsense; one can again only speculate on what the outcome was. Probability comes from assigning quantities to potential outcomes and then taking the average over repeated trials. The outcome is decidedly random, and this randomness is described and quantified using reference to ‘chance’.

Only if you presume that the draw, and ultimately life itself, is entirely and fully deterministic can you come to the conclusion that there is a winner and a loser decided beforehand… and even if you did that, there would be no practical value in stating such, since a fully deterministic system can’t be identified as such by actors in the system, nor can the outcomes be determined a priori by said actors. One must still rely on probabilities to make any predictive statements.

“But I will point out that the recent Global Financial Crisis resulted from reliance on “probabilities”!”

And I can point out that the capital of France is Timbuktu, that doesn’t make it true.
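
The "average over repeated trials" reading of chance can be sketched with a small simulation. This is only an illustration under stated assumptions: a hypothetical 10-ticket lottery, a uniformly random draw, and a fixed seed so the run is repeatable. The function name is made up for this example.

```python
import random

def simulate_draws(num_tickets: int, num_draws: int, seed: int = 0) -> list[float]:
    """Repeat a one-winner lottery many times; return each ticket's win frequency."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    wins = [0] * num_tickets
    for _ in range(num_draws):
        wins[rng.randrange(num_tickets)] += 1  # exactly one winner per draw
    return [count / num_draws for count in wins]

frequencies = simulate_draws(num_tickets=10, num_draws=100_000)
for ticket, freq in enumerate(frequencies):
    print(f"ticket {ticket}: won {freq:.3f} of the draws")
```

In any single draw exactly one ticket wins and nine lose, yet over many draws every ticket's win frequency settles near 1/10. That long-run figure is what "each ticket has an equal chance" refers to before the winner is known.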

51. Ralph Schneider says:

The point is that, before a lottery, it is known that there will be winners and losers – it is therefore nonsensical to say that EACH has an EQUAL chance, since it is known before the lottery that they don’t.

The reason that saying Timbuktu is the capital of France does not make it true is simply that it is not true.

52. Ralph Schneider says:

It is absurd to maintain, simply because we do not know which one(s) will win, that each has an equal chance.

53. TheDjinni wrote in part:

>to speak of “the winner” when you do not know who won is to speak nonsense […] Only if you presume that the draw, and ultimately life itself, is entirely and fully deterministic can you come to the conclusion that there is a winner and a loser decided beforehand

One can speak of the winner and losers before one knows who they are, and even if one rejects determinism in various senses. For example, one might say ‘The winner will be very happy.’, even before the draw has been made, without making any philosophical commitment about determinism, and making a meaningful statement (and indeed one with a high probability of being true).

But of course, as long as we don’t actually know who the winner is, our statements about ‘the winner’, while meaningful, are still probabilistic statements. Probability is indeed what we use to talk about things when we don’t know everything about them (which is, as a practical matter, always), despite some people’s objections to the contrary.

54. Stefan Sonnenberg says:

@Ralph Schneider:
I guess you make a central mistake, but I think you realized already.
If you flip a coin, the chances for heads and tails are equal:
One object (the coin) has both sides (or two possible states), if you flip it,
it will either be heads or tails.

Now you have 1.000.000 papers, and 1 has the winning number written on it.
From this perspective, you have also two possible states (won/not won), but
only one out of one million will satisfy the condition, thus giving the 1:1.000.000 probability.

It is obvious to say: if you have drawn the winning ticket, it is the one out
of 100% winning tickets : there was only one.

55. Ralph Schneider says:

@Stefan Sonnenberg
I think you “make a central mistake” in not understanding my point – you have merely re-stated the obvious! My point is simply that it is obviously incorrect to state that each “piece of paper” has an EQUAL chance of containing the number, since only one has it.

56. wererogue says:

But the word “chance” is key there. It defines the use of probability which deals in the likelihood of specific unknowns.

Ralph, nobody is saying that a piece of paper that doesn’t have the number on has a chance of becoming a piece of paper that has it. We’re saying that when choosing one of the pieces of paper, the winning one being unknown, your knowledge is not sufficient enough for you to know that any piece of paper is a better choice.

I think that you have a good point to make somewhere in your argument about the general trend for people who make decisions based on probability to believe that the decision is thus safe or guaranteed. But you seem determined to throw out the baby with the bathwater – probability is not only useful, it’s inherent in our brains’ decision making processes, and without formalizing it we would only worsen our decisions.

57. Ralph Schneider says:

@wererogue
Far from being “determined to throw out the baby with the bathwater”, I made this precise point in a much earlier post on this thread.

If a handful of respondents bothered to read – and think clearly about – my previous posts, I would not have to repeatedly correct what they ascribe to me.

I’m not sure if there is a word for hanging an idea on a person which they do not hold and then criticizing them for that idea, but unfortunately it’s a centuries-old, infamous, common, disappointing form of intellectual dishonesty.

58. Drew says:

@Ralph Schneider

> it is obviously incorrect to state that each “piece of paper” has an EQUAL chance of containing the number, since only one has it.

Let’s unpack this sentence.

> To state that “each ‘piece of paper’ _is_assigned_ an equal chance of containing the winning number” incorrectly reflects reality.

The verb “has” is overloaded. Others have been using “epistemic::has” while you’ve been using “ontological::has”. When others said “but the tickets do have an equal chance”, they really meant “but Bayesian Probability does assign an equal chance”. But you’ve interpreted this like a Frequentist would: “Are you suggesting each individual ticket possesses an _intrinsic_attribute_ which is neither entirely true nor entirely false? This violates the Law of Excluded Middle. You might as well claim I’m half pregnant!” But really, this thread is simply bickering over the definition of “has”.

Would you like to know why others keep returning to the definition of “chance”? The reason is because you don’t appear to fully grasp that THOUGHTS =/= REALITY. To elaborate, Frequentism defines chance in terms of reality per se, while Bayesianism defines chance in terms of beliefs about reality (which are simply thoughts). From the Flying Spaghetti Monster’s perspective, all propositions describing the attributes of tangible objects can either be 0% true, xor 100% true. But humans can’t know that, we aren’t omniscient. At best, we can mentally approximate a model of reality using the limited information our sensory perceptions afford us (IIRC, David Hume). Humans can assign percentages of credibility to our thoughts about reality (e.g. to the thought that I am fully pregnant, I assign 50% credibility), even though the Flying Spaghetti Monster cannot assign percentages to reality per se (e.g. FSM can’t make me half pregnant). It’s really no different than how all other math works. Numbers don’t literally exist in the real world, but biject to a great many things. The arbitrary rules that govern the mental manipulation of numbers allow our brains to run simulations without actually executing our (potentially costly) plan in real life.

> Statements like “…” and “…” are commonplace – and meaningless. Even worse, they can and do give false confidence leading to ruin, since people are misled into basing their judgements on them.

Probability and statistics are certainly meaningful. On occasion, we’ve even found them useful! They help us: determine the confidence of our predictions (quantify certainty); turn unknown-unknowns into known-unknowns (quantify uncertainty); and most importantly, improve our decision-making (choose the car more often than the goat). They’re used in: Big Data (targeted advertising); Actuarial Science (insurance); Finance (loans, Wall St); Science in general (causation); Demographics (populations); Meteorology (weather-forecasts); Medicine (diagnostics/prognoses); Engineering (tolerances); Psychohistory (jk); et al.

> And doctors and health lobbyists should be prohibited from making such outrageous statements as “You have a one in five chance of surviving” based on data that one in five people survives. You will either survive (100%) or not (0%), so you do not have 20% chance, but either 100% or 0%. The problem is that some people literally worry themselves to death when they hear such pronouncements from doctors (who should know better, and may just be covering themselves in advance).

You see, EY’D take issue with my doctor omitting a 20% survival prognosis. Even if probabilities technically have zero ontological basis, the prognosis sends a strong signal which provides actionable information to an important crossroads. We live in a CAUSAL UNIVERSE, and though I may exert negligible control over a coin flip, a prognosis (especially an early prognosis) may very well prompt me to request an appropriate treatment and SAVE MY LIFE (or as the song goes, “live like you were dyin”; or donate my organs; or donate my body to science; or sign up for cryonics; or get my assets in order; or get married; or tell my family goodbye; or honorably commit seppuku; or ride into the sunset; or finish “War & Peace”; or travel abroad; or find biological parents; or find long lost cousins; etc).

What’s the alternative? “You have the foobar disease. You have either 100% chance of survival or 0% chance of survival.” This is a tautology, which therefore gives me zero information. And if my doctor includes “In the past, only 1 out of 5 people died”, you might as well just come out and say “as far as we can estimate our probabilities of your prognosis, your survival chance is 80%,” since it conveys exactly the same information. Even though diseases don’t come with html tags that say , the two interpretations convey the same information and are therefore EPISTEMICALLY EQUIVALENT. Note that in practice, the two interpretations may beget different results due to cognitive bias. But intentionally disposing a patient towards a particular decision feels like it would conflict with the Hippocratic Oath.

I don’t think the cons outweigh the pros. I’d much rather grin and bear the negative externalities than give up for example, car insurance, or not-wobbly chairs, or my doctor’s 20% prognosis disclosure. And sure, it’s unfortunate that the media often misleads the public. But that’s a question of ethics and policy (and gullibility) – not whether the math is meaningful or useful.

> But I will point out that the recent Global Financial Crisis resulted from reliance on “probabilities”!

I might as well point out that a recent car crash resulted from a reliance on “kinetic energy”. I mean, what else are economists going to rely on? Are they going to pluck “certainties” from thin air? Are they going to quit their jobs if they can’t? Should we eliminate uncertainty through a nuclear holocaust? Should we instead rely on petrol? Will the lizard people emancipate us from uncertainty? Will Communism meet the Boolean quotas? Does it run minesweeper? Does it blend? Does it even lift?

P.S.

> The error consists of extending the general to the particular (the opposite of generalizing).

“Ecological Fallacy”; “reification”; “Fallacy of Composition”

59. John says:

Drew, thank you so much for this response! Excellently written. Uncertainty is inherent to the world we live in — a large portion of the sciences would be drastically less productive if we used this narrow interpretation of outcomes.

Some of the other applications that you missed include: Chemistry (physical chemistry, kinetics), Physics (quantum mechanics), Statistical Mechanics (shared by Chemistry and Physics), Operations Research/Queueing Theory, Computer Science (Artificial Intelligence, Machine Learning, Modeling/Simulation), and Mathematics (Stochastic processes, dynamical systems). And I’m sure there are even more that we missed.

Not to mention, think of all the brilliant people who have laid the groundwork for probability and statistics over the last few hundred years. Think of all of the brilliant scientists and academics who advance probability and statistics these days. Ralph Schneider: are you saying that some of the greatest minds ever, who are backed by thousands upon thousands of professionals who dedicate their lives to rigorously studying these topics, were horrendously wrong? That their interpretation of probability is useless, and you hold the key insight?