I've studied probability and statistics without experiencing them. What's the difference? What are they trying to do?

This analogy helped:

**Probability** is starting with an animal, and figuring out what footprints it will make. **Statistics** is seeing a footprint, and guessing the animal.

Probability is straightforward: you have the bear. Measure the foot size, the leg length, and you can deduce the footprints. "Oh, Mr. Bubbles weighs 400lbs and has 3-foot legs, and will make tracks like this." More academically: "We have a fair coin. After 10 flips, here are the possible outcomes."

Statistics is harder. We measure the footprints and have to guess what animal it could be. A bear? A human? If we get 6 heads and 4 tails, what are the chances the coin is fair?
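To make the two directions concrete, here is a minimal sketch using the coin example. The candidate biases (0.3, 0.5, 0.6, 0.7) are assumptions picked for illustration, and comparing likelihoods is only one of several ways to "guess the animal".

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability: start with the animal (a fair coin) and list the footprints.
forward = {k: binomial_pmf(k, 10, 0.5) for k in range(11)}

# Statistics: start with the footprint (6 heads, 4 tails) and compare animals.
candidates = [0.3, 0.5, 0.6, 0.7]  # hypothetical coin biases
likelihoods = {p: binomial_pmf(6, 10, p) for p in candidates}
best_guess = max(likelihoods, key=likelihoods.get)

print(f"P(6 heads | fair coin) = {forward[6]:.3f}")
print(f"Most likely candidate bias: {best_guess}")
```

A fair coin still produces exactly 6 heads about 20.5% of the time, so this footprint alone does not rule it out, even though a 0.6-biased coin fits slightly better.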

## The Usual Suspects

Here's how we "find the animal" with statistics:

**Get the tracks**. Each piece of data is a point in "connect the dots". The more data, the clearer the shape. (One spot in connect-the-dots isn't helpful, and one data point makes it hard to find a trend.)

**Measure the basic characteristics**. Every footprint has a depth, width, and height. Every data set has a mean, median, standard deviation, and so on. These universal, generic descriptions give a *rough* narrowing: "The footprint is 6 inches wide: a small bear, or a large man?"
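As a rough sketch of these generic measurements, Python's standard `statistics` module covers the basics. The footprint widths below are made-up numbers, with one deliberately large value to show how the mean and median can disagree.

```python
import statistics

# Hypothetical footprint widths in inches; the last one is an outlier.
widths_in = [5.8, 6.1, 6.0, 6.4, 5.9, 6.2, 9.5]

mean_w = statistics.mean(widths_in)      # pulled upward by the outlier
median_w = statistics.median(widths_in)  # middle value, robust to the outlier
stdev_w = statistics.stdev(widths_in)    # sample standard deviation

print(f"mean={mean_w:.2f}  median={median_w:.2f}  stdev={stdev_w:.2f}")
```

The median stays near 6 inches while the mean drifts higher, which is one reason these summaries only give a rough narrowing.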

**Find the species**. There are dozens of possible animals (probability distributions) to consider. We narrow it down with prior knowledge of the system. In the woods? Think horses, not zebras. Dealing with yes/no questions? Consider a binomial distribution.

**Look up the specific animal**. Once we have the distribution ("bears"), we look up our generic measurements in a table. "A 6-inch wide, 2-inch deep pawprint is most likely a 3-year-old, 400-lb bear". The lookup table is generated from the probability distribution, i.e. by making measurements when the animal is in the zoo.
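One way to picture the lookup table, assuming the "species" is the binomial distribution and the "specific animal" is the coin's bias: compute what every known animal does in the zoo, then invert the table. The grid of biases and the names `zoo_table` and `field_guide` are hypothetical choices for this sketch.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

N_FLIPS = 10
ANIMALS = [i / 10 for i in range(11)]  # candidate biases 0.0, 0.1, ..., 1.0

# "Zoo" measurements: for each known animal, how often it leaves each footprint.
zoo_table = {
    p: [binomial_pmf(k, N_FLIPS, p) for k in range(N_FLIPS + 1)]
    for p in ANIMALS
}

# Field guide: for each footprint (number of heads), the animal most likely
# to have left it.
field_guide = {
    k: max(ANIMALS, key=lambda p: zoo_table[p][k])
    for k in range(N_FLIPS + 1)
}

print(field_guide[6])  # most likely bias after seeing 6 heads in 10 flips
```

The table is built entirely from probability (animal to footprint), and statistics reads it in the other direction (footprint to animal).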

**Make additional predictions**. Once we know the animal, we can predict future behavior and other traits ("According to our calculations, Mr. Bubbles will poop in the woods."). Statistics helps us get information about the origin of the data, from the data itself.

Ok! The metaphor isn't perfect, but it's more palatable than "Statistics is the study of the collection, organization, analysis, and interpretation of data". Need proof? Let's see if we can ask intuitive "I tasted it!" questions:

- What are the most common species? (Common distributions)
- Are new ones being discovered?
- Can we predict the next footprint? (Extrapolation)
- Are the tracks following a path? (Regression / trend line)
- Here are two tracks: which animal was faster? Bigger? (Data from two drug trials: which was more effective?)
- Is one animal moving in the same direction as another? (Correlation)
- Are two animals tracking a common source? (Causation: two bears chasing the same rabbit)
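Two of these questions, the trend line and the next footprint, can be sketched with plain arithmetic. The positions below are invented data, and ordinary least squares is just one common way to fit a path.

```python
import statistics

def trend_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def correlation(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical tracks: footprint number vs. distance along the trail (feet).
step = [1, 2, 3, 4, 5]
distance = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = trend_line(step, distance)
r = correlation(step, distance)
next_footprint = slope * 6 + intercept  # extrapolate one step ahead

print(f"slope={slope:.2f}  intercept={intercept:.2f}  r={r:.3f}")
print(f"predicted distance at step 6: {next_footprint:.2f}")
```

A correlation near 1 says the tracks follow a nearly straight path; it does not say why, which is where the causation question begins.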

These questions are much deeper than what I pondered when first learning stats. Every dry procedure now has a context: are we learning a new species? How to take the generic footprint measurements? How to make a table from a probability distribution? How to look up measurements in a table?

Having an analogy for the statistics process makes later data crunching click. Happy math.

PS. The forwards-backwards difference between probability and statistics shows up all over math. Some procedures are easy to do (derivatives) but difficult to undo (integrals). (Thanks Denis)

Lily (September 5, 2012 at 5:43 pm): Yay Statistics!

kalid (September 5, 2012 at 5:56 pm): It was getting time to start writing about it :).

Adam Weisblatt (September 6, 2012 at 8:03 am): Will Big Data (http://en.wikipedia.org/wiki/Big_data) change the way statistics is performed and used? If the sample size is very close to the target size (I might be mixing up the terms), will it change the nature of statistics, which historically was used to create meaning from small amounts of data? Using your metaphor: if we tagged almost every animal in the woods, would that change how we determine what made the footprints?

kalid (September 6, 2012 at 11:29 am): Adam, great question. I’m not familiar with Big Data, but one analogy (taken from the discussion here: http://www.reddit.com/r/math/comments/zevvv/a_brief_introduction_to_probability_statistics/) is that it could be like not just measuring the footprint of the bear, but getting its DNA sequence. Whoa! What can we figure out then? [“This bear is going to have kids who have footprints like this…” or even “This bear is going to have high blood pressure” :)].

So yep, I think it’s tagging animals at a super-deep level, to make even crazier predictions.

Walter (October 14, 2012 at 4:55 pm): To Adam,

The “Big Data” issue is a pivotal question in statistics right now. Will it change the basic way we perform statistical analysis? No. The issue is that with all of these new sources of data (social media, YouTube, text messages, anything and everything), we are receiving a lot more data, at a much higher rate, and in many more forms (think structured data that fits in rows and columns, and unstructured data like a YouTube video) than we are historically used to handling. As I said, this will not change our statistical measures, but it will change how we collect, store, and think about data as a society. We need bigger places to store it and faster ways to analyze it. I hope this messy explanation clears up a little bit about “Big Data”.

Krishna Chaitanya (January 21, 2013 at 3:56 am): Good start!! @Kalid, please suggest some good books for studying probability and statistics with a minimal or fairly good math background.

kalid (January 25, 2013 at 2:12 pm): Hi Krishna, I’d like to set up a suggested-reading list for the site. I’m still looking for my favorite probability books.

Bill (January 26, 2013 at 12:55 am): For learning stats with minimal assumed prior knowledge, this is the best book I’ve found. Although this one partners with SPSS, there are versions of the same book for the R and SAS statistical packages. Andy Field’s ‘Statistics Hell’ website is worth a look too.

http://www.amazon.com/gp/product/1847879071?ie=UTF8&force-full-site=1&ref_=aw_bottom_links

Krishna Chaitanya (January 30, 2013 at 1:36 am): Thank u kalid. waiting for the book list…

Thank u bill for the resource.

krishna chaitanya (January 31, 2013 at 8:07 am): Hi Bill. The book you mentioned is very costly. I am from India. Can you suggest any other good, affordable books, or resources I can download?

Bill (January 31, 2013 at 12:23 pm): Krishna, the cheapest I could find when I bought the book was the Kindle e-book, but I know it is still expensive.

A free online book that is clear and covers the basics with good worked examples is Statistics at Square One by the British Medical Journal: http://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one

Beyond that I’ve found most stats books have too many consecutive formulas with minimal explanation, and not enough fully worked examples with real data, or simply badly explained!

This is why this site is so good. Kalid realizes that understanding is the most important thing first, allowing you to build your knowledge from that solid base all the way to those complex formulas. Good work Kalid!

Krishna Chaitanya (February 3, 2013 at 5:20 pm): thank u bill for the wonderful source of information.

Ron Murphy (March 2, 2013 at 5:49 am): Adam, Walter,

The Big Data issue is about more than just the amount of data, though having large amounts of data is part of the solution. It’s about not necessarily understanding a particular model, but just letting the data, the statistics, provide answers. So in translation systems like Google Translate, the idea is to drop any linguistic model. Linguistic models are way too complicated, and they vary over time and context (read urbandictionary.com). There’s no way to keep the models updated enough to make current, relevant translations. By looking at all the ways in which sets of words are used across masses of data, we get a statistical likelihood that a particular translation will be the best understood. It doesn’t matter if the translation defies all the grammatical rules of a language, and hence has no chance of fitting into some formal linguistic model, as when translating a very popular current idiom. All we need to know is what is being used, and so what will be understood.

Naturally, there’s some controversy over this.

lepine kong (August 6, 2013 at 11:45 pm): I love your website because I share your viewpoint on visual thinking.

As a former statistical process control engineer from the Deming school, I would formulate it differently:

– statistics is descriptive

– probability is predictive

I would rather say “infer the model” than “predict the model”, because once you have the model, you’ll try to make predictions over a future range of data.

The big problem is that most people will infer that the model is the Normal Law because that’s what they were taught at school. Deming and Shewhart insisted that the Normal should not be applied blindly in the real world, and that scientists were in fact not rigorous about its usage. For example, if you learnt the Henry graphical method at school (http://fr.wikipedia.org/wiki/Droite_de_Henry; I can’t find an English link), it can be very wrong to infer the Normal Law from it in real-life matters like industrial quality, because it is a static model whereas living things are dynamic (the model shape itself can change over time!).

The purpose of statistical process control is roughly to make the process model of an industrial product stable in time.

Eric V (May 6, 2014 at 10:24 pm): Rule #1 from the Eric book of Stats (may the gods help us if that book is ever written):

“Your results don’t mean anything, until and unless they mean something”

I’ve heard everyone from electrical engineers to political pollsters get all ginned up over statistics and miss the fundamental relationship behind the wall of data they throw up. Don’t get me wrong I love a good discussion on harmonic mean vs. arithmetic mean as much as the next guy. I’ve just seen a lot of cases where people forget that stats is intended as a tool to better see the world around us, not a veil to falsely lend authority to our preconceived notions.

My Rule #1 above has the following corollary:

There exist 3 things in statistics that have some (maybe smaller than you think) level of importance. Starting with the least important they are

A) your results

B) your explanation of your results

C) your justification of your explanation of your results

An example of A)

-the ‘average’ height of 10 students in class is 5’9″

-the ‘average’ bacteria population in 10 petri dishes after 2 hours is 12,000,000

-the ‘average’ gain of the 10 amplifiers is 2.3 dB

An example of B)

-collected 10 data points and used arithmetic mean

-collected 10 data points and used harmonic mean

-applied same input to 10 devices, collected output on log scale, converted each to linear scale, took arithmetic mean, converted ‘average’ output back to log scale, used ‘average’ log output and log input to determine ‘average’ gain by the following equation…

An example of C)

‘I used this equation because…’

‘Here are other equations I could have used, they would have this effect, that model wouldn’t have fit this example because…’

jane (February 5, 2015 at 5:59 pm): Thanks so much! After so many years of learning stats, this is the first time I have found it so interesting and fascinating!

Luke Chen (March 7, 2015 at 11:50 am): Can you do a guide on degrees of freedom in statistics?

Brenda (March 23, 2015 at 9:18 pm): Wow, I have decided to do my final paper about the how and why of statistics, and I just stumbled onto your website while doing homework on statistics. YAY!!!! I have used stats in my work reporting to federal and state agencies, but never really understood what they meant or why they were even needed. So that is why I chose this for my final paper. I have a couple of good places to turn, but I am so much more excited about this website than the other ones. Thank you for doing something like this for those of us who are “mathematically challenged”.

kalid (March 29, 2015 at 4:21 pm): Thanks Brenda, really glad it clicked!

damidb (April 23, 2015 at 5:38 am): Try “The Cartoon Guide to Statistics” (ISBN-13: 978-0062731029): a nice beginning for a so-called boring topic.

But it is not as easy as one might think.

“If you have ever looked for P-values by shopping at P mart, tried to watch the Bernoulli Trials on “People’s Court,” or think that the standard deviation is a criminal offense in six states, then you need The Cartoon Guide to Statistics”

flomi (June 12, 2015 at 5:11 am): @damidb that book is really great, and not easy. It clears up lots of things if you already have some basic knowledge of the subject.

@kalid, your picture of probability vs. statistics is really great; things dropped into place in my head.

I do have a question for you, which has bothered me for some time: why ‘standard’? Where does the ‘standard’ in ‘standard deviation’ come from? (Same for ‘standard error’.) ‘Standard’ compared to what? Variance? And if so, why call it ‘standard’ and not ‘mean deviation’?

Love your posts, I find them insightful.

Cheers, F.

Youngmi (November 5, 2015 at 1:32 pm): Hi Kalid, you’re absolutely awesome!!! I am reading all your logarithm-related articles and so far I am winning (my invisible cheerleaders on my desk are dancing!!). Any chance you can do an article on standard deviation? It is also used a lot in life. With real-life examples, I might get another chance to see my invisible cheerleaders on my desk!!! :>

kalid (November 8, 2015 at 1:03 pm): @flomi: Good question, not sure why it’s called “standard” (vs. non-standard deviation?). I’d like to study it more.

@Youngmi: Stats is something I’d like to get into more, hope to give you some more insights :).

Kamm (May 30, 2016 at 3:49 pm): For each study in question 11, indicate whether a one- or a two-tailed test should be used and state the H0 and Ha. Assume that μ = 50 when the amount of the independent variable is zero.

Kamm (May 30, 2016 at 3:53 pm): I am having a hard time understanding how to do this. Can anyone show me? On a test of motor coordination, the population of average bowlers has a mean score of 24, with a standard deviation of 6. A random sample of 30 bowlers at Fred’s Bowling Alley has a sample mean of 26. A second random sample of 30 bowlers at Ethel’s Bowling Alley has a mean of 18. Using the criterion of p = .05 and both tails of the sampling distribution, decide if each sample represents the population of average bowlers. Population of average bowlers: mean of 24, standard deviation = 6. Criterion = .05 (5%). Fred’s: 30 bowlers, mean of 26. Ethel’s:
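A possible way to work the bowling question above is a two-tailed z-test, assuming the population standard deviation of 6 applies to both samples and the sample means are approximately normal. The function name `two_tailed_z_test` is made up for this sketch.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Return (z, p_value, reject_null) for a two-tailed one-sample z-test."""
    standard_error = pop_sd / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Probability of a result at least this extreme in either tail.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

fred = two_tailed_z_test(sample_mean=26, pop_mean=24, pop_sd=6, n=30)
ethel = two_tailed_z_test(sample_mean=18, pop_mean=24, pop_sd=6, n=30)

for name, (z, p, reject) in [("Fred's", fred), ("Ethel's", ethel)]:
    verdict = "differs from" if reject else "is consistent with"
    print(f"{name}: z={z:.2f}, p={p:.4f}, {verdict} average bowlers")
```

In footprint terms: Fred's tracks (z of about 1.83) could plausibly come from the "average bowler" animal, while Ethel's (z of about -5.48) almost certainly came from something else.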

cristina (October 26, 2016 at 10:25 am): This is brief and concise. Thanks.

ash (March 14, 2017 at 9:13 am): At first I thought statistics was that hard to understand, but thanks to this I’ve learned an easier way to understand it. Thanks!!! God bless :)

manideep lanka (July 16, 2017 at 11:05 pm): Wow… the forward/backward difference is a very intuitive idea. I always find it hard to understand what a probability distribution is. Can somebody explain?

Cathy Floyd (July 27, 2017 at 5:35 pm): This may not be the place to ask this question, but the school my grandson goes to is totally unresponsive.

In your opinion, is the high school course Probabilities and Statistics suitable for a student who barely passed Algebra I & II and flunked Geometry? He hates math but needs one more math credit to graduate. We are trying to find something he can take, and not much is offered.

shahab uddin (October 8, 2017 at 12:17 am): This is plain dumb. No offense. I love your content usually.

Yulia (December 8, 2017 at 1:18 am): I did not follow all of the analogies. For example, take the trajectories of the other planets as seen from Earth, which show retrograde motion. Long ago, Copernicus looked at these trajectories and proposed a heliocentric model in place of the existing geocentric one. This is statistics: seeing the data and figuring out what model creates it. Looking backward: we know the heliocentric and geocentric models, and we know that in both models the planets create these trajectories for sure. Where is probability here? In this case, we are dealing with proven facts, not with uncertainty. Please help me understand. Similarly, with the bear example, we know for sure what kind of footprints the bear could leave if we know its parameters, so where is the probability? Probability deals with events, not models, so I am confused and would appreciate more explanation.

Malgosia (January 10, 2018 at 8:50 am): Love this!!

Nandini (January 15, 2018 at 11:39 pm): Hey kalid, I have been struggling a lot to understand the logic behind this:

Median = l + ((n/2 − cf) / f) × h

I know I can use this formula when a question related to continuous grouped data pops up! But how can I figure what’s going on inside this equation? How can I understand the logic behind this?
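One reading of the formula, assuming the usual textbook meanings (l = lower boundary of the median class, n = total count, cf = cumulative count before that class, f = count inside it, h = class width): walk up the classes until the halfway point n/2 falls inside one, then slide proportionally into that class. The class data below is invented for illustration.

```python
def grouped_median(classes):
    """Median of grouped data.

    classes: list of (lower, upper, frequency) tuples, sorted and contiguous.
    """
    n = sum(freq for _, _, freq in classes)
    if n == 0:
        raise ValueError("no observations")
    half = n / 2
    cf = 0  # cumulative frequency before the current class
    for lower, upper, freq in classes:
        if freq > 0 and cf + freq >= half:
            h = upper - lower
            # (half - cf) / freq is the fraction of this class we must
            # cross to reach the halfway observation.
            return lower + (half - cf) / freq * h
        cf += freq

# Hypothetical grouped data: (lower, upper, frequency)
data = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 5)]
print(grouped_median(data))
```

Here n = 30, so the halfway point is the 15th observation. Thirteen observations sit below 20, so we need 2 more out of the 12 in the 20 to 30 class, which is 2/12 of the way across a width of 10. The formula assumes the observations are spread evenly within the class.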

Data Science (March 13, 2020 at 8:12 am): Thanks for writing this post. This article takes a far-sighted approach to probability and statistics. I was able to see how the two terms are related, especially from the example of animals and footprints.

In short, the article gave a pleasant taste of probability with statistics. From my side, there are some further points worth pondering, listed below. I hope they can add more insight with respect to probability and stats.

1.) Probability, with the help of random variables and probability distribution graphs, can help grow a business and increase sales.

2.) It also helps in analysing the behaviour of customers while shopping (specifically their payment methods) and helps to serve them better.

To discuss these points and bring out causation in this domain, I also made a small effort by presenting the video below. I hope it will be of some use.

https://bit.ly/39OyoGT