A Brief Introduction to Probability & Statistics

I’ve studied probability and statistics without experiencing them. What’s the difference? What are they trying to do?

This analogy helped:

  • Probability is starting with an animal, and figuring out what footprints it will make.
  • Statistics is seeing a footprint, and guessing the animal.

Probability is straightforward: you have the bear. Measure the foot size, the leg length, and you can deduce the footprints. “Oh, Mr. Bubbles weighs 400 lbs and has 3-foot legs, and will make tracks like this.” More academically: “We have a fair coin. After 10 flips, here are the possible outcomes.”
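To make the fair-coin example concrete, here is a minimal Python sketch that lists the chance of each head count in 10 flips. The function name `heads_distribution` is made up for illustration, and the sketch assumes the flips are independent, which is the usual binomial model.

```python
from math import comb

def heads_distribution(flips=10, p_heads=0.5):
    """Return {k: P(exactly k heads)} for independent flips (binomial model)."""
    return {
        k: comb(flips, k) * p_heads**k * (1 - p_heads)**(flips - k)
        for k in range(flips + 1)
    }

if __name__ == "__main__":
    # We start with the "animal" (a fair coin) and list the "footprints" it can leave.
    for k, prob in heads_distribution().items():
        print(f"{k:2d} heads: {prob:.4f}")
```

This is the forward direction: the coin is known, and every possible outcome gets a probability.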

Statistics is harder. We measure the footprints and have to guess what animal it could be. A bear? A human? If we get 6 heads and 4 tails, what are the chances the coin is fair?
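One common way to work backwards (an assumption here, not the only approach) is to ask how surprising the footprint would be if the animal really were a fair coin. The sketch below computes the chance of exactly 6 heads in 10 fair flips, and the chance of a result at least as lopsided as 6 vs. 4. The helper names are hypothetical. Note that this answers “how likely is this data, given a fair coin?”, which is not the same as “how likely is a fair coin, given this data?”; the second question needs extra assumptions about which coins are plausible.

```python
from math import comb

def prob_heads(k, flips=10, p_heads=0.5):
    """P(exactly k heads) for independent flips."""
    return comb(flips, k) * p_heads**k * (1 - p_heads)**(flips - k)

def prob_at_least_as_lopsided(observed_heads, flips=10):
    """Chance a fair coin lands at least as far from an even split as observed."""
    gap = abs(observed_heads - flips / 2)
    return sum(
        prob_heads(k, flips)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= gap
    )

exact = prob_heads(6)                # 210/1024, roughly 0.205
tail = prob_at_least_as_lopsided(6)  # 772/1024, roughly 0.754
print(f"P(exactly 6 heads | fair coin) = {exact:.3f}")
print(f"P(at least this lopsided | fair coin) = {tail:.3f}")
```

A fair coin produces a split at least this uneven about three times out of four, so 6 heads and 4 tails gives little reason to suspect the coin.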

The Usual Suspects

Here’s how we “find the animal” with statistics:

Get the tracks. Each piece of data is a point in “connect the dots”. The more data, the clearer the shape. (One spot in connect-the-dots isn’t helpful; one data point makes it hard to find a trend.)

Measure the basic characteristics. Every footprint has a depth, width, and height. Every data set has a mean, median, standard deviation, and so on. These universal, generic descriptions give a rough narrowing: “The footprint is 6 inches wide: a small bear, or a large man?”
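As a small illustration of these generic measurements, here is a sketch using Python’s standard `statistics` module. The footprint widths are invented numbers, not real data.

```python
import statistics

# Hypothetical footprint widths in inches (made-up values for illustration).
footprint_widths_in = [5.5, 6.0, 6.0, 6.5, 7.0]

summary = {
    "mean": statistics.mean(footprint_widths_in),
    "median": statistics.median(footprint_widths_in),
    # stdev() is the sample standard deviation (divides by n - 1).
    "stdev": statistics.stdev(footprint_widths_in),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The same three calls work on any list of numbers, which is the sense in which these descriptions are universal: they say nothing yet about which animal made the tracks.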

Find the species. There are dozens of possible animals (probability distributions) to consider. We narrow it down with prior knowledge of the system. In the woods? Think horses, not zebras. Dealing with yes/no questions? Consider a binomial distribution.

Look up the specific animal. Once we have the distribution (“bears”), we look up our generic measurements in a table: “A 6-inch wide, 2-inch deep pawprint is most likely a 3-year-old, 400-lb bear.” The lookup table is generated from the probability distribution, i.e. by making measurements when the animal is in the zoo.
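Here is one way the “lookup table” idea could be sketched for coins, assuming the species is already narrowed down to the binomial distribution. The candidate biases (0.1 through 0.9) and the function names are illustrative choices. Picking the row that makes the observation most probable is a maximum-likelihood guess, which is one technique among several.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k heads in n flips) for a coin with heads-probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def build_lookup_table(n, candidate_biases):
    """table[p][k] = chance that a coin with bias p shows k heads in n flips."""
    return {
        p: [binomial_pmf(k, n, p) for k in range(n + 1)]
        for p in candidate_biases
    }

def most_likely_bias(table, observed_heads):
    """Return the candidate bias under which the observation is most probable."""
    return max(table, key=lambda p: table[p][observed_heads])

# "Measurements in the zoo": each known coin gets its own row of probabilities.
candidates = [i / 10 for i in range(1, 10)]  # hypothetical animals: 0.1 ... 0.9
table = build_lookup_table(10, candidates)

# "Footprint in the woods": we saw 6 heads in 10 flips.
best = most_likely_bias(table, observed_heads=6)
print(f"Most likely bias among candidates: {best}")
```

The table is built once from the known distributions, then reused for any observation, much like a printed statistics table.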

Make additional predictions. Once we know the animal, we can predict future behavior and other traits (“According to our calculations, Mr. Bubbles will poop in the woods.”). Statistics helps us get information about the origin of the data, from the data itself.

Ok! The metaphor isn’t perfect, but it’s more palatable than “Statistics is the study of the collection, organization, analysis, and interpretation of data”. Need proof? Let’s see if we can ask intuitive “I tasted it!” questions:

  • What are the most common species? (Common distributions)
  • Are new ones being discovered?
  • Can we predict the next footprint? (Extrapolation)
  • Are the tracks following a path? (Regression / trend line)
  • Here are two tracks: which animal was faster? Bigger? (Data from two drug trials: which was more effective?)
  • Is one animal moving in the same direction as another? (Correlation)
  • Are two animals tracking a common source? (Causation: two bears chasing the same rabbit)
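A few of these questions (path, next footprint, same direction) can be sketched in a handful of lines. The example below fits a least-squares trend line, extrapolates one step, and computes a Pearson correlation. The track positions are invented, and the function names are hypothetical; this is an illustration of the ideas, not a full analysis.

```python
from math import sqrt

def trend_line(xs, ys):
    """Least-squares fit; returns (slope, intercept) of the line y = slope*x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x

def pearson_r(xs, ys):
    """Pearson correlation: +1 same direction, -1 opposite, near 0 no linear link."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical tracks: footprint number vs. distance along the trail (feet).
step = [1, 2, 3, 4, 5]
bear_a = [2.1, 3.9, 6.2, 8.0, 9.8]
bear_b = [1.0, 2.2, 2.9, 4.1, 5.0]

slope, intercept = trend_line(step, bear_a)
print(f"Trend line: distance = {slope:.2f} * step + {intercept:.2f}")
print(f"Predicted footprint 6: {slope * 6 + intercept:.2f} ft")    # extrapolation
print(f"Correlation of the two tracks: {pearson_r(bear_a, bear_b):.3f}")
```

A high correlation only says the two tracks move together. Whether both bears are chasing the same rabbit is a separate question the numbers alone do not settle.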

These questions are much deeper than what I pondered when first learning stats. Every dry procedure now has a context: are we learning a new species? How to take the generic footprint measurements? How to make a table from a probability distribution? How to look up measurements in a table?

Having an analogy for the statistics process makes later data crunching click. Happy math.

PS. The forwards-backwards difference between probability and statistics shows up all over math. Some procedures are easy to do (derivatives) but difficult to undo (integrals). (Thanks Denis)

Other Posts In This Series

  1. A Brief Introduction to Probability & Statistics
  2. How To Analyze Data Using the Average
  3. An Intuitive (and Short) Explanation of Bayes' Theorem
  4. Understanding Bayes Theorem With Ratios
  5. Understanding the Birthday Paradox
  6. Understanding the Monty Hall Problem

Questions & Contributions


  1. Will Big Data (http://en.wikipedia.org/wiki/Big_data) change the way Statistics is performed/used? If the sample size is very close to the target size (might be mixing up the terms) will it change the nature of statistics, which historically was used to create meaning from small amounts of data? Using your metaphor: If we tagged almost every animal in the woods, would that change how we determine what made the footprints?

  2. Adam, great question. I’m not familiar with Big Data, but one analogy (taken from the discussion here: http://www.reddit.com/r/math/comments/zevvv/a_brief_introduction_to_probability_statistics/) is that it could be like not just measuring the footprint of the bear, but getting its DNA sequence. Whoa! What can we figure out then? [“This bear is going to have kids who have footprints like this…” or even “This bear is going to have high blood pressure” :)].

    So yep, I think it’s tagging animals at a super-deep level, to make even crazier predictions.

  3. To Adam,

    The “Big Data” issue is a pivotal question in Statistics right now. Will it change the basic way we perform statistical analysis? No. The issue is that with all of these new sources of data (social media, YouTube, text messages, anything and everything), we are receiving a lot more data, at a much higher rate, and in many more forms than we are historically used to handling. (Think structured data, which fits in rows and columns, and unstructured data, like a YouTube video.) Like I said, this will not change our statistical measures, but it will change how we collect, store, and think about data as a society. We need bigger places to store it and faster ways to analyze it. I hope this messy explanation clears up a little bit about “Big Data”.

  4. Good Start!! @Kalid… Please suggest some good books to study Probability & Statistics with minimum or fairly good math background.

  5. Hi Krishna, I’d like to set up a suggested-reading list for the site. I’m still looking for my favorite probability books.

  6. Hi Bill. The book you mentioned is very costly. I am from India. Can you suggest any other affordable good books, or resources where I can download the same?

  7. Krishna, the cheapest I could find it when I bought the book was as a Kindle e-book, but it is still expensive, I know.

    A free online book that is clear and covers the basics with good worked examples is Statistics at Square One by the British Medical Journal: http://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one

    Beyond that, I’ve found most stats books have too many consecutive formulas with minimal explanation and not enough fully worked examples with real data, or are simply badly explained!

    This is why this site is so good. Kalid realizes that understanding is the most important thing first, allowing you to build your knowledge from that solid base all the way to those complex formulas. Good work Kalid!

  8. Adam, Walter,

    The Big Data issue is more than just the amount of data, though having large amounts of data is part of the solution. It’s about not necessarily understanding a particular model, but just letting the data, the statistics, provide answers. So in translation systems like Google Translate the idea is to drop any linguistic model. Linguistic models are way too complicated, and they vary over time and context – read urbandictionary.com. There’s no way to keep the models updated to make current, relevant translations. By looking at all the ways in which sets of words are used across masses of data, there emerges a statistical likelihood that a particular translation will be the best understood. It doesn’t matter if the translation defies all grammatical rules of a language, and hence any likelihood of getting into some formal linguistic model, for example when translating some very popular current idiom. All we need to know is what is being used, and so what will be understood.

    Naturally, there’s some controversy over this.

  9. I love your website because I share your viewpoint on visual thinking.

    As a former statistical process control engineer from the Deming school, I might formulate it differently:

    – statistics is descriptive
    – probability is predictive

    I would rather use the term “infer the model” than “predict the model”, because once you get the model, you’ll try to make predictions on a future data range.

    The big problem is that most people will infer that the model is the Normal Law because that’s what they were taught at school. Deming and Shewhart insisted that Normal should not be applied blindly in the real world, and in fact scientists were very unrigorous about their usage. For example, if you learnt the Henry graphical method at school http://fr.wikipedia.org/wiki/Droite_de_Henry (I can’t find any English link), well, it can be very wrong to infer the Normal Law from it in real-life matters like industrial quality, because it is a static model whereas living things are dynamic (the model shape itself can change over time!).

    The purpose of statistical process control is roughly to make the process model of an industrial product stable in time.

  10. Rule #1 from the Eric book of Stats (may the gods help us if that book is ever written)
    “Your results don’t mean anything, until and unless they mean something”

    I’ve heard everyone from electrical engineers to political pollsters get all ginned up over statistics and miss the fundamental relationship behind the wall of data they throw up. Don’t get me wrong I love a good discussion on harmonic mean vs. arithmetic mean as much as the next guy. I’ve just seen a lot of cases where people forget that stats is intended as a tool to better see the world around us, not a veil to falsely lend authority to our preconceived notions.

    My Rule #1 above has the following corollary:
    There exist 3 things in statistics that have some (maybe smaller than you think) level of importance. Starting with the least important they are
    A) your results
    B) your explanation of your results
    C) your justification of your explanation of your results

    An example of A)
    -the ‘average’ height of 10 students in class is 5’9″
    -the ‘average’ bacteria population in 10 petri dishes after 2 hours is 12,000,000
    -the ‘average’ gain of the 10 amplifiers is 2.3 dB

    An example of B)
    -collected 10 data points and used arithmetic mean
    -collected 10 data points and used harmonic mean
    -applied same input to 10 devices, collected output on log scale, converted each to linear scale, took arithmetic mean, converted ‘average’ output back to log scale, used ‘average’ log output and log input to determine ‘average’ gain by the following equation…

    An example of C)
    ‘I used this equation because…’
    ‘Here are other equations I could have used, they would have this effect, that model wouldn’t have fit this example because…’

  11. Thanks so much! After so many years of learning Stats, it is the first time I find it can be so interesting and fascinating!

  12. Wow, I have decided to do my final paper about the how and why of statistics and I have just stumbled into your website while doing homework on statistics. YAY!!!! I have used stats in my work reporting to federal and state agencies but never really understood what they meant or why they even needed them really. So that is why I chose this for my final paper. I have a couple of good places to turn, but I am so much more excited about this website than the other ones. Thank you for doing something like this for those of us who are “mathematically challenged”

  13. Try “The Cartoon Guide to Statistics” (ISBN-13: 978-0062731029): a nice beginning for a so-called boring topic.
    But not so easy as one might think.
    “If you have ever looked for P-values by shopping at P mart, tried to watch the Bernoulli Trails on “People’s Court,” or think that the standard deviation is a criminal offense in six states, then you need The Cartoon Guide to Statistics”

  14. @damib that book is really great, and not easy. It clears up lots of things if you already have some basic knowledge of the subject.

    @kalid, your picture of probability vs. statistics is really great; things fell into place in my head.

    I do have a question for you, which has bothered me for some time: why ‘standard’? Where does the ‘standard’ in ‘standard deviation’ come from? (Same for ‘standard error’.) ‘Standard’ compared to what? Variance? And if so, why call it ‘standard’ and not ‘mean deviation’?

    Love your posts, I find them insightful.

    Cheers, F.

  15. Hi Kalid, You’re absolutely awesome!!! I am reading all your Logarithm-related articles and so far I am winning (my invisible cheerleaders on my desk are dancing!!). Any chance you can do an article on standard deviation? It’s also used a lot in life. With your real-life examples, I might get another chance to see my invisible cheerleaders on my desk!!! :>

  16. @flomi: Good question, not sure why it’s called “standard” (vs. non-standard deviation?). I’d like to study it more.

    @Youngmi: Stats is something I’d like to get into more, hope to give you some more insights :).
