The *average* is a simple term with several meanings. The type of average to use depends on whether you’re adding, multiplying, grouping or dividing work among the items in your set.

**Quick quiz:** You drove to work at 30 mph, and drove back at 60 mph. What was your average speed?

Hint: It’s not 45 mph, and it doesn’t matter how far your commute is. Read on to understand the many uses of this statistical tool.

## But what does it mean?

Let’s step back a bit: what is the “average” all about?

To most of us, it’s “the number in the middle” or a number that is “balanced”. I’m a fan of taking multiple viewpoints, so here’s another interpretation of the average:

**The average is the value that can replace every existing item, and have the same result.** If I could throw away my data and replace it with one “average” value, what would it be?

One goal of the average is to understand a data set by getting a single “representative” value. But the calculation depends on how the items in the group interact. Let’s take a look.

## The Arithmetic Mean

The arithmetic mean is the most common type of average: add up the items, then divide by the number of items.

Let’s say you weigh 150 lbs, and are in an elevator with a 100lb kid and 350lb walrus. What’s the average weight?

The real question is “If you replaced this merry group with 3 identical people and want the same load in the elevator, what should each clone weigh?”

In this case, we’d swap in three people weighing 200 lbs each [(150 + 100 + 350)/3], and nobody would be the wiser.
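Here is a minimal Python sketch of the elevator example. The helper name `arithmetic_mean` is just an illustrative choice, not something from a particular library.

```python
# Weights in the elevator: you, the kid, the walrus (lbs).
weights = [150, 100, 350]


def arithmetic_mean(values):
    """Add everything up, then split the total evenly across the items."""
    return sum(values) / len(values)


clone_weight = arithmetic_mean(weights)
print(clone_weight)  # 200.0

# Three identical clones put the same total load on the elevator.
assert clone_weight * len(weights) == sum(weights)
```

The final check is the "replacement" idea in code: swapping every item for the mean leaves the total unchanged.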

Pros:

- It works well for lists that are simply combined (added) together.
- Easy to calculate: just add and divide.
- It’s intuitive — it’s the number “in the middle”, pulled up by large values and brought down by smaller ones.

Cons:

- The average can be skewed by outliers — it doesn’t deal well with wildly varying samples. The average of 100, 200 and -300 is 0, which is misleading.

The arithmetic mean works great 80% of the time; many quantities are added together. Unfortunately, there are always those 20% of situations where the average doesn’t quite fit.

## Median

The median is “the item in the middle”. But doesn’t the average (arithmetic mean) imply the same thing? What gives?

Humor me for a second: what’s the “middle” of these numbers?

- 1, 2, 3, 4, 100

Well, 3 is the middle of the list. And although the average (22) is somewhere in the “middle”, 22 doesn’t really represent the distribution. We’re more likely to get a number closer to 3 than to 22. The average has been pulled up by 100, an outlier.

The median solves this problem by taking the **number in the middle of a sorted list**. If there are two middle numbers (an even number of items), just take their average. Outliers like 100 only tug the median along one item in the sorted list, instead of making a drastic change: the median of 1, 2, 3, 4 is 2.5, and adding 100 only moves it to 3.
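As a quick check, here is a small sketch using Python's standard `statistics` module to compare the mean and the median on the list from this section.

```python
from statistics import mean, median

data = [1, 2, 3, 4, 100]

# The outlier drags the mean far away from most of the values.
print(mean(data))    # 22

# The median is the middle item of the sorted list.
print(median(data))  # 3

# With an even number of items, the two middle values are averaged.
print(median([1, 2, 3, 4]))  # 2.5
```

Dropping the outlier moves the mean from 22 to 2.5, but the median only moves from 3 to 2.5.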

Pros:

- Handles outliers well — often the most accurate representation of a group
- Splits data into two groups, each with the same number of items

Cons:

- Can be harder to calculate: you need to sort the list first
- Not as well-known; when you say “median”, people may think you mean “average”

Some jokes run along the lines of “Half of all drivers are below average. Scary, isn’t it?”. But really, in your head, you know they should be saying “half of all drivers are below *median*”.

Figures like housing prices and incomes are often given in terms of the median, since we want an idea of **the middle of the pack**. Bill Gates earning a few billion extra one year might bump up the average income, but it isn’t relevant to how a regular person’s wage changed. We aren’t interested in “adding” incomes or house prices together — we just want to find the middle one.

Again, the type of average to use depends on how the data is used.

## Mode

The mode sounds strange, but it just means **take a vote**. And sometimes a vote, not a calculation, is the best way to **get a representative sample** of what people want.

Let’s say you’re throwing a party and need to pick a day (1 is Monday and 7 is Sunday). The “best” day would be the option that satisfies the most people: an average may not make sense. (*“Bob likes Friday and Alice likes Sunday? Saturday it is!”*).

Similarly, colors, movie preferences and much more can be measured with numbers. But again, the ideal choice may be the mode, not the average: the “average” color or “average” movie could be… unsatisfactory (Rambo meets Pride and Prejudice).
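A rough sketch of the party example in Python. The list of votes is made up for illustration (the article gives no actual tally); days are numbered 1 (Monday) through 7 (Sunday).

```python
from collections import Counter
from statistics import mean

# Hypothetical votes: each guest names their preferred day (1 = Monday ... 7 = Sunday).
votes = [5, 5, 7, 5, 7, 2, 7, 5]

# The mode is simply the option with the most votes.
tally = Counter(votes)
winning_day, winning_count = tally.most_common(1)[0]
print(winning_day, winning_count)  # 5 4  (Friday, with 4 votes)

# Averaging the day numbers can produce a day nobody asked for.
bob, alice = 5, 7          # Bob likes Friday, Alice likes Sunday
print(mean([bob, alice]))  # 6  (Saturday, which neither of them picked)
```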

Pros:

- Works well for exclusive voting situations (this choice or that one; no compromise)
- Gives a choice that the most people wanted (whereas the average can give a choice that nobody wanted).
- Simple to understand

Cons:

- Requires more effort to compute (have to tally up the votes)
- “Winner takes all” — there’s no middle path

The term “mode” isn’t that common, but now you know what button to look for when playing around with your favorite statistics program.

## Geometric Mean

The “average item” depends on how we use our existing elements. Most of the time, items are added together and the arithmetic mean works fine. But sometimes we need to do more. When dealing with investments, area and volume, we don’t add factors, we multiply them.

Let’s try an example. Which portfolio do you prefer, i.e. which has a better **typical year**?

- Portfolio A: +10%, -10%, +10%, -10%
- Portfolio B: +30%, -30%, +30%, -30%

They look pretty similar. Our everyday average (arithmetic mean) tells us they’re both rollercoasters, but should average out to zero profit or loss. And maybe B is better because it seems to gain more in the good years. Right?

**Wrongo!** Talk like that will get you burned on the stock market: investment returns are multiplied, not added! We can’t be all willy-nilly and use the arithmetic mean — we need to find the actual rate of return:

- Portfolio A:
  - Return: 1.1 * .9 * 1.1 * .9 = .98 (2% loss)
  - Year-over-year average: (.98)^(1/4) = .995, a 0.5% loss per year (this happens to be about 2%/4 because the numbers are small).
- Portfolio B:
  - Return: 1.3 * .7 * 1.3 * .7 = .83 (17% loss)
  - Year-over-year average: (.83)^(1/4) = .954, a 4.6% loss per year.

A 2% vs 17% loss? That’s a huge difference! I’d stay away from both portfolios, but would choose A if forced. We can’t just add and divide the returns — that’s not how exponential growth works.
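The portfolio comparison can be reproduced with a few lines of Python. This is only a sketch; the function names `compound` and `typical_year` are illustrative.

```python
def compound(returns):
    """Multiply the yearly growth factors together (e.g. +10% -> 1.10)."""
    total = 1.0
    for r in returns:
        total *= 1 + r
    return total


def typical_year(returns):
    """Geometric mean return: the single yearly rate that compounds to the same total."""
    return compound(returns) ** (1 / len(returns)) - 1


portfolio_a = [0.10, -0.10, 0.10, -0.10]
portfolio_b = [0.30, -0.30, 0.30, -0.30]

for name, returns in (("A", portfolio_a), ("B", portfolio_b)):
    total_change = compound(returns) - 1
    print(name, f"total {total_change:.1%}", f"per year {typical_year(returns):.1%}")
# A: total about -2.0%, about -0.5% per year
# B: total about -17.2%, about -4.6% per year
```

Both portfolios have an arithmetic mean return of exactly zero, yet both lose money once the returns are multiplied.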

Some more examples:

- **Inflation rates:** You have inflation of 1%, 2%, and 10%. What was the average inflation during that time? (1.01 * 1.02 * 1.10)^(1/3) = 1.043, or 4.3%.
- **Coupons:** You have coupons for 50%, 25% and 35% off. Assuming you can use them all, what’s the average discount? (i.e. What coupon could be used 3 times?). (.5 * .75 * .65)^(1/3) = .625, so the average discount is 37.5%. Think of coupons as a “negative” return (for the store, anyway).
- **Area:** You have a plot of land 40 × 60 yards. What’s the “average” side, i.e., how large would the corresponding square be? (40 * 60)^(0.5) = 49 yards.
- **Volume:** You’ve got a shipping box 12 × 24 × 48 inches. What’s the “average” size, i.e. how large would the corresponding cube be? (12 * 24 * 48)^(1/3) = 24 inches.
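These four examples can be checked with `statistics.geometric_mean`, which is available in Python 3.8 and later. A sketch, with variable names chosen for illustration:

```python
from statistics import geometric_mean

# Inflation: average the growth factors, then subtract 1 to get a rate.
inflation = geometric_mean([1.01, 1.02, 1.10]) - 1      # about 0.043 (4.3%)

# Coupons: average the fraction of the price you still pay.
discount = 1 - geometric_mean([0.50, 0.75, 0.65])       # about 0.375 (37.5% off)

# Area: side of the square with the same area as a 40 x 60 plot.
square_side = geometric_mean([40, 60])                  # about 49 yards

# Volume: side of the cube with the same volume as a 12 x 24 x 48 box.
cube_side = geometric_mean([12, 24, 48])                # about 24 inches

print(f"{inflation:.1%} {discount:.1%} {square_side:.1f} {cube_side:.1f}")
```

Note that percentages are converted to multipliers first (1% becomes 1.01, 50% off becomes 0.5); the geometric mean works on the factors, not on the raw percentages.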

I’m sure you can find many more examples: **the geometric mean finds the “typical element” when items are multiplied together.** I had wondered for a long time why the geometric mean was useful — now we know.

## Harmonic Mean

The harmonic mean is more difficult to visualize, but is still useful. (By the way, “harmonics” refer to numbers like 1/2, 1/3 — 1 over anything, really.) The harmonic mean helps us calculate **average rates** when several items are working together. Let’s take a look.

If I have a rate of 30 mph, it means I get some result (going 30 miles) for every input (driving 1 hour). When averaging the impact of multiple rates (X & Y), you need to think about outputs and inputs, not the raw numbers.

**average rate = total output/total input**

If we put both X and Y on a project, each doing the same amount of work, what is the average rate? Suppose X is 30 mph and Y is 60 mph. If we have them do similar tasks (drive a mile), the reasoning is:

- X takes 1/X time (1 mile = 1/30 hour)
- Y takes 1/Y time (1 mile = 1/60 hour)

Combining inputs and outputs we get:

- Total output: 2 miles (X and Y each contribute “1”)
- Total input: 1/X + 1/Y (each takes a different amount of time; imagine a relay race)

And the average rate, output/input, is:

**average rate = 2 / (1/X + 1/Y)**

If we had 3 items in the mix (X, Y and Z) the average rate would be:

**average rate = 3 / (1/X + 1/Y + 1/Z)**

It’s nice to have this shortcut instead of doing the algebra each time — even finding the average of 5 rates isn’t so bad. With our example, we went to work at 30mph and came back at 60mph. To find the average speed, we just use the formula.

But don’t we need to know how far work is? Nope! No matter how long the route is, X and Y have the same output; that is, we go R miles at speed X, and another R miles at speed Y. The average speed is the same as going 1 mile at speed X and 1 mile at speed Y:

**average speed = 2 / (1/30 + 1/60) = 40 mph**

It makes sense for the average to be skewed towards the slower speed (closer to 30 than 60). After all, we spend twice as much time going 30mph as 60mph: if work is 60 miles away, it’s 2 hours there and 1 hour back.
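To see that the commute distance really drops out, here is a short sketch that computes total distance over total time for a few arbitrary distances and compares the result with `statistics.harmonic_mean`. The function name `round_trip_speed` is illustrative.

```python
from statistics import harmonic_mean


def round_trip_speed(distance, speed_out, speed_back):
    """Average speed = total distance / total time for an out-and-back trip."""
    total_distance = 2 * distance
    total_time = distance / speed_out + distance / speed_back
    return total_distance / total_time


# The one-way distance (in miles) does not change the answer.
for miles in (1, 15, 60):
    print(miles, round_trip_speed(miles, 30, 60))  # 40 mph each time, up to float rounding

print(harmonic_mean([30, 60]))  # same value via the harmonic mean
```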

**Key idea:** The harmonic mean is used when two rates contribute to the same workload. Each rate is in a **relay race** and contributing the same amount to the output. For example, we’re doing a round trip to work and back. Half the result (distance traveled) is from the first rate (30mph), and the other half is from the second rate (60mph).

**The gotcha:** Remember that the average is **a single element that replaces every element**. In our example, we drive 40mph on the way there (instead of 30) and drive 40 mph on the way back (instead of 60). It’s important to remember that we need to replace each “stage” with the average rate.

A few examples:

- **Data transmission:** We’re sending data between a client and server. The client sends data at 10 gigabytes/dollar, and the server receives at 20 gigabytes/dollar. What’s the average cost? Well, we average 2 / (1/10 + 1/20) = 13.3 gigabytes/dollar *for each part*. That is, we could swap the client & server for two machines that cost 13.3 gb/dollar. Because data is both sent and received (each part doing “half the job”), our true rate is 13.3 / 2 = 6.65 gb/dollar.
- **Machine productivity:** We’ve got a machine that needs to prep and finish parts. When prepping, it runs at 25 widgets/hour. When finishing, it runs at 10 widgets/hour. What’s the overall rate? Well, it averages 2 / (1/25 + 1/10) = 14.28 widgets/hour *for each stage*. That is, the existing times could be replaced with two phases running at 14.28 widgets/hour for the same effect. Since a part goes through both phases, the machine completes 14.28/2 = 7.14 widgets/hour.
- **Buying stocks:** Suppose you buy $1000 worth of stocks each month, no matter the price (dollar cost averaging). You pay $25/share in Jan, $30/share in Feb, and $35/share in March. What was the average price paid? It is 3 / (1/25 + 1/30 + 1/35) = $29.44 (since you bought more at the lower price, and less at the more expensive one). And you have $3000 / 29.44 = 101.9 shares. The “workload” is a bit abstract: it’s turning dollars into shares. Some months use more dollars to buy a share than others, and in this case a high rate is bad.
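The stock example is easy to verify by counting shares directly. A minimal sketch, with variable names chosen for illustration:

```python
from statistics import harmonic_mean, mean

monthly_budget = 1000        # dollars spent each month
prices = [25, 30, 35]        # dollars per share in Jan, Feb, March

# Shares bought each month: the same dollars buy more shares when the price is low.
shares = sum(monthly_budget / price for price in prices)

# Average price actually paid = total dollars / total shares.
average_price = monthly_budget * len(prices) / shares

print(round(shares, 2))         # about 101.9 shares
print(round(average_price, 2))  # about 29.44 dollars per share
print(mean(prices))             # 30, the arithmetic mean, which overstates the price paid

# The direct calculation matches the harmonic mean of the prices.
assert abs(average_price - harmonic_mean(prices)) < 1e-9
```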

Again, the harmonic mean helps measure **rates working together on the same result**.

## Yikes, that was tricky

The harmonic mean *is* tricky: if you have **separate** machines running at 10 parts/hour and 20 parts/hour, then your average really is 15 parts/hour since each machine is independent and you are **adding the capabilities**. In that case, the arithmetic mean works just fine.

Sometimes it’s good to double-check to make sure the math works out. In the machine example, we claim to produce 7.14 widgets/hour. Ok, how long would it take to make 7.14 widgets?

- Prepping: 7.14 / 25 = .29 hours
- Finishing: 7.14 / 10 = .71 hours

And yes, .29 + .71 = 1, so the numbers work out: it does take 1 hour to make 7.14 widgets. When in doubt, try running a few examples to make sure your average rate really is what you calculated.
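The same sanity check can be scripted. This sketch assumes, as in the example, that each widget passes through prepping and then finishing one at a time.

```python
from statistics import harmonic_mean

prep_rate = 25     # widgets/hour while prepping
finish_rate = 10   # widgets/hour while finishing

# Time for one widget to go through both stages.
hours_per_widget = 1 / prep_rate + 1 / finish_rate

# Overall rate, and the per-stage harmonic mean it is half of.
overall_rate = 1 / hours_per_widget
per_stage_rate = harmonic_mean([prep_rate, finish_rate])
print(round(overall_rate, 2), round(per_stage_rate, 2))  # about 7.14 and 14.29

# Double-check: making `overall_rate` widgets should take exactly one hour.
hours_needed = overall_rate / prep_rate + overall_rate / finish_rate
print(round(hours_needed, 6))  # 1.0
```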

## Conclusion

Even a simple idea like the average has many uses, and there are more we haven’t covered (center of gravity, weighted averages, expected value). The key points are these:

- The “average item” can be seen as the item that could replace all the others
- The type of average depends on how existing items are used (Added? Multiplied? Used as rates? Used as exclusive choices?)

It surprised me how useful and varied the different types of averages were for analyzing data. Happy math.

Awesome post!!

When I was in school I crammed these formulas to pass the exam because no teacher would satisfy my curiosity behind the why, what & how of it. My math teacher would try to explain it to me but his jargon was most of the time out of comprehension.. My dad later helped me grasp the stuff.

But I must say you did a superb job of putting the concept in a super easy language and indeed there is no one else could have better explained!

I wish there were more teachers like you.. God bless you!

“Let’s say you weigh 160 lbs, and are in an elevator with a 100lb kid and 350lb walrus. What’s the average weight?”

“In this case, we’d swap in three people weighing 200 lbs each [(150 + 100 + 350)/3], and nobody would be the wiser.”

In the first part I quoted, you put the wrong number in. Just thought I’d be nitpicky.

That’s an interesting way to think about the average; I guess I always knew about that, but I’d never explicitly thought about the average being “replace everything with identical things”.

I love your articles; always make me think about things in a different way.

@Prateek: Wow, thanks for the kind words! Glad you are finding the site useful. I’m glad your dad was able to help you out — sometimes you just need to get things from a different viewpoint.

Unfortunately, math is one of those subjects where topics get one (and only one) explanation, and you’re off to the next one.

@Zac: Happy you’re enjoying the site — I fixed up the typo [I had actually put in my own weight instead of the hypothetical numbers which are easier to add up ].

Yeah, it’s amazing how many things we’ve “learned” in the past but haven’t seen from all angles (there’s a few other cool interpretations of the average but I didn’t want the post to be too long). Glad you’re enjoying the articles.

That my friend is one very well put together article! Thank you for the effort taken to show to us ‘simple’ people, how fun math can be, esp; statistics!

Now if you can just get the fudge heads at Oracle and Microsoft to introduce this into their ‘superior’ databases, we will all have much more straight forward lives indeed

10/10

This will come in handy for math homework, cause math is my absolute WORST subject. lol…..

This is absolutely brilliant! Knowledge is kind of like comedy. You’ve got to have delivery. If the delivery sucks the response will probably not be very good either. This my friend was fabulous! I would love to study under an individual such as yourself.

Quite delightful and informative. You do the world a service by contributing to a global increase in knowledge of mathematics.

Informative, Concise, what other everyday uses do you have for old school stat info? How about some information about how we can use calculus in everyday life?

@mrhassell: Thanks, glad you found it useful! The funny thing is I’m one of those simple people too — I want things to be simple and clear instead of rigorous and opaque. Unfortunately, I’m pretty powerless to influence the db designers :).

@kat: Cool, hope it comes in handy.

@Dave: Appreciate the comment, and I totally agree — any subject can be interesting if presented in the right way. I’ll keep cranking out the “aha!” moments as they happen :).

@Chris: Thanks, glad you found it useful. Yeah, one of the great things about blogging is that everyone can add a bit of information into the world.

@NebulousMaker: Thanks, a series on calculus is on the way. It’s a tricky subject to cover with real, everyday applications (i.e., non-physics), but they’re definitely out there.

A couple more things I noticed after a (second? third? fourth? millionth? I lost count) readthrough (sorry in advance if I’m too nitpicky):

“The average is the value that can replace every existing item, and have the same result.” That doesn’t quite apply for the mode or median; for the rest it works, but the median and mode are different ideas entirely. They have their uses, yes, but they don’t fit under that definition of the average.

This might be nitpicky, but I don’t really see where this is analysing data using the average. It shows how to find the average and explains why that works, but it doesn’t really say anything about using that to analyze the data. To me, analyzing data has more to do with variance and the standard deviation, but that might just be me. Maybe I just don’t see it.

Also, here’s something else, possibly more useful for the average, though maybe a little too in depth for this article. The median works to eliminate outliers, but it’s more effective (though more time consuming) to find the mean AFTER eliminating the outliers. Gives a better indicator of what the average really is.

Just a few thoughts on the article and possibly something to put in another article (if I’m too nitpicky, just tell me).

Hi Zac, no worries at all, I like hearing what works and what doesn’t for each article — it’s a good way to improve.

Yeah, on second thought the title may be misleading — it’s more about “understanding the types of averages, with examples” vs “data analysis”, which probably deserves its own follow-up article. The idea of throwing away the outliers is a good one, and helps clean up data that may otherwise be skewed.

I had a similar inkling about the term “average” being applied to the median and mode — hopefully it’s clear that those items aren’t really “averages” so a replacement doesn’t work. But I’ll think about ways to clarify the sentence.

Excellent post. I tried doing a similar explanation a few months ago, so I have a real appreciation for your(better)work.

Hi MS, thanks for the comment, glad you enjoyed it. It can be a tricky concept to get across, but I’d still encourage you to trackback / release your explanation also — everyone has different insights :).

To Zac’s point, I just realized the median and mode behave “as if all items had the same value”, in a way.

When choosing the mode (most popular), you are acting as if every value was the mode — it’s the only one that matters. The median is similar: you choose a middle value, and the median doesn’t change if you had replaced every value with it.

Oh.. I can understand! yawn

whts the answer anyway:

Quick quiz: You drove to work at 30 mph, and drove back at 60 mph. What was your average speed?

Yours

Poor-in-Maths

Hi Rakesh, just check out the section on the harmonic mean.

Dear Kalid, great post once again, although what confused me was why we multiply to get the average return of a portfolio over four years. Why don’t we add? Similarly, what is the logic behind multiplying the different rates of inflation across the three periods and not adding them? Can you please clarify a bit on the geometric mean?

Regards (And also looking forward to more of your math posts)

Hi Mohammad, great question. Most of the time, interest is “compounded” which means you multiply returns over time [there is a type of simple interest that is added, and you'd use the regular arithmetic mean for that -- this type is very rare].

When working with interest rates, +10% looks like addition but it really means multiplying by 1.1 — for example, gaining 10% return on 100 is 110, and gaining 10% return on 200 is 220.

If you have 10% return again, you’d do 110 * 1.1 = 121, or 220 * 1.1 = 242.

Simple addition doesn’t work because we are scaling the amount of “stuff” we have. Check out the simple and compound interest rates for more about interest returns. Hope this helps!

Hi Kalid,

what tool do you use to create such a wonderful graphics?

Thank you.

Hi Denis, I use PowerPoint 2007 to make the diagrams.

The ‘identical numbers’ thing reminded me of a nearly aha moment.

I’m more a wordy than numeric person, so I wished math was taught like this:

The average of a set of numbers is the number that all the numbers would be if the sum of the set of numbers was the same number, but all the numbers were the same number, and there was the same number of numbers.

Re Zac’s point, medians and outliers: quite right I think – and thanks for remembering us outliers.

I seem to be a statistical outlier on all sorts of demographics.

Makes me wonder about some people (e.g. politicians, journalists, marketing managers, medical policymakers etc.) and whether they are just using ‘blunt’ averages (i.e. =broad generalities) when they do their correlations – and then make policy and predictions telling us what’s good for us and what’s bad for us etc.

Always makes me think: “that’s very interesting but I have no idea whether it applies to me.”

@The Hermit: Thanks for the comment. Yes, we all have different ways of seeing the same topic — I love thinking about the different interpretations :).

Hello!!

I love maths bcoz i teach maths.

Thanks.

Jitendra

WoW….very very very interesting…pls continue your wonderful work..one of the excellent article i came across….I also thank you for the valuable information….short,sweet and crispy..

regards.

Sampath

its so hard…..i can’t understand everything….

Wow, wish I had a teacher like you when I was in school

But there is one example I’m not sure I understand :

For the harmonic mean, you give the example of data transmission. At the end you divide the cost by two because each one “do half the job”. I think it would be the opposite, since if we want to transfer 1gb, you have to send it AND received it. Wouldn’t that be doubling the amount of working instead of halving it ?

I would be glad if you could clarify this particular example

Thanks !

Hi David, great question. In this case cost was written as “gigabytes per dollar”, so a higher number is better (like miles per gallon).

Division, in this case, would lower the number (indicating an *increase* in cost).

If we had written the price in terms of dollars/gigabyte, you are correct that we would multiply and double the amount.

Hope this helps!

Can you tell me how to read Perado chart? What are we looking at? Thanks

“After all, we spend twice as much time going 30mph than 60mph: if work is 60 miles away, it’s 2 hours there and 1 hour back.”

In this example it could also be calculated using the arithmetic mean, averaging the speed for 1st, 2nd, and 3rd hour:

(30+30+60)/3 = (60 + 60)/3 = 40

Or in general, for rates X1, X2, …, Xn where Xn = max(X1, …, Xn):

(n * Xn) / (Xn/X1 + … + Xn/Xn)

When we divide numerator and denominator with Xn we get the harmonic mean formula. So its the same thing i guess.

Hi Vindl, thanks for the comment! That’s a neat way to link the arithmetic and harmonic mean formulas, thanks.


I don’t understand the harmonic mean, especially concerning the widget factory.

Let’s use another example. Say a machine preps widgets at 5 widgets/hour and a second machine finishes the widgets at 1 widget/hour. Wouldn’t the rate at which widgets are outputted 1 widget per hour?

After waiting for the very first widget to be prepped, the “finishing” machine essentially has an unlimited amount of “prepped” widgets to work with since they’re bunching up, waiting to be finished up. So the rate at which the “finishing” machine processes the widget (1 widget/hour) ends up being the rate at which widgets are leaving the whole system.

Your harmonic mean method suggests that if the two machines were replaced by two machines with the identical rates of 6/5 widgets per hour (2/((1/5)+(1/1))), the system output would be the same!

So what am I missing?

Regarding my last post, the harmonic mean would actually suggest a rate of 5/3 widgets per hour, not 6/5.

But this is still greater than 1 widget per hour, so my question stands.

@G: Great question, I had to think a bit to make it clear in my head. Let’s think about the production rates without using the harmonic mean.

We can prep 5 widgets/hour and finish 1 widget per hour. How long does it take to do a single widget?

Well, we spend 1/5 (12 minutes) on prepping and a full hour on finishing.

So, we can complete 5/6 of a widget per hour through the entire system. Even though the last stage can finish a widget in 1 hour, we spent 12 minutes getting to that stage. No single widget ever takes less than an hour. [Think of the machine as a black box -- you drop in a widget, and it drops out 1:12 later. We don't know that it really has two separate stages].

Now, suppose we wanted to replace this black box (with two stages) with parts that operated at the same exact speed. That is, we want to finish 5 widgets in 6 hours. How fast should each part go?

Well, they need to move identically. That means they each only have 3 hours (half of 6 hours) to do their work. So, each part must operate at 5 widgets in 3 hours, and then pass it to the next one.

This new black box, with 2 parts operating at 5 widgets every 3 hours (5/3 widgets/hour) would look exactly the same as our original.

The harmonic mean gives the same result: (2/(1/5 + 1/1)) = 5/3 widgets/hour.

Since we must put widgets through both stages, the overall rate is 5/6 widgets/hour (half of that).

Hope this helps!

Thank you for your answer.

I read and understood your post. However I don’t think your replacement parts would be appropriate replacements for a system of constant widget production.

While your replacement parts do simulate the black box perfectly with the example of one widget constructed, a situation of constant widget construction would have the rate of construction approximate 1 widget per hour the longer the system runs.

The very first widget involves the “finishing” machine waiting for 12 minutes before doing its work. Then the “finishing” machine does its work and 1h 12m later from the input into the system, out comes a widget. However, that finishing machine then IMMEDIATELY gets started on the next widget. And 1 hour (not 1h 12m) after the first widget is outputted, the second is outputted!

The limit of the rate of construction for the system as time approaches infinity is 1 widget per hour (that 12 minute wait at the very beginning will always prevent the overall rate from ever reaching 1 widget per hour)

The reason I bring this complication up is because I still don’t understand the value of a harmonic mean in a system involving components that depend upon each other for their rates. And most systems do have such components.

Thanks again!

@G: Great question. I really had to think about the two situations because both viewpoints “make sense”.

The harmonic mean finds the average rate to produce one widget (1:12). That is, every single widget through the system spends 1:12 on the assembly line: 12 minutes on the first part, 1 hour on the second.

However, the harmonic mean only models 1 widget at a time. It isn’t complex enough to model optimizations like pipelining, where you push widgets through the first part even while the second is still working (after all, how long is “long enough” to take the amortized analysis? :-)).

The benefit is that you can compare similar production lines without having to make assumptions about whether the pipelines are full, the impact of gaps, etc. So you can (quickly) compare a production line with 5 wph (widgets per hour) prepping & 1 wph finishing, with an alternative with 3 wph prepping & 2 wph finishing.

Another subtlety is that people don’t care which widget they get; widget A = widget B. But suppose this was a carwash instead: 5 cars per hour washing, and 1 car per hour polishing. You drop off your car — how long to get it done?

Clearly it’s 1:12. You don’t want just *any* car from the line (i.e. the next one the polisher spits out), you want the one you put in.

So, the harmonic mean takes a simple scenario where pipelining / amortized analysis isn’t involved, and assumes the widget you put in is the one you need to get out. Speaking of bottlenecks, “The Goal” is a pretty interesting look at production lines, etc. I think modeling the net output of such lines requires much more sophisticated analysis (pipeline stalls, etc.). It is true that a system, over time, will approach the speed of the slowest-moving part (1 wph). It is interesting to see that the speed of the entire system (1 wph) need not be the same as the time needed to process a single widget, due to pipelining. (Computers operate a similar way, and can complete 1 instruction per cycle even if a single instruction takes several cycles to complete — the last stage is always pumping one out.).
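The difference between the time for one widget and the long-run output of a full pipeline can be seen in a toy simulation. This is only a sketch under simplifying assumptions that are not in the discussion above: prepping starts at time 0 and runs back to back, the finisher handles one widget at a time, and there are no stalls or gaps.

```python
from fractions import Fraction

PREP_TIME = Fraction(1, 5)   # hours per widget at 5 widgets/hour
FINISH_TIME = Fraction(1)    # hours per widget at 1 widget/hour


def completion_times(n):
    """Hours at which each of n widgets leaves a two-stage pipeline."""
    times = []
    finisher_free = Fraction(0)
    for k in range(1, n + 1):
        prepped = k * PREP_TIME              # the k-th widget is prepped by this time
        start = max(prepped, finisher_free)  # wait for the finisher if it is busy
        finisher_free = start + FINISH_TIME
        times.append(finisher_free)
    return times


# One widget on its own takes 6/5 hours (1:12), i.e. 5/6 widgets/hour.
print(completion_times(1))

# With a backlog, widgets leave one hour apart, so output approaches 1 widget/hour.
times = completion_times(100)
print(float(100 / times[-1]))  # just under 1 widget/hour
```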

Hope this helps!

-Kalid

Helped a lot, thanks!

@G: You’re welcome, thanks for the interesting discussion.

great post Kalid, few thoughts from my side …

I feel Harmonic Mean is not a right measure in your widget example or even for that matter even in the car wash example.

“But don’t we need to know how far work is? Nope! No matter how long the route is, X and Y have the same output;”

If I understood this correctly, Harmonic Mean is used to compare two different rates giving similar output, in widget or car wash examples we are talking about two different outputs at different rates aint it…

If you drive one mile at 30 mph and one mile at 40 mph what is your average speed for the two miles?

Please help me understand this question and the answer. Thanks Denise

Wow!

You took me out of the rain!

Your explanations helped me a lot. Thanks!

Best example I’ve ever seen of where average doesn’t work out: “The average net worth of Bill Gates and ten homeless guys is over a billion dollars.”

On harmonic means: They used to give us puzzles in school like “The cold water tap can fill the tub in 20 minutes. The hot water tap can fill the tub in 40 minutes. The drain can empty the tub in 30 minutes. Both taps are turned on but the drain is accidentally left open. How long will it take to fill the bath?”

Those puzzles drove me up a tree because although I could memorize the formula, it still made no intuitive sense.

Finally I figured it out: Your intuition says that the values for the taps and the drain should simply add and subtract. Your intuition is right, but the question is add and subtract *what*?

Just adding times doesn’t make any sense, because that says the cold and hot water taps working together should add up to an hour, but we know that using them together would make the time shorter, not longer.

The answer is that it’s RATES that are adding and subtracting.

So the cold water tap fills at the rate of 3 bathtubs per hour. The hot tap runs at 1.5 bathtubs/hour and the drain runs at 2 bathtubs/hour.

So working together, they produce 3 + 1.5 – 2 = 2.5 bathtubs/hour. 2.5 bathtubs/hour is 0.4 hours/bathtub. 0.4 hours is 24 minutes. There’s your answer.

Do the algebra, and you get the same 1/(1/a+1/b-1/c) formula as above.
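The bathtub puzzle is a short calculation once the times are inverted into rates. A sketch using exact fractions, with variable names chosen for illustration:

```python
from fractions import Fraction

# Minutes for each tap to fill the tub, and for the drain to empty it.
cold_minutes, hot_minutes, drain_minutes = 20, 40, 30

# Rates (tubs per minute) add and subtract; times do not.
net_rate = (Fraction(1, cold_minutes)
            + Fraction(1, hot_minutes)
            - Fraction(1, drain_minutes))

minutes_to_fill = 1 / net_rate
print(minutes_to_fill)  # 24
```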

@Ed: Great example on when to use the average vs the median :).

Also I like the tub example too — we want to add *something*, but times aren’t it! Rates are what we want, exactly for the reason you say — it’s a good example showing why we need to invert the times to get the rate.

Nice article! Could you give your view on expected value and why do people use it for probability distribution?

@franz: Thanks, glad you liked it. I think expected value would be a great follow-up article.

In my mind, I see expected value as a form of weighted average, where the more-likely scenarios have more impact than the least-likely ones. When you don’t have clear-cut outcomes, you have to take the probability into account as best you can. Definitely a good idea for a follow-up.

this was fantastic! a definite thumbs up on stumble.

For the life of me I cannot understand how you get 7.14 widgets/hour with the Machine productivity model when the slowest the machine runs is 10 widgets/hour.

My intuition will just not let me get that.

I understand how you got 14.28 widgets/hour as the harmonic mean just as I understand how 60mph one way and 30mph back gets you 40mph average speed.

Why wouldn

@minsoo: Thanks!

@Anonymous: Great question. The core issue is separating the time needed to process an individual widget vs. how fast “things” are coming out of the pipeline.

Let’s say you can wash 2 pants/hour, and can dry 4 pants/hour. How long does it take to wash and dry 1 pair of pants?

Well, there’s 1/2 hour for washing and 1/4 hour for drying, for a total of 3/4 hour (45 mins). Even though the slowest machine takes 30 mins, the entire cycle takes 45 mins (for one item).

Now, once you have a bunch of backlog, pants may be leaving the dryer every 15 mins. But each of those pants had an additional 30-minute wash cycle at some point. These means help you figure out how fast a process is for 1 item (without pipelining speedups).
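Here is a small sketch of the wash-then-dry arithmetic, assuming one item goes through each stage in turn with no pipelining. The name `hours_per_item` is hypothetical.

```python
def hours_per_item(stage_rates):
    """Time for ONE item to pass through every stage, one after another.

    stage_rates: items per hour for each stage (e.g. washer, dryer).
    """
    # Each stage takes 1/rate hours for a single item; the times add up.
    return sum(1 / rate for rate in stage_rates)


cycle_hours = hours_per_item([2, 4])   # wash at 2/hour, dry at 4/hour
items_per_hour = 1 / cycle_hours       # rate when items are done one at a time

print(cycle_hours * 60)                # minutes per pair of pants
print(round(items_per_hour, 2))        # pants per hour without pipelining
```

The one-at-a-time rate (about 1.33 pants/hour) is lower than either machine's own rate, which is the same effect as the widget figure in the question.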

math great

Hi Kalid. You did very nice work! Thank you!

I had difficulty grasping the harmonic average from the car speed example, and then I think I found a somewhat simpler algorithm to calculate it. The point is that you may not always know which is the supposed output or input. So here’s my algorithm:

1. Detect the requested item!

(For instance:

Question: what is requested?

Answer: an (average) speed: AS)

2. Get SF! (the speed formula):

SPEED = DISTANCE/TIME

3. Get ASF! (the average speed formula):

average speed (AS) = TOTAL distance / TOTAL time

4. Refine ASF in order to get RASF (refined ASF)!

AS = (distance1 + distance2)/(time1 + time2)

5. Trace all the components of the RASF!

(We need to detect 4 items, 2 distances and 2 times, to be able to calculate AS.)

Here’s the procedure:

We don’t know the 2 distances exactly, so we use letters instead of numbers:

Distance 1 is D (from home to the workplace)

Distance 2 is D (from the workplace to home)

To get time 1, rearrange SF to get TF (the time formula): if SPEED = DISTANCE/TIME, then TIME = DISTANCE/SPEED.

Time 1 is then D/60 (60 miles per hour is the first speed)

Time 2 = D/30 (30 is the second speed)

6. Do the calculations:

AS = 2D/(D/60 + D/30). You get AS = 40 (miles per hour).
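The steps above can be sketched in a few lines, assuming every leg of the trip covers the same distance. The function name and the `leg_distance` parameter are illustrative; the distance cancels out, so any positive value gives the same answer.

```python
def average_speed(speeds, leg_distance=1.0):
    """Average speed over equal-length legs: total distance / total time."""
    total_distance = leg_distance * len(speeds)
    # TIME = DISTANCE / SPEED for each leg.
    total_time = sum(leg_distance / speed for speed in speeds)
    return total_distance / total_time


print(round(average_speed([60, 30]), 2))                     # D = 1 mile
print(round(average_speed([60, 30], leg_distance=17.5), 2))  # D = 17.5 miles
```

Both calls give 40 mph, which is also the harmonic mean of 60 and 30.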

Sorry for the mistakes in my previous comment. I meant “harmonic mean”, not “harmonic average” (I’m Romanian, so my English is not so good). Also, I listed the speeds backwards (60-30) instead of (30-60).

It seems to me that the example is not about whether one uses the arithmetic mean or the harmonic mean, but about sticking to the speed formula in order not to get wrong results.

I puzzled over this line:

“Year-over-year average: (.83)^(1/4) = 4.6% loss per year.”

I kept getting 0.955 and wondering what I didn’t understand. Finally it occurred to me that the line should read “Year-over-year average: 100*(1-(.83)^(1/4)) = 4.6% loss per year.”

Amazing article. Wonderful insight here, and not just for beginners =)

This is a great thing to help kids in math, so keep it up. Let the kids learn more things about mean and mode, most imp….

A boat takes time t1 to travel a certain distance in the direction of the flow of water, and takes time t2 to travel the same distance against the stream. How much time will it take to travel the same distance in still water?

Is it a case of the harmonic mean?

Please explain how.
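One way to look at the boat question, as a sketch under the assumption that the boat's own speed v and the current c are constant: downstream the speed is v + c = d/t1, upstream it is v - c = d/t2. Adding the two gives v = (d/t1 + d/t2)/2, so the still-water time d/v is 2/(1/t1 + 1/t2), the harmonic mean of t1 and t2. The function name and sample numbers below are made up to check that reasoning.

```python
def still_water_time(t1, t2):
    """Harmonic mean of the downstream and upstream travel times."""
    return 2 / (1 / t1 + 1 / t2)


# Made-up check: 12 miles, boat does 4 mph in still water, current is 2 mph.
distance, boat_speed, current = 12.0, 4.0, 2.0
t1 = distance / (boat_speed + current)   # downstream time
t2 = distance / (boat_speed - current)   # upstream time

print(t1, t2)                            # 2.0 and 6.0 hours
print(round(still_water_time(t1, t2), 2))
print(distance / boat_speed)             # direct still-water time
```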

In problems involving tanks and taps, or problems such as the following, I find it easier to solve the problem without reciprocals.

Ex: If 12 men and 16 boys can do a piece of work in 5 days, and 13 men and 24 boys can do it in 4 days, then blah….

Soln: From the first condition, 1 job = 12*5 man-days + 16*5 boy-days.

From the second condition, 1 job = 13*4 man-days + 24*4 boy-days.

Equating the two, we get 60 man-days + 80 boy-days = 52 man-days + 96 boy-days.

So, 1 man-day = 2 boy-days.

We can solve any question from here onward.

====

Ex: If one tap takes 12 hours to fill a tank and another tap takes 10 hours to fill the same tank, how long will both taps together take to fill the tank?

Multiply: 12*10 = 120.

In 120 hours the first tap fills 10 tanks and the second tap fills 12 tanks, a total of 22 tanks.

So both taps together take 120/22 hours (about 5.45 hours) per tank.
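The "no reciprocals" trick above can be sketched with whole numbers only: pick a time window every tap divides evenly, count the tanks filled, and divide. The function name is invented for illustration, and it assumes whole-number fill times.

```python
from fractions import Fraction
from math import prod  # available in Python 3.8+


def hours_together(*hours_each):
    """Hours for all taps together to fill one tank (whole-number inputs)."""
    # A window that every tap's fill time divides evenly, e.g. 12 * 10 = 120.
    window = prod(hours_each)
    # Tanks each tap fills in that window, e.g. 10 + 12 = 22.
    tanks_filled = sum(window // hours for hours in hours_each)
    return Fraction(window, tanks_filled)


together = hours_together(12, 10)
print(together)                  # 120/22 in lowest terms
print(round(float(together), 2))
```

`Fraction` reduces 120/22 to 60/11, which is about 5.45 hours.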

The data transmission example is incorrect for GB/$.

If you’re only paying for one side, the cost is exactly what you pay there; there’s no average to be had.

But if you’re paying for both client and server, each isn’t doing half the work; each does the whole job. If you’re paying for a client and a server and you get 10GB/$ on both of them, then when you transmit 10GB from the client to the server, both have consumed 10GB worth of transmitted data, and you pay 2 dollars. So in effect two 10GB/$ machines average to 5GB/$.

That’s half the estimate the harmonic mean gives when both have 10GB/$, which would be 2/(1/10+1/10) = 10GB/$, which is clearly wrong.

A 20GB/$ server and a 10GB/$ client transmitting the data would average 6.66GB/$, since you need to compare costs for the same number of GB.

Take this to the extreme with a 1000GB/$ or 10000000GB/$ server and a 10GB/$ client, and the result should move towards 10GB/$ because the server’s cost is almost nothing.
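A quick sketch of this commenter's model, where every machine on the path handles the full transfer, so the dollars-per-GB costs add up. The function name is hypothetical, and this illustrates the comment's reasoning rather than the article's original calculation.

```python
def combined_gb_per_dollar(rates):
    """Overall GB/$ when every machine handles the full transfer."""
    # Each machine charges 1/rate dollars per GB; the charges add.
    dollars_per_gb = sum(1 / rate for rate in rates)
    return 1 / dollars_per_gb


print(round(combined_gb_per_dollar([10, 10]), 2))          # two 10GB/$ machines
print(round(combined_gb_per_dollar([20, 10]), 2))          # 20GB/$ server, 10GB/$ client
print(round(combined_gb_per_dollar([10_000_000, 10]), 2))  # nearly free server
```

The results (5, about 6.67, and just under 10) match the figures in the comment.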

Just amazing. Such great ideas you shared with us. Thank you very much.

Just wanted to add my voice to those thanking you for this article. A marvelous explanation.

@Craig: Thank you!

I’d notice if someone swapped a walrus for a person.

how to take average of probability values

Hi! You really did a great job on this. I really liked it. I just want to know: how did you get the returns on the investments? Thanks.

After giving this serious consideration, there are four things I now realize which seem fundamental to the correct application of the harmonic mean.

1) Any of the basic arithmetic operations (+ – / *) that are applied to fractions can be thought of as being applied to the numerators of those fractions, while the denominators stay constant.

For example, the arithmetic mean of 1/4 and 3/4 is 2/4, in which the numerators are “averaged” while the denominators are constant. In this sense, adding, subtracting, multiplying and even dividing affect the numerator while leaving the denominator constant. As another example, 3/4 divided by 2 could be thought of as (3/2)/4, exemplifying the direct impact of the division function on the numerator.

2) The harmonic mean is the arithmetic mean of the denominators. For example the harmonic mean of 1/3 and 1/5 is 1/4.

3) “Miles per hour” and “hours per mile” refer to the same physical abstraction, namely speed. The difference is that the arithmetic operates on each differently. Doubling the “miles per hour” doubles speed, while doubling the “hours per mile” halves it. In the same way, “widgets per hour” refers to the same abstraction as “hours per widget”.

4) So the question is, what quantity do I want to operate on? In considering between the harmonic mean and the arithmetic mean, what quantities do I want to average? If they are the numerators, the correct choice is the arithmetic mean. If they are the denominators, the correct choice is the harmonic mean.

As an example, if I want the average of the hours spent in traveling a mile and I am given a list of “miles per hour” units, then I would use the harmonic mean which would calculate an average of the hours using “hours per mile” units and invert the result back into the “miles per hour” unit. If I was given a list of “hours per mile” units, I would just use the arithmetic mean to make the same calculation.
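The numerator-versus-denominator view can be checked with exact fractions. This is only a sketch; the helper names are chosen for illustration.

```python
from fractions import Fraction


def arithmetic_mean(values):
    values = list(values)
    return sum(values) / len(values)


def harmonic_mean(values):
    # Flip each value, average the flipped values, then flip back.
    return 1 / arithmetic_mean(1 / v for v in values)


# Averaging numerators: 1/4 and 3/4 give 2/4.
print(arithmetic_mean([Fraction(1, 4), Fraction(3, 4)]))

# Averaging denominators: 1/3 and 1/5 give 1/4.
print(harmonic_mean([Fraction(1, 3), Fraction(1, 5)]))

# 60 mph and 30 mph over equal distances: average the hours per mile.
print(harmonic_mean([Fraction(60), Fraction(30)]))
```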

Hi Adam, that’s an awesome analogy. I hadn’t thought of that numerator vs. denominator difference, but that’s exactly it. Normally we measure “output per input” (e.g., “miles per gallon” or “dollars per hour”), and we don’t really think of averaging the “input” side of things. I like it.

What is the function of the caret (^) in these examples?

If Portfolio A is 0.5% loss per year, wouldn’t Portfolio B be 4.25% rather than 4.6% loss per year? 17%/4

Portfolio A:

Return: 1.1 * .9 * 1.1 * .9 = .98 (2% loss)

Year-over-year average: (.98)^(1/4) = 0.5% loss per year (this happens to be about 2%/4 because the numbers are small).

Portfolio B:

1.3 * .7 * 1.3 * .7 = .83 (17% loss)

Year-over-year average: (.83)^(1/4) = 4.6% loss per year.

I found it all, from 4 “Easy Permutations and Combinations” to 7 “How To Analyze Data Using the Average”, extremely interesting and presented in a very clear way, so we could not only understand it better but also connect it to other areas of Statistics and Probability.

It is an amazing presentation

Sincerely

Luis H. Alvarez

@Rick: The caret represents an exponent, so 2^3 = 8.

When measuring the “average return per year over 4 years”, we can’t just do a division (17/4) because the effects compound. 17/4 = 4.25, but (1 – .0425)^4 = .84 (i.e., a 4.25% loss applied 4 times results in a 16% loss, not the 17% we were expecting).
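To see the compounding point in numbers, here is a small sketch comparing the "divide by 4" guess with the fourth-root approach. The function name `yearly_rate` is made up for illustration.

```python
def yearly_rate(total_factor, years):
    """Constant yearly change that compounds to total_factor."""
    return total_factor ** (1 / years) - 1


total = 1.3 * 0.7 * 1.3 * 0.7           # Portfolio B: about 0.83
per_year = yearly_rate(total, 4)        # about -0.046, i.e. 4.6% loss per year
naive = -0.17 / 4                       # -0.0425, the "just divide" guess

print(round(total, 4))
print(round(per_year * 100, 2))         # percent per year
print(round((1 + per_year) ** 4, 4))    # compounds back to the real total
print(round((1 + naive) ** 4, 4))       # about 0.84: only a 16% loss
```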

@Luis: Glad you liked it!

Hi there, just wanted to give you a quick heads up.

The text in your post seems to be running off the screen in Chrome. I’m not sure if this is a format issue or something to do with internet browser compatibility, but I thought I’d post to let you know. The layout looks great though!

Hope you get the issue fixed soon. Cheers

Having read this I thought it was very enlightening. I appreciate you taking the time and energy to put this informative article together.

I once again find myself spending a significant amount of time both reading and commenting. But so what, it was still worthwhile!

Hi, the whole thing is going sound here and of course everyone is sharing facts, that’s really good, keep up writing.

Hey there! Do you use Twitter? I’d like to follow you if that would be ok. I’m definitely enjoying your blog and look forward to new updates.

pros and cons of range?

Loved the interchange between Kalid and G in posts 35 to 41!

It’s nice to see how much good can come from disagreement when the discussion is focused on the issue at hand, not how ‘wrong’ the other person is.

I just finished a post in the statistics section and found this quite interesting. My first thought was: ‘You’re both giving different answers, and you’re both right, because you were answering different questions.’

To me the difference between 2 machines in series and 2 machines in parallel was obvious because I’m accustomed to looking for it. In the verbiage, however, it seems that other details were in the forefront and made the two scenarios seem the same. The distinction between pipelining and series data transfer, or simultaneous production vs. ‘one to completion, then begin the second’, became apparent after a bit of back and forth. It’s more evidence that even the correct application of mathematics is useless without a good understanding of just what mathematics applies to our situation and why.

From my post in the Statistics section I’ll only repeat my Rule #1 for stats:

Your results don’t mean anything until and unless they mean something.

Great post. Clearer than most.

Wondering if you can help me clarify something that I don’t seem to be able to classify:

The question came up recently about an average that I found a use for a long time ago, but it’s been so long I can’t remember why. Boring details aside, it worked like this:

1. Take sample

2. Is this the first sample?

a. Yes? Set it as the average. Get next sample.

b. No? Add it to the old average and divide by 2. Get next sample.

What would this be classified as? It provides a much moodier output than the standard mean algo.
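The procedure above can be sketched as follows. The function name is invented, and the sample numbers are only there to show the behavior. As far as I can tell, this update rule is the same as exponential smoothing with a smoothing factor of 1/2, seeded with the first sample.

```python
def halving_average(samples):
    """Keep one number: the first sample, then (old average + new sample) / 2."""
    average = None
    for sample in samples:
        if average is None:
            average = sample                    # first sample becomes the average
        else:
            average = (average + sample) / 2    # later samples: add, divide by 2
    return average


data = [10, 20, 30, 40]
print(halving_average(data))     # leans toward the most recent values
print(sum(data) / len(data))     # ordinary arithmetic mean, for comparison
```

For this data the result is 31.25 versus an arithmetic mean of 25, because the newest sample always carries half of the weight.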

@Steve

I’ve never seen that particular algorithm. To the best of my knowledge it is not a standard method used in mathematics for any measure of central tendency.

It may, however, have some use in some scenario; without further details of where you saw it I couldn’t say. I can tell you what it will give you if you use it and why it appears ‘moody’ as you call it.

First, just looking at it for what it is, it provides something like an average for a rolling data set. That means if you are watching something that is changing, growing, progressing through time, or in any other way continuing to spit out numbers at you, and want to know the average, you can use this method to retain one number (and no other information about what has come before) to get a sense of the average to compare to your new piece of data. This ‘Steve average’ places great emphasis on the one new piece of data, and every older piece of data counts for half as much each time a new one arrives.

Here’s a quick breakdown of the Steve Average (SA):

For 2 elements: each contributes half of its value toward the SA

For 3 elements: final element contributes 1/2, each previous contributes 1/4, all the previous elements together contribute 1/2

For 4 elements: final element 1/2, the one before it 1/4, the first two 1/8 each, all previous together 1/2

For 5 elements: final element 1/2, then 1/4, then 1/8, the first two 1/16 each, all previous together 1/2

…

For 20 elements (# of students in a class?):

final element 1/2 contribution to SA

the one before it contributes 1/4, the one before that 1/8, and so on

the first two each contribute about 0.0000019 (1/2^19) of their value towards SA

all the first 19 together still contribute 1/2 towards SA

Conclusion:

As you watch the Steve Average while new numbers pour in, the SA is half of whatever value was the most recent plus half of the previous SA, so older data fades quickly. With 5 or more elements, the 4 most recent values make up over 90% of the entire contribution to the SA (1/2 + 1/4 + 1/8 + 1/16 ≈ 94%). The SA has the benefit that it is easy to calculate and only requires keeping memory of one number for all previous data.

If that is the behavior you’re looking for then it is useful. If you agree with me that this behavior doesn’t help provide any meaningful analysis of the real world then it is not useful.

Please don’t ever be afraid to ask a stupid question, for the only stupid question is the one you don’t ask. And do not discount a theory or method of analysis until you have looked at it for what it really is, and what benefit or behavior it provides. Only then can you know if it is useful.

By the way, there does exist another method for keeping track of a rolling average that is mathematically equal to the arithmetic mean. It is almost as easy to calculate as the SA and it only requires memory of 2 numbers for all previous data.

- keep track of the running average and n = number of elements

- when a new number comes in, do the following:

- multiply old ave by old n

- add new data

- divide by new n

Excelsior,

Eric V
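The running-average recipe above (multiply the old average by the old n, add the new data, divide by the new n) can be sketched like this. The class name `RunningMean` is made up for illustration.

```python
class RunningMean:
    """Arithmetic mean that remembers only two numbers: the average and n."""

    def __init__(self):
        self.n = 0
        self.average = 0.0

    def add(self, value):
        total = self.average * self.n + value   # old ave * old n, plus new data
        self.n += 1
        self.average = total / self.n           # divide by new n
        return self.average


mean = RunningMean()
for value in [10, 20, 30, 40]:
    print(mean.add(value))
```

After the four samples the running value equals the ordinary mean, 25.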

@Eric V

Ok, cool thanks. Smoothing rings a bell, basically a filter. I’m no mathematician hehe but when intrigued, I tend to hunt things like this down. Thanks

The only way I can make sense of the harmonic mean is resistors in parallel. My way of explaining it is adding up areas (reciprocal of resistance) of water pipes.

Comment from a reader (Michael):

Hello Kalid,

I love your site and approach to maths. However, how do you get the numbers (rate of return) 1.1 * .9 * 1.1 * .9 = .98 (2% loss) in regard to Portfolio A?

And

1.3 * .7 * 1.3 * .7 = .83 (17% loss) in regard to Portfolio B?

I don’t understand how you derived those returns. Thank you, Michael DeYoung

*Also, with +10 -10 +10 -10, shouldn’t that = 0?

—–

Hi Michael, great question. I should probably clarify the post.

A 10% return is the same as turning $100 into $110, and we can model this by multiplying by 110/100 or 1.1

A -10% return is the same as turning $100 into $90, and we can model this by multiplying by 90/100 or .9.

If you had a portfolio that gained 10%, then lost 10%, it would be

Start: $100

Year 1 (gain 10%): $100 * 1.1 = $110

Year 2 (lose 10%): $110 * .9 = $99

Something interesting happened: when we lost 10%, we went from $110 down to $99, not back to $100! This is because losing 10% after you’ve grown is worse than losing 10% of your original amount. So we end up lower than before.

If we continue that pattern:

Year 3 (gain 10%): $99 * 1.1 = $108.90

Year 4 (lose 10%): $108.90 * .9 = $98.01

The same thing happened: we gained 10% on $99, then we lost 10% on $108.90, and that loss took out a larger chunk since it happened on the larger amount.

So, the strange effect is that gaining and losing the same percentage will eventually bankrupt you :). It’s because your losses are happening on the larger amount.

We might think “Ok, what happens if we have the loss first?”

Start: $100

Year 1 (lose 10%): $100 * .9 = $90

Year 2 (gain 10%): $90 * 1.1 = $99

Uh oh. Even having the loss first doesn’t help. It means we lose a large amount (losing 10% on $100) but then only make back a smaller amount (gaining 10% on $90 is only $9, not $10, and doesn’t cancel out the loss.)

This seems unfair, right? Shouldn’t zig-zagging keep you even?

Thinking about it more, I realize there’s an unfair element here:

When the gains grow you, the losses get bigger. The gains are actually “helping” the future losses become even bigger.

Now, the silver lining is that losing 10%, then 10%, then 10% isn’t as bad as losing 30% all at once:

Start: $100

Year 1 (lose 10%): $100 * .9 = $90

Year 2 (lose 10%): $90 * .9 = $81

Year 3 (lose 10%): $81 * .9 = $72.90

We “only” lost 27.1%, instead of the expected 30%. That’s because the losses are shrinking the amount we can lose, so they work against themselves that way too.

Phew. Hope that helps clarify — some of this can be really unintuitive, and we have to go through the numbers vs. trying to estimate internally.
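For anyone who wants to go through the numbers themselves, here is a minimal sketch of the year-by-year walkthrough above. The function name `apply_returns` is hypothetical; it just multiplies the balance by each year's growth factor.

```python
def apply_returns(start, percent_changes):
    """Apply yearly percentage changes one after another."""
    balance = start
    for percent in percent_changes:
        balance *= 1 + percent / 100    # +10% -> x1.1, -10% -> x0.9
    return balance


print(round(apply_returns(100, [10, -10, 10, -10]), 2))   # gain, lose, gain, lose
print(round(apply_returns(100, [-10, 10]), 2))            # loss first
print(round(apply_returns(100, [10, -10]), 2))            # gain first
print(round(apply_returns(100, [-10, -10, -10]), 2))      # three losses in a row
```

The order of the gains and losses does not change the result, since multiplication can be done in any order.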