The unique shape of his head. Technically, his head is an oval, like yours. But somehow, making his jaw wider than the rest of his head is perfect.
The wide-eyed bewilderment. The whites of his eyes, the raised brows, the pursed lips – the cartoonist saw and amplified the emotion inside.
So, who really “gets it”? It seems the technical artist worries more about the shading of his eyes than the message they contain.
Numbers Began With Cartoons
Think about the first numbers, the tally system:
I, II, III, IIII …
Those are… drawings! Cartoons! Caricatures of an idea!
They capture the essence of “existing” or “having something” without the specifics of what it represents.
Og the Caveman Accountant might have tried drawing individual stick figures, buffalos, trees, and so on. Eventually he might realize a shortcut: draw one buffalo symbol to show the type, then a line for each item. This captures the essence of “something is there” and our imaginations do the rest.
Math is an ongoing process of simplifying ideas to their cartoon essence. Even the beloved equals sign (=) started as a drawing of two identical lines, and now we can write “3 + 5 = 8” instead of “three plus five is equal to eight”. Much better, right?
So let’s be cartoonists, seeing an idea — really capturing it — without getting trapped in technical mimicry. Perfect reproductions come in after we’ve seen the essence.
Technically Correct: The Worst Kind Of Correct
We agree that multiplication makes things bigger, right?
Ok. Pick your favorite number. Now, multiply it by a random number. What happens?
If that random number is negative, your number goes negative
If that random number is between 0 and 1, your number is destroyed or gets smaller
If that random number is greater than 1, your number will get larger
Hrm. It seems multiplication is more likely to reduce a number. Maybe we should teach kids “Multiplication generally reduces the original number.” It’ll save them from making mistakes later.
No! It’s a technically correct and real-life-ily horrible way to teach, and will confuse them more. If the technically correct behavior of multiplication is misleading, can you imagine what happens when we study the formal definitions of more advanced math?
There’s a fear that without every detail up front, people get the wrong impression. I’d argue people get the wrong impression because you provide every detail up front.
As George Box wrote, “All models are wrong, but some are useful.”
A knowingly-limited understanding (“Multiplication makes things bigger”) is the foothold to reach a more nuanced understanding. (“People generally multiply positive numbers greater than 1, so multiplication makes things larger. Let’s practice. Later, we’ll explore what happens if numbers are negative, or less than one.”)
I wrap my head around math concepts by reducing them to their simplified essence:
Imaginary numbers let us rotate numbers. Don’t start by defining i as the square root of -1. Show how if negative numbers represent a 180-degree rotation, imaginary numbers represent a 90-degree one.
The number e is a little machine that grows as fast as it can. Don’t start with some arcane technical definition based on limits. Show what happens when we compound interest with increasing frequency.
The Pythagorean Theorem explains how all shapes behave (not just triangles). Don’t whip out a geometric proof specific to triangles. See what circles, squares, and triangles have in common, and show that the idea works for any shape.
Euler’s Formula makes a circular path. Don’t start by analyzing sine and cosine. See how exponents and imaginary numbers create “continuous rotation”, i.e. a circle.
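Two of these cartoons are easy to poke at in code. Here is a minimal sketch, assuming Python's built-in complex numbers (where `1j` plays the role of i) as a stand-in for the rotation idea; the variable names are just illustrative.

```python
import cmath
import math

# Multiplying by i is a 90-degree turn; doing it twice is a 180-degree turn (-1).
start = 1 + 0j
quarter_turn = start * 1j        # 1 -> i
half_turn = quarter_turn * 1j    # i -> -1

# Euler's Formula: e^(i*x) lands on the unit circle at angle x (in radians).
points = [cmath.exp(1j * math.radians(deg)) for deg in (0, 90, 180, 270)]
distances = [abs(p) for p in points]  # distance from the origin

print(quarter_turn, half_turn)
print([round(d, 6) for d in distances])  # each distance is 1, up to rounding
```

Every point sits one unit from the origin, which is the “circular path” in cartoon form.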
Avoid the trap of the guilty expert, pushed to describe every detail with photorealism. Be the cartoonist who seeks the exaggerated, oversimplified, and yet accurate truth of the idea.
PS. Here’s my cheatsheet full of “cartoonified” descriptions of math ideas.
Are we talking about the grower’s perspective, or an observer’s?
e and the natural log are from the grower’s instant-by-instant perspective
Base 10, Base 2, etc. are measurements convenient for a human observer
In my head, I put the options in a table:
and I have thoughts like “I need the cause, from the grower’s perspective… that’s the natural log.” (Natural log is abbreviated ln, from the high-falutin’ logarithmus naturalis.)
I was frustrated with classes that described the inner part of the table, the raw functions, without the captions that explained when to use them!
That won’t fly. Let’s get direct practice thinking with logs and exponents.
Scenario: Describing GDP Growth
Here’s a typical example of growth:
From 2000 to 2010, the US GDP changed from 9.9 trillion to 14.4 trillion
Ok, sure, those numbers show change happened. But we probably want insight into the cause: What average annual growth rate would account for this change?
Immediately, my brain thinks “logarithms” because we’re working backwards from the growth to the rate that caused it. I start with a thought like this:
A good start, but let’s sharpen it up.
First, which logarithm should we use?
By default, I pick the natural logarithm. Most events end up being in terms of the grower (not observer), and I like “riding along” with the growing element to visualize what’s happening. (Radians are similar: they measure angles in terms of the mover.)
Next question: what change do we apply the logarithm to?
We’re really just interested in the ratio between start and finish: 9.9 trillion to 14.4 trillion in 10 years. This is the same growth rate as going from $9.90 to $14.40 in the same period.
We can sharpen our thought:
Ok, the cause was a rate of .374 or 37.4%. Are we done?
Not yet. Logarithms don’t know about how long a change took (we didn’t plug in 10 years, right?). They give us a rate as if all the change happened in a single time period.
The change could indeed be a single year of 37.4% continuous growth, or 2 years of 18.7% growth, or some other combination.
From the scenario, we know the change took 10 years, so the rate must have been:
From the viewpoint of instant, continuous growth, the US economy grew by 3.74% per year.
Are we done now? Not quite!
This continuous rate is from the grower’s perspective, as if we’re “riding along” with the economy as it changes. A banker probably cares about the human-friendly, year-over-year difference. We can figure this out by letting the continuous growth run for a year:
The year-over-year gain is 3.8%, slightly higher than the 3.74% instantaneous rate due to compounding. Here’s another way to put it:
From an instant-by-instant basis, a given part of the economy is growing by 3.74%, modeled by e^(.0374 · years)
On a year-by-year basis, with compounding effects worked out, the economy grows by 3.81%, modeled by 1.0381^years
In finance, we may want the year-over-year change which can be compared nicely with other trends. In science and engineering, we prefer modeling behavior on an instantaneous basis.
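The GDP walkthrough can be retraced in a few lines. This is a rough sketch using the numbers from the scenario; the variable names are my own, and the results are rounded in the comments.

```python
import math

# Numbers from the scenario (trillions of dollars, 2000 to 2010).
start_gdp, end_gdp, years = 9.9, 14.4, 10

# Logarithm: work backwards from the change to the total "cause" of growth.
total_growth = math.log(end_gdp / start_gdp)    # about 0.375

# Spread the cause over the time it took: the grower's continuous rate.
continuous_rate = total_growth / years          # about 0.0375 per year

# Let the continuous growth run for one year: the observer's annual gain.
year_over_year = math.exp(continuous_rate) - 1  # about 0.038

print(f"continuous: {continuous_rate:.2%}, year-over-year: {year_over_year:.2%}")
```

The year-over-year figure comes out a touch higher than the continuous rate, which is the compounding effect described above.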
Scenario: Describing Natural Growth
I detest contrived examples like “Assume bacteria doubles every 24 hours, find its growth formula.” Do bacteria colonies replicate on clean human intervals, and do we wait around for an exact doubling?
A better scenario: “Hey, I found some bacteria, waited an hour, and the lump grew from 2.3 grams to 2.32 grams. I’m going to lunch now. Figure out how much we’ll have when I’m back in 3 hours.”
Let’s model this. We’ll need a logarithm to find the growth rate, and then an exponent to project that growth forward. Like before, let’s keep everything in terms of the natural log to start.
The growth factor is:
That’s the rate for one hour, and the general model to project forward will be
If we start with 2.32 and grow for 3 hours we’ll have:
Just for fun, how long until the bacteria doubles? Imagine waiting for 1 to turn to 2:
We can mechanically take the natural log of both sides to “undo the exponent”, but let’s think intuitively.
If 2 is the final result, then ln(2) is the growth input that got us there (some rate × time). We know the rate was .0086, so the time to get to 2 would be:
The colony will double after ~80 hours. (Glad you didn’t stick around?)
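Here is the bacteria scenario as a small sketch: a logarithm to find the rate, then an exponent to project forward. It keeps full precision rather than rounding the rate to .0086, so the doubling time lands near 80 hours rather than on the exact figure you'd get by hand.

```python
import math

# Observed: the lump grew from 2.3 g to 2.32 g in one hour.
hourly_rate = math.log(2.32 / 2.3)             # about 0.0087 per hour

# Project forward 3 more hours, starting from the current 2.32 g.
projected = 2.32 * math.exp(hourly_rate * 3)   # about 2.38 g

# Doubling: ln(2) is the total growth needed; divide by the rate to get time.
hours_to_double = math.log(2) / hourly_rate    # about 80 hours

print(round(projected, 2), round(hours_to_double))
```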
What Does The Perspective Change Really Mean?
Figuring out whether you want the input (cause of growth) or output (result of growth) is pretty straightforward. But how do you visualize the grower’s perspective?
Imagine we have little workers who are building the final growth pattern (see the article on exponents):
If our growth rate is 100%, we’re telling our initial worker (Mr. Blue) to work steadily and create a 100% copy of himself by the end of the year. If we follow him day-by-day, we see he does finish a 100% copy of himself (Mr. Green) at the end of the year.
But… that worker he was building (Mr. Green) starts working as well. If Mr. Green first appears at the 6-month mark, he has a half-year to work (same annual rate as Mr. Blue) and he builds Mr. Red. Of course, Mr. Red ends up being half done, since Mr. Green only has 6 months.
What if Mr. Green showed up after 4 months? A month? A day? A second? If workers begin growing immediately, we get the instant-by-instant curve defined by e^x:
The natural log gives a growth rate in terms of an individual worker’s perspective. We plug that rate into e^x to find the final result, with all compounding included.
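To watch the workers speed up, here is a hedged sketch. It assumes the standard compound-interest formula (1 + rate/n)^n as the model for new workers joining after each of n equal stretches of the year; the function name `grow` is just for illustration.

```python
import math

def grow(rate, periods):
    """Amount after one year, starting from 1.0, compounding `periods` times."""
    return (1 + rate / periods) ** periods

# 100% annual rate: new workers start yearly, half-yearly, monthly, daily...
for periods in (1, 2, 12, 365, 1_000_000):
    print(periods, grow(1.0, periods))

print("e^1 =", math.exp(1.0))  # the limit when workers start immediately
```

With one period we simply double; with two we reach 2.25; as the periods shrink toward an instant, the result creeps up to e.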
Using Other Bases
Switching to another type of logarithm (base 10, base 2, etc.) means we’re looking for some pattern in the overall growth, not what the individual worker is doing.
Each logarithm asks a question when seeing a change:
Log base e: What was the instantaneous rate followed by each worker?
Log base 2: How many doublings were required?
Log base 10: How many 10x-ings were required?
Here’s a scenario to analyze:
Over 30 years, the transistor counts on typical chips went from 1000 to 1 billion
How would you analyze this?
Microchips aren’t a single entity that grow smoothly over time. They’re separate editions, from competing companies, and indicate a general tech trend.
Since we’re not “riding along” with an expanding microchip, let’s use a scale made for human convenience. Doubling is easier to think about than 10x-ing.
With these assumptions we get:
The “cause of growth” was 20 doublings, which we know occurred over 30 years. This averages 2/3 doublings per year, or 1.5 years per doubling — a nice rule of thumb.
From the grower’s perspective, we’d compute ln(1 billion / 1000) / 30 years ≈ 46% continuous growth per year (a bit harder to relate to in this scenario).
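The same change can be put to each logarithm in turn. A quick sketch, using the transistor counts from the scenario (variable names are mine):

```python
import math

start_count, end_count, years = 1_000, 1_000_000_000, 30
change = end_count / start_count

doublings = math.log2(change)              # about 19.9, i.e. roughly 20 doublings
tenfolds = math.log10(change)              # 6 ten-x-ings
years_per_doubling = years / doublings     # about 1.5 years
continuous_rate = math.log(change) / years # about 0.46 per year

print(round(doublings, 1), round(years_per_doubling, 2), round(continuous_rate, 2))
```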
We can summarize our analysis in a table:
Learning is about finding the hidden captions behind a concept. When is it used? What point of view does it bring to the problem?
My current interpretation is that exponents ask about cause vs. effect and grower vs. observer. But we’re never done; part of the fun is seeing how we can recaption old concepts.
Appendix: The Change Of Base Formula
Here’s how to think about switching bases. Assuming a 100% continuous growth rate,
ln(x) is the time to grow to x
ln(2) is the time to grow to 2
Since we have the time to double, we can see how many would “fit” in the total time to grow to x:
For example, how many doublings occur from 1 to 64?
Well, ln(64) = 4.158. And ln(2) = .693. The number of doublings that fit is:
In the real world, calculators may lose precision, so use a direct log base 2 function if possible. And of course, we can have a fractional number: Getting from 1 to the square root of 2 is “half” a doubling, or log2(1.414) = 0.5.
Changing to log base 10 means we’re counting the number of 10x-ings that fit:
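As a sketch of the “how many fit” idea, the helper below (a hypothetical `log_base`, not a standard function) divides the time to reach x by the time to reach the base. It also hints at why a direct base-2 function is nicer when you have one.

```python
import math

def log_base(x, base):
    """How many 'base-x-ings' fit in the time to grow to x."""
    return math.log(x) / math.log(base)

doublings = log_base(64, 2)                  # close to 6
half_doubling = log_base(math.sqrt(2), 2)    # close to 0.5
tenfolds = log_base(1000, 10)                # close to 3, possibly off by rounding

direct = math.log2(64)                       # direct base-2 function: exactly 6.0

print(doublings, half_doubling, tenfolds, direct)
```

The divided versions can land a hair away from the whole number, which is the precision loss mentioned above.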
Ratios summarize a scenario with a number, such as “income per day”. Unfortunately, this hides the explanation for how the result came about.
For example, look at two businesses:
Annie’s Art Gallery sells a single, $1000 piece every day
Frank’s Fish Emporium sells 250 trout at $4/each every day
By the numbers, they’re identical $1000/day operations, right? Hah.
Here’s how each business actually behaves:
Transactions are the workhorse that drives income, but they’re lost in the dollars/day description. When studying an idea, separate the results into Oomph and Often:
With Oomph and Often, I visualize two distinct levers to increase. A ratio like dollars/day makes me stumble through thoughts like: “For better results, I need 1/day to improve… which means the day gets shorter… How’s that possible? Oh, that must be the portion of the day used for each transaction…”
Why make it difficult? Rewrite the ratio to include the root cause: What’s the Oomph, and how Often does it happen?
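Here is one way to sketch the two businesses with both levers visible. The dictionary layout and the `income_per_day` helper are my own illustration; the numbers come from the example above.

```python
# Oomph = dollars per transaction, Often = transactions per day.
businesses = {
    "Annie's Art Gallery": {"oomph": 1000, "often": 1},
    "Frank's Fish Emporium": {"oomph": 4, "often": 250},
}

def income_per_day(business):
    """Dollars per day, built from the two levers instead of a bare ratio."""
    return business["oomph"] * business["often"]

daily_income = {name: income_per_day(b) for name, b in businesses.items()}
print(daily_income)  # both come to 1000, by very different routes
```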
Horsepower, Torque, RPM
In physics, we define everyday concepts like “power” with a formal ratio:
Ok. Power can be explained by a ratio, but we’re already in inverted-thinking mode. Just another hassle when exploring an already-tricky concept.
How about this:
Easier, I think. What could Oomph and Often mean?
Well, Oomph is probably the work we do (such as moving a weight) and Often is how frequently we do it (how many reps did you put in?).
In the same minute, suppose Frank lifted 100lbs ten times, while Annie lifted 1000lbs once. From the equation, they have the same power (though to be honest, I’m more frightened by Annie.)
An engine mechanic might internalize power like this:
What does that mean?
Torque is the Oomph, or how much weight (and how far) can be moved by a turn of the engine (i.e., moving 500lbs by 1 foot)
RPM (revolutions per minute) is how frequently the engine turns
A motorcycle engine is designed for reps, i.e. spinning the wheels quickly. It doesn’t need much torque — just enough to pull itself and a few passengers — but it needs to send that to the wheels again and again.
A bulldozer is designed for “Oomph”, such as knocking over a wall. We don’t need to tap into that work very frequently, as one destroyed wall per minute is great, thanks.
I’m not a physicist or car guy, but I can at least conceptualize the tradeoffs with the Oomph/Often metaphor.
Gears can change the tradeoff between Oomph and Often in a given engine. If you’re going uphill, fighting gravity, what do you want more of? If you’re cruising on a highway? Trying to start from a standstill? Driving over slippery snow? Lost the brakes and need to slow down the car?
Oomph/Often gets me thinking intuitively, Work/Time does not.
Variation: Electric Power
Electric power has the same ratio as mechanical power:
Yikes. It’s not clear what this means. How about:
It’s hard to have ideas out of the blue, but we might imagine something (a mini-engine?) is moving the Oomph around inside the wire. If we call it a “charge” then we have:
And we can give those subparts formal names:
Voltage (Oomph): How much work each charge contributes
Current (Often): How quickly charges are moving through the wire
Now we get the familiar:
Boomshakala! I don’t have a good intuition for electricity, but at least my goal is clear: find analogies where voltage means Oomph, and current means Often.
And still, we can take a crack at intuitive thinking: when you get zapped by a doorknob in winter, was that Oomph or Often? What attribute should batteries maximize? What’s better for moving energy through stubborn power lines? (Vive la résistance!)
The ratio reduces every type of power to a generic Work/Time calculation. The Oomph/Often metaphor gets us thinking about Torque/RPM in one scenario and Voltage/Current in another.
What’s Really Going On? Parameters, Baby.
The Oomph/Often viewpoint lets us think about the true cause of the ratio. Instead of dollars and days, we wonder how the actual transactions affect the outcome:
Can we increase the size of each transaction?
Can we increase the number each day?
In formal terms, we’ve introduced a new parameter to explain the interaction. To change a ratio from a/b to one parameterized by x, we can do:
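A small sketch of this step, assuming the identity a/b = (a/x) · (x/b): the new parameter x cancels out, but it leaves an Oomph and an Often behind. The helper name `split_ratio` is hypothetical, and the lifting numbers are Frank's and Annie's from the power example.

```python
from fractions import Fraction

def split_ratio(a, b, x):
    """Rewrite a/b as (a/x) * (x/b): Oomph times Often."""
    oomph = Fraction(a, x)   # amount per x (e.g. pounds per rep)
    often = Fraction(x, b)   # x per b (e.g. reps per minute)
    return oomph, often

# Frank: 1000 lbs lifted in 1 minute, spread over 10 reps.
frank_oomph, frank_often = split_ratio(1000, 1, 10)
# Annie: the same 1000 lbs in 1 minute, in a single rep.
annie_oomph, annie_often = split_ratio(1000, 1, 1)

print(frank_oomph, frank_often)   # 100 10
print(annie_oomph, annie_often)   # 1000 1
```

Multiplying each pair back together returns the original 1000 lbs/minute, so nothing was lost by introducing x.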
We change our viewpoint to see x as the key component. In math, we often switch viewpoints to simplify problems:
Instead of asking what happens to the observer, can we change parameters and ask what the mover sees? (Degrees vs. radians.)
Can we see a giant function as being parameterized by smaller ones? (See the chain rule.)
Can we express probabilities as odds, instead of percentages? (It makes Bayes’ Theorem easier.)
Adjusting parameters is a way to morph an idea that doesn’t click into one that does. Since I don’t naturally think with inverted units, I’ve made it easier on myself: deal with two multiplications, instead of a division.
Trig mnemonics like SOH-CAH-TOA focus on computations, not concepts:
TOA explains the tangent about as well as x^2 + y^2 = r^2 describes a circle. Sure, if you’re a math robot, an equation is enough. The rest of us, with organic brains half-dedicated to vision processing, seem to enjoy imagery. And “TOA” evokes the stunning beauty of an abstract ratio.
I think you deserve better, and here’s what made trig click for me.
Visualize a dome, a wall, and a ceiling
Trig functions are percentages to the three shapes
Motivation: Trig Is Anatomy
Imagine Bob The Alien visits Earth to study our species.
Without new words, humans are hard to describe: “There’s a sphere at the top, which gets scratched occasionally” or “Two elongated cylinders appear to provide locomotion”.
After creating specific terms for anatomy, Bob might jot down typical body proportions:
The armspan (fingertip to fingertip) is approximately the height
A head is 5 eye-widths wide
Adults are 8 head-heights tall
How is this helpful?
Well, when Bob finds a jacket, he can pick it up, stretch out the arms, and estimate the owner’s height. And head size. And eye width. One fact is linked to a variety of conclusions.
Even better, human biology explains human thinking. Tables have legs, organizations have heads, crime bosses have muscle. Our biology offers ready-made analogies that appear in man-made creations.
Now the plot twist: you are Bob the alien, studying creatures in math-land!
Generic words like “triangle” aren’t overly useful. But labeling sine, cosine, and hypotenuse helps us notice deeper connections. And scholars might study haversine, exsecant and gamsin, like biologists who find a link between your tibia and clavicle.
Trig is the anatomy book for “math-made” objects. If we can find a metaphorical triangle, we’ll get an armada of conclusions for free.
Sine/Cosine: The Dome
Instead of staring at triangles by themselves, like a caveman frozen in ice, imagine them in a scenario, hunting that mammoth.
Pretend you’re in the middle of your dome, about to hang up a movie screen. You point to some angle “x”, and that’s where the screen will hang.
The angle you point at determines:
sine(x) = sin(x) = height of the screen, hanging like a sign
cosine(x) = cos(x) = distance to the screen on the ground
the hypotenuse, the distance to the top of the screen, is always the same
Want the biggest screen possible? Point straight up. It’s at the center, on top of your head, but it’s big dagnabbit.
Want the screen the furthest away? Sure. Point straight across, 0 degrees. The screen has “0 height” at this position, and it’s far away, like you asked.
The height and distance move in opposite directions: bring the screen closer, and it gets taller.
Tip: Trig Values Are Percentages
Nobody ever told me in my years of schooling: sine and cosine are percentages. They vary from +100% to 0 to -100%, or max positive to nothing to max negative.
Let’s say I paid $14 in tax. You have no idea if that’s expensive. But if I say I paid 95% in tax, you know I’m getting ripped off.
An absolute height isn’t helpful, but if your sine value is .95, I know you’re almost at the top of your dome. Pretty soon you’ll hit the max, then start coming down again.
How do we compute the percentage? Simple: divide the current value by the maximum possible (the radius of the dome, aka the hypotenuse).
That’s why we’re told “Sine = Opposite / Hypotenuse”. It’s to get a percentage! A better wording is “Sine is your height, as a percentage of the max”. (Sine becomes negative if your angle points “underground”. Cosine becomes negative when your angle points backwards.)
Let’s simplify the calculation by assuming we’re on the unit circle (radius 1). Now we can skip the division and just say sine = height.
Every circle is really the unit circle, scaled up or down to a different size. So work out the connections on the unit circle and apply the results to your particular scenario.
Try it out: plug in an angle and see what percent of the height and width it reaches:
The growth pattern of sine isn’t an even line. The first 45 degrees cover 70% of the height, and the final 10 degrees (from 80 to 90) only cover 2%.
This should make sense: at 0 degrees, you’re moving nearly vertical, but as you get to the top of the dome, your height changes level off.
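If you'd like to check those percentages yourself, here is a minimal sketch on the unit circle. The helper names are mine; note that Python's trig functions expect radians, so the angle is converted first.

```python
import math

def sine_percent(degrees):
    """Height of the screen as a percentage of the dome's max height."""
    return math.sin(math.radians(degrees)) * 100

def cosine_percent(degrees):
    """Ground distance to the screen as a percentage of the dome's radius."""
    return math.cos(math.radians(degrees)) * 100

first_45 = sine_percent(45)                    # about 70.7
last_10 = sine_percent(90) - sine_percent(80)  # about 1.5

print(round(first_45, 1), round(last_10, 1))
```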
Tangent/Secant: The Wall
One day your neighbor puts up a wall right next to your dome. Ack, your view! Your resale value!
But can we make the best of a bad situation?
Sure. What if we hang our movie screen on the wall? You point at an angle (x) and figure out:
tangent(x) = tan(x) = height of screen on the wall
distance to screen: 1 (the screen is always the same distance along the ground, right?)
secant(x) = sec(x) = the “ladder distance” to the screen
We have some fancy new vocab terms. Imagine seeing the Vitruvian “TAN GENTleman” projected on the wall. You climb the ladder, making sure you can “SEE, CAN’T you?”. (Yeah, he’s naked… won’t forget the analogy now, will you?)
Let’s notice a few things about tangent, the height of the screen.
It starts at 0, and goes infinitely high. You can keep pointing higher and higher on the wall, to get an infinitely large screen! (That’ll cost ya.)
Tangent is just a bigger version of sine! It’s never smaller, and while sine “tops off” as the dome curves in, tangent keeps growing.
How about secant, the ladder distance?
Secant starts at 1 (ladder on the floor to the wall) and grows from there
Secant is always longer than tangent. The leaning ladder used to put up the screen must be longer than the screen itself, right? (At enormous sizes, when the ladder is nearly vertical, they’re close. But secant is always a smidge longer.)
Remember, the values are percentages. If you’re pointing at a 50-degree angle, tan(50) = 1.19. Your screen is 19% larger than the distance to the wall (the radius of the dome).
(Plug in x=0 and check your intuition that tan(0) = 0, and sec(0) = 1.)
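The wall facts are quick to test. A sketch, assuming the wall sits 1 unit away (the unit circle) and using a made-up helper called `wall`:

```python
import math

def wall(degrees):
    """Return (tangent, secant): screen height on the wall and ladder length."""
    x = math.radians(degrees)
    return math.tan(x), 1 / math.cos(x)

tan50, sec50 = wall(50)   # about 1.19 and 1.56
tan0, sec0 = wall(0)      # 0 and 1: no screen, ladder flat on the floor

# The ladder (secant) is longer than the screen (tangent) at every angle tried.
ladder_always_longer = all(wall(d)[1] > wall(d)[0] for d in range(0, 90))

print(round(tan50, 2), round(sec50, 2), ladder_always_longer)
```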
Cotangent/Cosecant: The Ceiling
Amazingly enough, your neighbor now decides to build a ceiling on top of your dome, far into the horizon. (What’s with this guy? Oh, the naked-man-on-my-wall incident…)
Well, time to build a ramp to the ceiling, and have a little chit chat. You pick an angle to build and work out:
cotangent(x) = cot(x) = how far the ceiling extends before we connect
cosecant(x) = csc(x) = how long we walk on the ramp
the vertical distance traversed is always 1
Tangent/secant describe the wall, and COtangent and COsecant describe the ceiling.
Our intuitive facts are similar:
If you pick an angle of 0, your ramp is flat (infinite) and never reaches the ceiling. Bummer.
The shortest “ramp” is when you point 90-degrees straight up. The cotangent is 0 (we didn’t move along the ceiling) and the cosecant is 1 (the “ramp length” is at the minimum).
Visualize The Connections
A short time ago I had zero “intuitive conclusions” about the cosecant. But with the dome/wall/ceiling metaphor, here’s what we see:
Whoa, it’s the same triangle, just scaled to reach the wall and ceiling. We have vertical parts (sine, tangent), horizontal parts (cosine, cotangent), and “hypotenuses” (secant, cosecant). (Note: the labels show where each item “goes up to”. Cosecant is the full distance from you to the ceiling.)
And from similarity, ratios like “height to width” must be the same for these triangles. (Intuition: step away from a big triangle. Now it looks smaller in your field of view, but the internal ratios couldn’t have changed.)
This is how we find out “sine/cosine = tangent/1”.
I’d always tried to memorize these facts, when they just jump out at us when visualized. SOH-CAH-TOA is a nice shortcut, but get a real understanding first!
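Here is a rough numerical check of the connections, for one arbitrarily chosen angle (35 degrees). It assumes the three triangles described above: dome (sine, cosine, 1), wall (tangent, 1, secant), and ceiling (1, cotangent, cosecant).

```python
import math

x = math.radians(35)  # any angle between 0 and 90 degrees works here

sin, cos, tan = math.sin(x), math.cos(x), math.tan(x)
sec, cot, csc = 1 / cos, 1 / tan, 1 / sin

# Similar triangles: "height to width" is the same in the dome, wall, ceiling.
height_to_width = [sin / cos, tan / 1, 1 / cot]

# Pythagorean Theorem on each triangle: vertical^2 + horizontal^2 = hypotenuse^2.
dome_ok = math.isclose(sin**2 + cos**2, 1**2)
wall_ok = math.isclose(tan**2 + 1**2, sec**2)
ceiling_ok = math.isclose(1**2 + cot**2, csc**2)

print(height_to_width, dome_ok, wall_ok, ceiling_ok)
```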
Gotcha: Remember Other Angles
Psst… don’t over-focus on a single diagram, thinking tangent is always smaller than 1. If we increase the angle, we reach the ceiling before the wall:
The Pythagorean/similarity connections are always true, but the relative sizes can vary.
(But, you might notice that sine and cosine are always smallest, or tied, since they’re trapped inside the dome. Nice!)
Summary: What Should We Remember?
For most of us, I’d say this is enough:
Trig explains the anatomy of “math-made” objects, such as circles and repeating cycles
The dome/wall/ceiling analogy shows the connections between the trig functions
Trig functions return percentages, that we apply to our specific scenario
You don’t need to memorize 1^2 + cot^2 = csc^2, except for silly tests that mistake trivia for understanding. In that case, take a minute to draw the dome/wall/ceiling diagram, fill in the labels (a tan gentleman you can see, can’t you?), and create a cheatsheet for yourself.
In a follow-up, we’ll learn about graphing, complements, and using Euler’s Formula to find even more connections.
Appendix: The Original Definition Of Tangent
You may see tangent defined as the length of the tangent line from the circle to the x-axis (geometry buffs can work this out).
As expected, at the top of the circle (x=90) the tangent line can never reach the x-axis and is infinitely long.
I like this intuition because it helps us remember the name “tangent”, and here’s a nice interactive trig guide to explore:
Still, it’s critical to put the tangent vertical and recognize it’s just sine projected on the back wall (along with the other triangle connections).
Appendix: Inverse Functions
Trig functions take an angle and return a percentage. sin(30) = .5 means a 30-degree angle is 50% of the max height.
The inverse trig functions let us work backwards. They are written sin^-1 or arcsin (“arcsine”), and often asin in programming languages.
If our height is 25% of the dome, what’s our angle?
Now what about something exotic, like inverse secant? Often times it’s not available as a calculator function (even the one I built, sigh).
Looking at our trig cheatsheet, we find an easy ratio where we can compare secant to 1. For example, secant to 1 (hypotenuse to horizontal) is the same as 1 to cosine:
Suppose our secant is 3.5, i.e. 350% of the radius of the unit circle. What’s the angle to the wall?
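Both questions can be worked in code. A sketch: Python's math module has `asin` and `acos` but no inverse secant, so the hypothetical `arcsec_degrees` helper below leans on the “secant to 1 is the same as 1 to cosine” relation.

```python
import math

# Height is 25% of the dome: which angle gives that sine?
angle_for_quarter_height = math.degrees(math.asin(0.25))   # about 14.5 degrees

def arcsec_degrees(secant):
    """Inverse secant via cosine: sec/1 = 1/cos, so cos = 1/sec."""
    return math.degrees(math.acos(1 / secant))

angle_for_secant = arcsec_degrees(3.5)                     # about 73.4 degrees

print(round(angle_for_quarter_height, 1), round(angle_for_secant, 1))
```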
Appendix: A Few Examples
Example: Find the sine of angle x.
Ack, what a boring question. Instead of “find the sine” think, “What’s the height as a percentage of the max (the hypotenuse)?”
First, notice the triangle is “backwards”. That’s ok. It still has a height, in green.
What’s the max height? By the Pythagorean theorem, we know
Ok! The sine is the height as a percentage of the max, which is 3/5 or .60.
Follow-up: Find the angle.
Of course. We have a few ways. Now that we know sine = .60, we can just do:
Here’s another approach. Instead of using sine, notice the triangle is “up against the wall”, so tangent is an option. The height is 3, the distance to the wall is 4, so the tangent height is 3/4 or 75%. We can use arctangent to turn the percentage back into an angle:
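Both routes to the angle should agree. A quick sketch with the triangle's numbers (height 3, distance to the wall 4):

```python
import math

height, width = 3, 4
hypotenuse = math.hypot(width, height)   # 5.0, by the Pythagorean theorem

sine = height / hypotenuse               # 0.6: height as a percentage of the max
tangent = height / width                 # 0.75: height on the wall

angle_from_sine = math.degrees(math.asin(sine))
angle_from_tangent = math.degrees(math.atan(tangent))

print(round(angle_from_sine, 2), round(angle_from_tangent, 2))  # both about 36.87
```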
Example: Can you make it to shore?
You’re on a boat with enough fuel to sail 2 miles. You’re currently .25 miles from shore. What’s the largest angle you could use and still reach land? Also, the only reference available is Hubert’s Compendium of Arccosines, 3rd Ed. (Truly, a hellish voyage.)
Ok. Here, we can visualize the beach as the “wall” and the “ladder distance” to the wall is the secant.
First, we need to normalize everything in terms of percentages. We have 2 / .25 = 8 “hypotenuse units” worth of fuel. So, the largest secant we could allow is 8 times the distance to the wall.
We’d like to ask “What angle has a secant of 8?”. But we can’t, since we only have a book of arccosines.
We use our cheatsheet diagram to relate secant to cosine: Ah, I see that “sec/1 = 1/cos”, so
A secant of 8 implies a cosine of 1/8. The angle with a cosine of 1/8 is arccos(1/8) = 82.8 degrees, the largest we can afford.
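The same reasoning as a small function, in case your boat has a different fuel tank. The name `largest_angle` and its parameters are hypothetical; the logic follows the steps above (normalize to the wall distance, then convert secant to cosine).

```python
import math

def largest_angle(fuel_miles, shore_miles):
    """Largest steering angle (degrees) that still reaches shore."""
    max_secant = fuel_miles / shore_miles   # "ladder distance" in wall units
    return math.degrees(math.acos(1 / max_secant))

angle = largest_angle(2, 0.25)              # about 82.8 degrees

# Sanity check: sailing at that angle uses up the full 2 miles of fuel.
distance_sailed = 0.25 / math.cos(math.radians(angle))

print(round(angle, 1), round(distance_sailed, 2))
```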
Not too bad, right? Before the dome/wall/ceiling analogy, I’d be drowning in a mess of computations. Visualizing the scenario makes it simple, even fun, to see which trig buddy can help us out.
In your problem, think: am I interested in the dome (sin/cos), the wall (tan/sec), or the ceiling (cot/csc)?
Update: The owner of Grey Matters put together interactive diagrams for the analogies (drag the slider on the left to change the angle):
After months of work with the help of Neil, a great designer, and my Excel-blogging friend Andrew, I’m happy to launch a brand-new design.
My goals were to be friendly, readable, and easy-to-navigate. Here’s a quick before-and-after:
Neil did a fantastic job here — I’d been looking for a way to convey a welcoming, conversational tone.
A site about explanations should describe what it does simply, right?
The fonts are bumped up, there’s more breathing room, and pages are optimized for iPads/iPhones. Instead of a text-dense cram session, I want an unhurried walkthrough of insights.
My favorite feature is a site summary that reduces insights to a few words. Previously, I had trouble navigating the various articles, and I bet you did too :). Readers of the newsletter got a sneak peek, and I have a PDF version I’ll be sending out to subscribers as well.
Overall, BetterExplained is an excited friend who shares what really helps ideas click, not an authority trying to be the grand poombah of math. Let’s have a good time on this journey of learning.
Algebra is really about relationships. How are things connected? Do they move together, or apart, or maybe they’re completely independent?
Normal equations assume an “input to output” connection. That is, we take an input (x=3), plug it into the relationship (y=x^2), and observe the result (y=9).
But is that the only way to see a scenario? The setup y=x^2 implies that y only moves because of x. But it could be that y just coincidentally equals x^2, and some hidden factor is changing them both (the factor changes x to 3 while also changing y to 9).
As a real world example: For every degree above 70, our convenience store sells x bottles of sunscreen and x^2 pints of ice cream.
We could write the algebra relationship like this:
And it’s correct… but misleading!
The equation implies sunscreen directly changes the demand for ice cream, when it’s the hidden variable (temperature) that changed them both!
It’s much better to write two separate equations
that directly point out the causality. The ideas “temperature impacts ice cream” and “temperature impacts sunscreen” clarify the situation, and we lose information by trying to factor away the common “temperature” portion. Parametric equations get us closer to the real-world relationship.
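In code, the parametric version makes the hidden cause an explicit input. This is a sketch under two assumptions of mine: x is read as “degrees above 70”, and nothing sells at or below 70 degrees. The function name `sales` is made up for the example.

```python
def sales(temperature):
    """Return (sunscreen bottles, ice cream pints), both driven by temperature."""
    degrees_above_70 = max(temperature - 70, 0)  # assumption: no sales below 70
    sunscreen = degrees_above_70                 # temperature impacts sunscreen
    ice_cream = degrees_above_70 ** 2            # temperature impacts ice cream
    return sunscreen, ice_cream

for temperature in (70, 73, 80):
    print(temperature, sales(temperature))
```

At 73 degrees we get 3 bottles and 9 pints: the same (3, 9) pair as y = x^2, but now the code says why both numbers moved.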
Don’t Think About Time. Just Look for Root Causes.
A reader pointed out that nearly every parametric equation tutorial uses time as its example parameter. We get so hammered with “parametric equations involve time” that we forget the key insight: parameters point to the cause. Why did we change? (Maybe it was time, or temperature, or perhaps sunscreen really does make you hungry for ice cream.)
Most algebraic equations lay out a connection like y = x^2. Parametric equations remind us to look deeper (lost on me until recently; I’d been stuck in the “time/physics” mindset).
Sure, not every setup has a hidden parameter, but isn’t it worth a look?
(Like the new look? I’ve been working with a great designer and will be refreshing the main site too.)
The goal is an intuition-first look at a notoriously gnarly subject. This isn’t a replacement for a stodgy textbook — it’s the friendly introduction I wish I’d had. A few hours of reading that would have saved me years of frustration.
The course text is free online, with a complete edition available, which includes:
The price of the course will increase when the final version is released, so hop onto the beta to snag the lower price.
Building A Course: Lessons Learned
A few insights jumped out while making the course. This may be helpful if you’re considering teaching a course one day (I hope you are).
I struggled with what to make free vs. paid. I love sharing insights with people… and I also love knowing I can do so until I’m an old man, complaining that newfangled brain-chip implants aren’t “real learning”.
Incentives always exist. I want to make education projects sustainable, designed to satisfy readers, not a 3rd party.
Similar to the fantastic Rails Tutorial Book, the course text is free, with extra resources available. Having the core material free with paid variations & guidance helps align my need to create, share, and be sustainable.
Being Focused Matters
Historically, I’m lucky to write an article a month. But this summer, I wrote 16 lessons in 6 weeks. What was the difference?
Well, pressure from friends, for one: I’d promised to do a calculus course this summer. But mostly, it was the focus of having a single topic, brainstorming on numerous analogies/examples, and carving a rough path through on a schedule (2-3 articles/week).
I hope this doesn’t sound disciplined, because I’m not. A combination of fear (I told people I’d do this) and frustration (Argh, I remember being a student and not having things click) pushed me. When I finished, I took a break from writing and vegged out for a few weeks. But I think it was a worthwhile trade — in my mind, a year’s worth of material was ready.
There are many options for making a course. Modules. Quizzes. Interactive displays. Tribal dance routines. Hundreds of tools to convey your message.
And… what is that message, anyway? Are we transmitting facts, or building insight?
Until the fundamentals are working, the fancy dance routine seems useless. I’d rather read genuine insights from a pizza box than have an interactive hologram that recites a boring lecture.
When lessons are lightweight and easy to update, you’re excited about feedback (Oh yeah! A chance to make it better!).
The more static the medium, the more you fear feedback (Oh no, I have to redo it?). A fixed medium has its place, ideally after a solid foundation has been mapped out.
I’ll be polishing the course in the coming weeks, and feedback is welcome!
First off: what’s wrong with how calculus is taught today? (Ha!)
Just look at the results. The vast majority of survivors, the STEM folks who used calculus in several classes, have no lasting intuition. We memorized procedures, applied them to pre-packaged problems (“Say, fellow, what is the derivative of x^2?”), and internalized nothing.
Want proof? No problem. Take a string and wrap it tight around a quarter. Take another string and wrap it tight around the Earth.
Ok. Now, lengthen both strings, adding more to the ends, so there’s a 1-inch gap all the way around the quarter, and a 1-inch gap all the way around the Earth (sort of like having a ring floating around Saturn). Got it?
Quiz time: Which scenario uses more extra string? Does it take more additional string to put a 1-inch gap around the quarter, or to put a 1-inch gap around the Earth?
Think about it. Ponder it over. Ready? It’s the… same. The same! Adding a 1-inch gap around the Earth, and a quarter, uses the same 6.28 inches.
And to be blunt: if you “learned” calculus but didn’t have the answer within 3 seconds, you don’t truly know it. At least not deep down.
Now don’t feel bad, I didn’t know it either. Only one engineer in the dozens I’ve asked came up with the answer instantly, without second-guesses (my karate teacher, Mr. Rose).
This question has a few levels of understanding:
Algebra Robot: Calculating change in circumference: 2*pi*(r + 1) - 2*pi*r = 2*pi. They are the same. Calculation complete.
Calculus Disciple: Oh! We know circumference = 2*pi*r. The derivative is 2*pi, a constant, which means the current radius has no impact on a changing circumference.
Calculus Zen master: I see the true nature of things. We’re changing a 1-dimensional radius and watching a 1-dimensional perimeter. A dimension in, a dimension out, it’s like making a fence 1-foot longer: the initial size doesn’t change the work needed. The gap could be made around a circle, square, rectangle, or Richard Nixon mask, and it’s the same effort for similar shapes. (And, silly me, I’d forgotten the equation for circumference anyway!)
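If you'd like to watch the Algebra Robot do its thing, here's a small Python sketch. The radii are rough figures I'm assuming for illustration (a US quarter is about 0.955 inches across, and the Earth's radius is about 3959 miles); the function name is made up.

```python
import math

def extra_string(radius_in, gap_in=1.0):
    """Extra string (inches) needed to lift a loop gap_in inches off a circle."""
    return 2 * math.pi * (radius_in + gap_in) - 2 * math.pi * radius_in

quarter_radius_in = 0.955 / 2     # assumed: quarter diameter of about 0.955 inches
earth_radius_in = 3959 * 63360    # assumed: about 3959 miles, 63360 inches per mile

print(round(extra_string(quarter_radius_in), 2))
print(round(extra_string(earth_radius_in), 2))
```

Both results round to the same 6.28 inches, because the radius cancels out of the subtraction. The tiny floating-point wobble in the Earth case is rounding noise, not a real difference.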
We can be calculus warrior-monks, cutting through problems with our intuition. Notice how the most advanced approach didn’t need specific equations — it was just thinking about the problem! Equations are nice tools, but are they your only source of understanding?
See, according to standardized tests and final exams, I “knew” calculus — but clearly only to the beginning level. I didn’t immediately recognize how calculus could help with a question about making a string longer. If you asked someone for the amount of cash in a wallet with six $20 bills, and they didn’t think to use multiplication, would you say they’ve internalized arithmetic? (“Oh geez, you didn’t tell me this would be a multiplication question! Could you set up the problem for me?”)
I want you to have the intuition-first calculus class I never did. The goal is lasting intuition, shared by an excited friend, and built with the test of “If you haven’t internalized the idea, the material must change.”
How Can We Make Learning Intuitive And Interesting?
With Progressive Refinement. You may have seen these two methods to download and display an image:
Baseline Rendering: Download it start-to-finish in full detail
Progressive Rendering: Download a blurry version, and gradually refine it
Teaching a subject is similar:
Baseline Teaching: Cover individual concepts in full-depth, one after another
Progressive Teaching: See the big picture, how the whole fits together, then sharpen the detail
The “start-to-finish” approach seems official. Orderly. Rigorous. And it doesn’t work.
What, exactly, do you know when you’ve seen the first 20% of a portrait in full resolution? A forehead? Do you even know the gender? The age? The teacher has forgotten that you’ve never seen the full picture and likely can’t appreciate that you’re even seeing a forehead!
Progressive rendering (blurry-to-sharp) gives a full overview, a rough approximation of what the expert sees, and gets you curious about more. After the overview, we start filling in the details. And because you have an idea of where you’re going, you’re excited to learn. What’s better: “Let’s download the next 10% of the forehead”, or “Let’s sharpen the picture”?
Let’s admit it: we forget the details of most classes. If we’ll have a hazy memory anyway, shouldn’t it be of the entire picture? That has the best shot of enticing us to sharpen the details later on.
How Do We Know If A Lesson Is Any Good?
With the Pizza Box Test. Imagine you pass a dumpster while walking home. You see a message scrawled on a discarded pizza box. Is the note so insightful and compelling that you’d take the pizza box home to finish reading it?
Ignore the sparkle of a lesson being digital, mobile-friendly, gamified, interactive, or a gesture-based hologram. Would you take this lesson home if it were written on a pizza box?
If yes, great! Clean it up and add in the glitz. But if the core lesson is not compelling without the trimmings, it must be redone.
Everyone’s “pizza box” standard varies; just have one. Here are a few things I wish were written on the boxes outside my high school:
Psst! Think of e as a universal component in all growth rates, just like pi is a universal factor in all circles…
Hey buddy! Degrees are from the observer’s perspective. Radians are from the mover’s. That’s why radians are more natural. Let me show you…
Yo! Imaginary numbers are another dimension, and multiplication by i is a 90-degree rotation into that dimension! Two rotations and you’re facing backwards, aka -1.
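That last pizza-box note is easy to poke at with Python's built-in complex numbers. This is just a quick sketch; the variable names are mine.

```python
i = 1j                         # Python's built-in imaginary unit

start = 1 + 0j                 # facing "forward" along the real axis
quarter_turn = start * i       # one 90-degree rotation
half_turn = quarter_turn * i   # two rotations

print(quarter_turn)            # now pointing along the imaginary axis
print(half_turn)               # facing backwards, i.e. -1
```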
How Do We Know What’s Best For The Student?
By focusing on what future-you would teach current-you.
Teachers, like all of us, face external incentives which may interfere with their goals (publish or perish, mandated curriculum, need to impress others with jargon, etc.). The test of “What would future-me teach present-me?” helps me focus on the essentials:
Use the shortest lessons possible. There’s no word count to meet. The same insight in fewer words is preferred.
Use the simplest language possible. It’s future-me talking to current-me. There’s nobody to impress here.
Use any analogy that’s memorable. I’m not embarrassed by “childish” analogies. If a metaphor excites me, and helps, I’m going to use it. Nyah.
Be a friend, not lecturer. I want a buddy, a guide who happened to experience the material before I did, not a pompous schoolmarm I can’t question.
Point out the naked emperor. Most calculus classes cover “limits, derivatives, integrals” in that order because… why? Limits are the most nuanced concept, invented in the mid-1800s. Were mathematicians like Newton, Leibniz, Euler, Gauss, Taylor, Fourier and Bernoulli inadequate because they didn’t use them? (Conversely: are you better than them because you do?). Most courses are too timid (or oblivious) to question the strategy of covering the most elusive, low-level topic first.
Learn for the long haul. The elephant in the room is that most math courses are a stepping-stone to some credential. Future-me doesn’t play that game: he only benefits when current-me permanently understands something.
Sign Up To Learn More
Let’s learn calculus intuition-first. The goal is a lasting upgrade to your intuition and storehouse of analogies. If that doesn’t happen, the course isn’t working, and it will be enhanced until it does.
Sign up for the mailing list and I’ll let you know when the course preview is ready, in November.
With the magic of print-on-demand, you can order the book with overnight shipping (Amazon Prime!), and be reading full-color insights tomorrow. Yowza.
I’ve often been asked if a print version can be made, and I’m beaming to say it’s now a reality:
12 chapters (~100 pages) of full-color explanations
Professional-quality typesetting & layout
Gorgeous, high-resolution text and diagrams
Compact, easy-to-carry size with comfortable margins (7″ x 10″)
The best part? There’s no garish marketing fluff needed by traditional books that compete for shelf space (testimonials, callouts, those can go in the Amazon description!). The book is my take on a simple, friendly presentation of the math essentials. It’s what I wish I had in high school (and college, and afterwards), and a tremendous value for the time and frustration it will save you.
Unlike a textbook you’re afraid to open, this book is meant to be accessible. Years later, flip back to that diagram that helped imaginary numbers click. Show that curious young student how the Pythagorean theorem goes way beyond triangles. Math is meant to be seen and felt, not just thought about.
The full-color format does increase the printing costs, but I wanted to share the highest-quality version I could (hey, I’m a reader too!). The introductory price (under $20) is heavily discounted and will change soon, so grab your copy today!
As always, happy math.
PS: Reviews are sincerely appreciated, and if you’re a math reviewer (or willing to be one!), contact me and I’ll get a copy your way. Thanks for your support!
Limits, the Foundations Of Calculus, seem so artificial and weasely: “Let x approach 0, but not get there, yet we’ll act like it’s there… ” Ugh. Here’s how I learned to enjoy them:
What is a limit? Our best prediction of a point we didn’t observe.
How do we make a prediction? Zoom into the neighboring points. If our prediction is always in-between neighboring points, no matter how much we zoom, that’s our estimate.
Why do we need limits? Math has “black hole” scenarios (dividing by zero, going to infinity), and limits give us a reasonable estimate.
How do we know we’re right? We don’t. Our prediction, the limit, isn’t required to match reality. But for most natural phenomena, it sure seems to.
Limits let us ask “What if?”. If we can directly observe a function at a value (like x=0, or x growing infinitely), we don’t need a prediction. The limit wonders, “If you can see everything except a single value, what do you think is there?”.
When our prediction is consistent and improves the closer we look, we feel confident in it. And if the function behaves smoothly, like most real-world functions do, the limit is where the missing point must be.
Key Analogy: Predicting A Soccer Ball
Pretend you’re watching a soccer game. Unfortunately, the connection is choppy:
Ack! We missed what happened at 4:00. Even so, what’s your prediction for the ball’s position?
Easy. Just grab the neighboring instants (3:59 and 4:01) and predict the ball to be somewhere in-between.
And… it works! Real-world objects don’t teleport; they move through intermediate positions along their path from A to B. Our prediction is “At 4:00, the ball was between its position at 3:59 and 4:01”. Not bad.
With a slow-motion camera, we might even say “At 4:00, the ball was between its positions at 3:59.999 and 4:00.001”.
Our prediction is feeling solid. Can we articulate why?
The predictions agree at increasing zoom levels. Imagine the 3:59-4:01 range was 9.9-10.1 meters, but after zooming into 3:59.999-4:00.001, the range widened to 9-12 meters. Uh oh! Zooming should narrow our estimate, not make it worse! Not every zoom level needs to be accurate (imagine seeing the game every 5 minutes), but to feel confident, there must be some threshold where subsequent zooms only strengthen our range estimate.
The before-and-after agree. Imagine at 3:59 the ball was at 10 meters, rolling right, and at 4:01 it was at 50 meters, rolling left. What happened? We had a sudden jump (a camera change?) and now we can’t pin down the ball’s position. Which one had the ball at 4:00? This ambiguity shatters our ability to make a confident prediction.
With these requirements in place, we might say “At 4:00, the ball was at 10 meters. This estimate is confirmed by our initial zoom (3:59-4:01, which estimates 9.9 to 10.1 meters) and the following one (3:59.999-4:00.001, which estimates 9.999 to 10.001 meters)”.
Limits are a strategy for making confident predictions.
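Here's a rough sketch of the "zoom in and watch the range shrink" idea in Python. The ball's motion is a hypothetical steady roll I invented for illustration (so the numbers won't match the story above exactly), and time is measured in seconds with 4:00 at the 240-second mark.

```python
def ball_position(t):
    """Hypothetical ball position in meters at time t (seconds), for illustration only."""
    return 10 + 0.1 * (t - 240)   # 240 s is the 4:00 mark; a slow, steady roll

def predicted_range(zoom):
    """Positions just before and just after the missing instant."""
    before = ball_position(240 - zoom)
    after = ball_position(240 + zoom)
    return min(before, after), max(before, after)

for zoom in (1, 0.1, 0.001):
    low, high = predicted_range(zoom)
    print(zoom, low, high)        # the range tightens around 10 meters
```

Each tighter zoom gives a narrower range, and every range still contains 10 meters. That's the behavior that lets us feel confident in the prediction.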
Exploring The Intuition
Let’s not bring out the math definitions just yet. What things, in the real world, do we want an accurate prediction for but can’t easily measure?
What’s the circumference of a circle?
Finding pi “experimentally” is tough: bust out a string and a ruler?
We can’t measure a shape with seemingly infinite sides, but we can wonder “Is there a predicted value for pi that is always accurate as we keep increasing the sides?”
We can’t easily measure the result of infinitely-compounded growth. But, if we could make a prediction, is there a single rate that is ever-accurate? It seems to be around 2.71828…
Can we use simple shapes to measure complex ones?
Circles and curves are tough to measure, but rectangles are easy. If we could use an infinite number of rectangles to simulate curved area, can we get a result that withstands infinite scrutiny? (Maybe we can find the area of a circle.)
Can we find the speed at an instant?
Speed is funny: it needs a before-and-after measurement (distance traveled / time taken), but can’t we have a speed at individual instants? Hrm.
Limits help answer this conundrum: predict your speed when traveling to a neighboring instant. Then ask the “impossible question”: what’s your predicted speed when the gap to the neighboring instant is zero?
Note: The limit isn’t a magic cure-all. We can’t assume one exists, and there may not be an answer to every question. For example: Is the number of integers even or odd? The quantity is infinite, and neither the “even” nor “odd” prediction stays accurate as we count higher. No well-supported prediction exists.
For pi, e, and the foundations of calculus, smart minds did the proofs to determine that “Yes, our predicted values get more accurate the closer we look.” Now I see why limits are so important: they’re a stamp of approval on our predictions.
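We can watch two of these predictions settle down numerically. The sketch below is my own illustration, not a proof: it estimates pi from polygons inscribed in a unit circle (doubling the sides each step, Archimedes-style) and estimates e by compounding 100% growth over more and more periods. The function names are made up.

```python
import math

def pi_from_polygon(doublings):
    """Half the perimeter of a polygon inscribed in a unit circle."""
    sides, side_length = 6, 1.0            # start with a hexagon (side = radius = 1)
    for _ in range(doublings):
        # side length of the polygon with twice as many sides
        side_length = math.sqrt(2 - math.sqrt(4 - side_length ** 2))
        sides *= 2
    return sides * side_length / 2

def compound_growth(periods):
    """Grow 100% over one unit of time, split into equal compounding periods."""
    return (1 + 1 / periods) ** periods

for doublings in (0, 2, 5, 10):
    print(6 * 2 ** doublings, pi_from_polygon(doublings))

for periods in (1, 10, 1000, 1_000_000):
    print(periods, compound_growth(periods))
```

The estimates keep closing in on 3.14159… and 2.71828… as we add sides or compounding periods. The numbers only suggest a limit exists; the proofs are what stamp the prediction as approved.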
The Math: The Formal Definition Of A Limit
Limits are well-supported predictions. Here’s the official definition:
lim (x → c) f(x) = L means for all real ε > 0 there exists a real δ > 0 such that for all x with 0 < |x − c| < δ, we have |f(x) − L| < ε
Let’s make this readable:
When we “strongly predict” that f(c) = L, we mean
for all real ε > 0
for any error margin we want (+/- .1 meters)
there exists a real δ > 0
there is a zoom level (+/- .1 seconds)
such that for all x with 0 < |x − c| < δ, we have |f(x) − L| < ε
where the prediction stays accurate to within the error margin
There are a few subtleties here:
The zoom level (delta, δ) is a range of the function input, i.e. a window of time in the video
The error margin (epsilon, ε) is the most the function output (the ball’s position) can differ from our prediction throughout the entire zoom level
The absolute value condition (0 < |x − c| < δ) means positive and negative offsets must work, and we’re skipping the black hole itself (when |x – c| = 0).
We can’t evaluate the black hole input, but we can say “Except for the missing point, the entire zoom level confirms the prediction f(c) = L.” And because f(c) = L holds for any error margin we can find, we feel confident.
Could we have multiple predictions? Imagine we predicted L1 and L2 for f(c). There’s some difference between them (call it .1), therefore there’s some error margin (.01) that would reveal the more accurate one. Every function output in the range can’t be within .01 of both predictions. We either have a single, infinitely-accurate prediction, or we don’t.
Yes, we can get cute and ask for the “left hand limit” (prediction from before the event) and the “right hand limit” (prediction from after the event), but we only have a real limit when they agree.
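Here's a small numerical sketch of the left-hand vs. right-hand check. Both functions are hypothetical stand-ins I made up (a steadily rolling ball and a sudden "camera cut"), and sampling one nearby point on each side is only a hint, not a proof.

```python
def one_sided_predictions(f, c, step=1e-6):
    """Sample just before and just after c (a numerical hint, not a proof)."""
    return f(c - step), f(c + step)

def rolling(x):
    return 10 + 0.1 * (x - 2)         # hypothetical: smooth motion through x = 2

def camera_cut(x):
    return 10 if x < 2 else 50        # hypothetical: a sudden jump at x = 2

for f in (rolling, camera_cut):
    left, right = one_sided_predictions(f, 2)
    print(f.__name__, left, right, abs(left - right) < 1e-3)
```

For the rolling ball the two sides agree, so a single prediction makes sense. For the camera cut they disagree (10 vs. 50), and no single limit exists at that point.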
A function is continuous when it always matches the predicted value (and discontinuous if not):
lim (x → c) f(x) = f(c)
Calculus typically studies continuous functions, playing the game “We’re making predictions, but only because we know they’ll be correct.”
The Math: Showing The Limit Exists
We have the requirements for a solid prediction. Questions asking you to “Prove the limit exists” ask you to justify your estimate.
For example: Prove the limit at x=2 exists for
f(x) = (2x + 1)(x – 2) / (x – 2)
The first check: do we even need a limit? Unfortunately, we do: just plugging in “x=2” means we have a division by zero. Drats.
But intuitively, we see the same “zero” (x – 2) could be cancelled from the top and bottom. Here’s how to dance this dangerous tango:
Assume x is anywhere except 2 (It must be! We’re making a prediction from the outside.)
We can then cancel (x – 2) from the top and bottom, since it isn’t zero.
We’re left with f(x) = 2x + 1. This function can be used outside the black hole.
What does this simpler function predict? That f(2) = 2*2 + 1 = 5.
So f(2) = 5 is our prediction. But did you see the sneakiness? We pretended x wasn’t 2 [to divide out (x-2)], then plugged in 2 after that troublesome item was gone! Think of it this way: we used the simple behavior from outside the event to predict the gnarly behavior at the event.
We can prove these shenanigans give a solid prediction, and that f(2) = 5 is infinitely accurate.
For any accuracy threshold (ε), we need to find the “zoom range” (δ) where we stay within the given accuracy. For example, can we keep the estimate between +/- 1.0?
Sure. We need to find out where
|f(x) – 5| < 1.0
|2x + 1 – 5| < 1.0
|2x – 4| < 1.0
2 * |x – 2| < 1.0
|x – 2| < 0.5
In other words, x must stay within 0.5 of 2 to maintain the initial accuracy requirement of 1.0. Indeed, when x is between 1.5 and 2.5, f(x) goes from f(1.5) = 4 to f(2.5) = 6, staying +/- 1.0 from our predicted value of 5.
We can generalize to any error tolerance (ε) by plugging it in for 1.0 above. We get:
|x – 2| < 0.5 * ε
If our zoom level is “δ = 0.5 * ε”, we’ll stay within the original error. If our error is 1.0 we need to zoom to .5; if it’s 0.1, we need to zoom to 0.05.
This simple function was a convenient example. The idea is to start with the initial constraint (|f(x) – L| < ε), plug in f(x) and L, and solve for the distance away from the black-hole point (|x – c| < ?). It’s often an exercise in algebra.
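As a sanity check (not a proof), here's a Python sketch that spot-checks the zoom level δ = 0.5 * ε. It assumes the function is f(x) = (2x + 1)(x – 2) / (x – 2), the one implied by the cancellation steps above, and it samples points on both sides of the black hole; the helper name `stays_within` is mine.

```python
def f(x):
    return (2 * x + 1) * (x - 2) / (x - 2)   # division by zero at x = 2

def stays_within(epsilon, samples=1000):
    """Check sample points inside the zoom range delta = 0.5 * epsilon."""
    delta = 0.5 * epsilon
    for k in range(1, samples + 1):
        offset = delta * k / (samples + 1)    # 0 < offset < delta
        for x in (2 - offset, 2 + offset):    # both sides, skipping x = 2 itself
            if abs(f(x) - 5) >= epsilon:
                return False
    return True

for epsilon in (1.0, 0.1, 0.001):
    print(epsilon, stays_within(epsilon))
```

Sampling can only fail to find a counterexample; the algebra is what guarantees every point in the range behaves. Still, it's reassuring to see the prediction of 5 hold at each error margin we try.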
Sometimes you’re asked to simply find the limit (plug in 2 and get f(2) = 5), other times you’re asked to prove a limit exists, i.e. crank through the epsilon-delta algebra.
Flipping Zero and Infinity
Infinity, when used in a limit, means “grows without stopping”. The symbol ∞ is no more a number than the sentence “grows without stopping” or “my supply of underpants is dwindling”. They are concepts, not numbers (for our level of math, Aleph me alone).
When using ∞ in a limit, we’re asking: “As x grows without stopping, can we make a prediction that remains accurate?”. If there is a limit, it means the predicted value is always confirmed, no matter how far out we look.
But, I still don’t like infinity because I can’t see it. But I can see zero. With limits, you can rewrite
lim (x → ∞)
as
lim (1/x → 0)
You can get sneaky and define y = 1/x, replace items in your formula, and then use
lim (y → 0+)
so it looks like a normal problem again! (Note from Tim in the comments: the limit is coming from the right, since x was going to positive infinity). I prefer this arrangement, because I can see the location we’re narrowing in on (we’re always running out of paper when charting the infinite version).
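Here's the flip in a few lines of Python. The function f(x) = (3x + 1)/x is a hypothetical example I picked because it settles toward 3; the substitution y = 1/x turns "x grows without stopping" into "y shrinks toward 0 from the right".

```python
def f(x):
    """Hypothetical example: settles toward 3 as x grows without stopping."""
    return (3 * x + 1) / x

def g(y):
    """The same formula after substituting x = 1/y."""
    return f(1 / y)

for x in (10, 1_000, 1_000_000):
    y = 1 / x
    print(x, f(x), y, g(y))   # the two columns agree (up to rounding) as y nears 0
```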
Why Aren’t Limits Used More Often?
Imagine a kid who figured out that “Putting a zero on the end” made a number 10x larger. Have 5? Write down “5” then “0” or 50. Have 100? Make it 1000. And so on.
He didn’t figure out why multiplication works, why this rule is justified… but, you’ve gotta admit, he sure can multiply by 10. Sure, there are some edge cases (Would 0 become “00”?), but it works pretty well.
The rules of calculus were discovered informally (by modern standards). Newton deduced that “The derivative of x^3 is 3x^2” without rigorous justification. Yet engines whirl and airplanes fly based on his unofficial results.
The calculus pedagogy mistake is creating a roadblock like “You must know Limits™ before appreciating calculus”, when it’s clear the inventors of calculus didn’t. I’d prefer this progression:
Calculus asks seemingly impossible questions: When can rectangles measure a curve? Can we detect instantaneous change?
Limits give a strategy for answering “impossible” questions (“If you can make a prediction that withstands infinite scrutiny, we’ll say it’s ok.”)
They’re a great tag-team: Calculus explores, limits verify. We memorize shortcuts for the results we verified with limits (d/dx x^3 = 3x^2), just like we memorize shortcuts for the rules we verified with multiplication (adding a zero means times 10). But it’s still nice to know why the shortcuts are justified.
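To see the shortcut and the prediction line up, here's a quick Python sketch. It measures the average change in x^3 over shrinking steps and compares it to the memorized rule 3x^2; the starting point x = 2 and the function name are arbitrary choices of mine.

```python
def slope_estimate(x, dx):
    """Average rate of change of x^3 over a small step dx."""
    return ((x + dx) ** 3 - x ** 3) / dx

x = 2.0
for dx in (0.1, 0.001, 0.00001):
    print(dx, slope_estimate(x, dx), 3 * x ** 2)   # estimate vs. the shortcut
```

The estimates close in on 12 as the step shrinks, matching 3 * 2^2. The limit is the claim that this keeps working no matter how closely we look.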
Limits aren’t the only tool for checking the answers to impossible questions; infinitesimals work too. The key is understanding what we’re trying to predict, then learning the rules of making predictions.
My first intuition about Bayes Theorem was “take evidence and account for false positives”. Does a lab result mean you’re sick? Well, how rare is the disease, and how often do healthy people test positive? Misleading signals must be considered.
This helped me muddle through practice problems, but I couldn’t think with Bayes. The big obstacles:
Percentages are hard to reason with. Odds compare the relative frequency of scenarios (A:B) while percentages use a part-to-whole “global scenario” [A/(A+B)]. A coin has equal odds (1:1) or a 50% chance of heads. Great. What happens when heads are 18x more likely? Well, the odds are 18:1, can you rattle off the decimal percentage? (I’ll wait…) Odds require less computation, so let’s start with them.
Equations miss the big picture. Here’s Bayes Theorem, as typically presented:
P(A|B) = P(B|A) * P(A) / P(B)
It reads right-to-left, with a mess of conditional probabilities. How about this version:
original odds * evidence adjustment = new odds
Bayes is about starting with a guess (1:3 odds for rain:sunshine), taking evidence (it’s July in the Sahara, sunshine 1000x more likely), and updating your guess (1:3000 chance of rain:sunshine). The “evidence adjustment” is how much better, or worse, we feel about our odds now that we have extra information (if it was December in Seattle, you might say rain was 1000x as likely).
Let’s start with ratios and sneak up to the complex version.
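For the curious, here's a small Python sketch of the odds-vs-percentage bookkeeping. The helper name `odds_to_probability` is made up; exact fractions are used so nothing gets lost to rounding.

```python
from fractions import Fraction

def odds_to_probability(a, b):
    """Convert A:B odds into the part-to-whole chance of A."""
    return Fraction(a, a + b)

print(float(odds_to_probability(1, 1)))    # even odds
print(float(odds_to_probability(18, 1)))   # heads 18x more likely

# Odds update by multiplying each side: 1:3 rain:sunshine, sunshine 1000x more likely
rain, sunshine = 1 * 1, 3 * 1000
print(f"{rain}:{sunshine}", float(odds_to_probability(rain, sunshine)))
```

Updating the odds was a single multiplication per side; the conversion to a percentage only happens at the end, if we want it at all.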
Caveman Statistician Og
Og just finished his CaveD program, and runs statistical research for his tribe:
He saw 50 deer and 5 bears overall (50:5 odds)
At night, he saw 10 deer and 4 bears (10:4 odds)
What can he deduce? Well,
original odds * evidence adjustment = new odds
evidence adjustment = new odds / original odds
At night, he realizes deer are 1/4 as likely as they were previously:
10:4 / 50:5 = 2.5 / 10 = 1/4
(Put another way, bears are 4x as likely at night)
Let’s cover ratios a bit. A:B describes how much A we get for every B (imagine miles per gallon as the ratio miles:gallon). Compare values with division: going from 25:1 to 50:1 means you doubled your efficiency (50/25 = 2). Similarly, we just discovered how our “deers per bear” amount changed.
Og happily continues his research:
By the river, bears are 20x more likely (he saw 2 deer and 4 bears, so 2:4 / 50:5 = 1:20)
In winter, deer are 3x as likely (30 deer and 1 bear, 30:1 / 50:5 = 3:1)
He takes a scenario, compares it to the baseline, and computes the evidence adjustment.
Caveman Clarence subscribes to Og’s journal, and wants to apply the findings to his forest (where deer:bears are 25:1). Suppose Clarence hears an animal approaching:
His general estimate is 25:1 odds of deer:bear
It’s at night, with bears 4x as likely => 25:4
It’s by the river, with bears 20x as likely => 25:80
It’s in the winter, with deer 3x more likely => 75:80
Clarence guesses “bear” with near-even odds (75:80) and tiptoes out of there.
That’s Bayes. In fancy language:
Start with a prior probability, the general odds before evidence
Collect evidence, and determine how much it changes the odds
Compute the posterior probability, the odds after updating
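Og's workflow fits in a few lines of Python. This is a sketch under one simplification of mine: deer:bear odds are stored as a single "deer per bear" fraction, so dividing and multiplying odds becomes ordinary fraction arithmetic. The counts come from Og's observations above.

```python
from fractions import Fraction

def odds(deer, bears):
    """Deer:bear odds stored as a single ratio (deer per bear)."""
    return Fraction(deer, bears)

baseline = odds(50, 5)

# evidence adjustment = new odds / original odds
adjustments = {
    "night": odds(10, 4) / baseline,    # 1/4: deer a quarter as likely
    "river": odds(2, 4) / baseline,     # 1/20
    "winter": odds(30, 1) / baseline,   # 3: deer 3x as likely
}

# Clarence's forest: start from his prior, multiply in each piece of evidence
posterior = odds(25, 1)
for name, adjustment in adjustments.items():
    posterior *= adjustment
    print(name, posterior)

print(posterior == odds(75, 80))
```

Fractions print in lowest terms, so 75:80 shows up as 15/16. Same odds, smaller numbers.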
Bayesian Spam Filter
Let’s build a spam filter based on Og’s Bayesian Bear Detector.
First, grab a collection of regular and spam email. Record how often a word appears in each:
(“hello” appears equally, but “buy” skews toward spam)
We compute odds just like before. Let’s assume incoming email has 9:1 chance of spam, and we see “hello darling”:
A generic message has 9:1 odds of spam:regular
Adjust for “hello” => keep the 9:1 odds (“hello” is equally-likely in both sets)
Adjust for “darling” => 9:5 odds (“darling” appears 5x as often in normal emails)
Final chances => 9:5 odds of spam
We’re leaning towards spam (9:5 odds). However, it’s less spammy than our starting odds (9:1), so we let it through.
Now consider a message like “buy viagra”:
Prior belief: 9:1 chance of spam
Adjust for “buy”: 27:2 (3:2 adjustment towards spam)
Adjust for (“viagra”): …uh oh!
“Viagra” never appeared in a normal message. Is it a guarantee of spam?
Probably not: we should intelligently adjust for new evidence. Let’s assume there’s a regular email, somewhere, with that word, and make the “viagra” odds 3:1. Our chances become 27:2 * 3:1 = 81:2.
Now we’re getting somewhere! Our initial 9:1 guess shifts to 81:2. Now is it spam?
Well, how horrible is a false positive?
81:2 odds imply for every 81 spam messages like this, we’ll incorrectly block 2 normal emails. That ratio might be too painful. With more evidence (more words or other characteristics), we might wait for 1000:1 odds before calling a message spam.
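Here's a toy version of the filter in Python, a sketch rather than a production design. The per-word adjustments are the spam:normal ratios from the walkthrough (with "viagra" using the assumed 3:1 stand-in), unknown words are assumed to leave the odds alone, and the 1000:1 blocking threshold and function names are hypothetical.

```python
from fractions import Fraction

# spam:normal adjustment per word, taken from the walkthrough above.
# "viagra" uses the assumed 3:1 stand-in rather than an infinite ratio.
WORD_ODDS = {
    "hello": Fraction(1, 1),
    "darling": Fraction(1, 5),
    "buy": Fraction(3, 2),
    "viagra": Fraction(3, 1),
}

def spam_odds(message, prior=Fraction(9, 1)):
    """Multiply the prior spam:normal odds by each known word's adjustment."""
    result = prior
    for word in message.lower().split():
        result *= WORD_ODDS.get(word, Fraction(1, 1))  # unknown words: no change
    return result

def is_spam(message, threshold=Fraction(1000, 1)):
    """Block only when the odds clear a (hypothetical) threshold."""
    return spam_odds(message) >= threshold

print(spam_odds("hello darling"))   # 9/5
print(spam_odds("buy viagra"))      # 81/2
print(is_spam("buy viagra"))        # False: 81:2 is well short of 1000:1
```

Treating each word as an independent multiplier is the "naive" part of a naive Bayes filter. The threshold is where you decide how painful a false positive is.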
Exploring Bayes Theorem
We can check our intuition by seeing if we naturally ask leading questions:
Is evidence truly independent? Are there links between animal behavior at night and in the winter, or words that appear together? Sure. We “naively” assume evidence is independent (and yet, in our bumbling, create effective filters anyway).
How much evidence is enough? Is seeing 2 deer & 1 bear the same 2:1 evidence adjustment as 200 deer and 100 bears?
How accurate were the starting odds in the first place? Prior beliefs change everything. (“A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.”)
Do absolute probabilities matter? We usually need the most-likely theory (“Deer or bear?”), not the global chance of this scenario (“What’s the probability of deers at night in the winter by the river vs. bears at night in the winter by the river?”). Many Bayesian calculations ignore the global probabilities, which cancel when dividing, and essentially use an odds-centric approach.
Can our filter be tricked? A spam message might add chunks of normal text to appear innocuous and “poison” the filter. You’ve probably seen this yourself.
What evidence should we use? Let the data speak. Email might have dozens of characteristics (time of day, message headers, country of origin, HTML tags…). Give every characteristic a likelihood factor and let Bayes sort ‘em out.
Thinking With Ratios and Percentages
The ratio and percentage approaches ask slightly different questions:
Ratios: Given the odds of each outcome, how does evidence adjust them?
The evidence adjustment just skews the initial odds, piece-by-piece.
Percentages: What is the chance of an outcome after supporting evidence is found?
In the percentage case,
“% Bears” is the overall chance of a bear appearing anywhere
“% Bears Going to River” is how likely a bear is to trigger the “river” data point
“% Bear at River” is the combined chance of having a bear, and it going to the river. In stats terms, P(event and evidence) = P(event) * P(event implies evidence) = P(event) * P(evidence|event). I see conditional probabilities as “Chances that X implies Y” not the twisted “Chances of Y, given X happened”.
1% of people have cancer; 9.6% of healthy people test positive, 80% of people with cancer do
If you see a positive result, what’s the chance of cancer?
Cancer:Healthy ratio is 1:99
Evidence adjustment: 80/100 : 9.6/100 = 80:9.6 (80% of sick people are “at the river”, and 9.6% of healthy people are).
Final odds: 1:99 * 80:9.6 = 80:950.4 (roughly 1:12 odds of cancer, ~7.7% chance)
The intuition: the initial 1:99 odds are pretty skewed. Even with an 8.3x (80:9.6) boost from a positive test result, cancer remains unlikely.
Cancer chance is 1%
Chance of true positive = 1% * 80% = .008
Chance of false positive = 99% * 9.6% = .09504
Chance of having cancer = .008 / (.008 + .09504) = 7.7%
When written with percentages, we start from absolute chances. There’s a global 0.8% chance of finding a sick patient with a positive result, and a global 9.504% chance of a healthy patient with a positive result. We then compute the chance these global percentages indicate something useful.
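To confirm the two approaches land on the same answer, here's a Python sketch that runs both side by side using the figures above (1% base rate, 80% true-positive rate, 9.6% false-positive rate). The variable names are mine, and exact fractions avoid rounding noise.

```python
from fractions import Fraction

p_cancer = Fraction(1, 100)
p_pos_given_cancer = Fraction(80, 100)
p_pos_given_healthy = Fraction(96, 1000)

# Ratio approach: prior odds times evidence adjustment
prior_odds = p_cancer / (1 - p_cancer)                 # 1:99
adjustment = p_pos_given_cancer / p_pos_given_healthy  # 80:9.6
posterior_odds = prior_odds * adjustment               # 80:950.4
chance_from_odds = posterior_odds / (1 + posterior_odds)

# Percentage approach: true positives over all positives
true_positive = p_cancer * p_pos_given_cancer          # 0.008
false_positive = (1 - p_cancer) * p_pos_given_healthy  # 0.09504
chance_from_percentages = true_positive / (true_positive + false_positive)

print(float(chance_from_odds), float(chance_from_percentages))
print(chance_from_odds == chance_from_percentages)
```

Both routes give roughly a 7.7% chance. The ratio version gets there with one multiplication, and only converts to a percentage at the end.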
Let the approaches be complements: percentages for a bird’s-eye view, and ratios for seeing how individual odds are adjusted. We’ll save the myriad other interpretations for another day.
The Fourier Transform is one of the deepest insights ever made. Unfortunately, the meaning is buried within dense equations:
Yikes. Rather than jumping into the symbols, let's experience the key idea firsthand. Here's a plain-English metaphor:
What does the Fourier Transform do? Given a smoothie, it finds the recipe.
How? Run the smoothie through filters to extract each ingredient.
Why? Recipes are easier to analyze, compare, and modify than the smoothie itself.
How do we get the smoothie back? Blend the ingredients.
Next, we'll refine the analogy into "math-English":
The Fourier Transform takes a time-based pattern, measures each "cycle ingredient" (cycle strength, offset, & rotation speed), and returns the overall "cycle recipe" (frequency graph)
Time for the equations? No! Let's get our hands dirty and experience cycles making patterns with live simulations.
If all goes well, we'll have an aha! moment and intuitively realize why the Fourier Transform is possible. We'll save the detailed math analysis for the follow-up.
This isn't a forced march through the equations; it's the casual stroll I wish I'd had. Onward!
From Smoothie to Recipe
A math transformation is a change of perspective. We change our notion of quantity from "single items" (lines in the sand, tally system) to "groups of 10" (decimal) depending on what we're counting. Scoring a game? Tally it up. Multiplying? Decimals, please.
The Fourier Transform changes our perspective from consumer to producer, turning "What did I see?" into "How was it made?"
In other words: given a smoothie, let's find the recipe.
Why? Well, recipes are great descriptions of drinks. You wouldn't share a drop-by-drop analysis, you'd say "I had an orange/banana smoothie". A recipe is more easily categorized, compared, and modified than the object itself.
So... given a smoothie, how do we find the recipe?
Well, imagine you had a few filters lying around:
Pour through the "banana" filter. 1 oz of bananas is extracted.
Pour through the "orange" filter. 2 oz of oranges.
Pour through the "milk" filter. 3 oz of milk.
Pour through the "water" filter. 3 oz of water.
We can reverse-engineer the recipe by filtering each ingredient. The catch?
Filters must be independent. The banana filter needs to capture bananas, and nothing else. Adding more oranges should never affect the banana reading.
Filters must be complete. We won't get the real recipe if we leave out a filter ("There were mangoes too!"). Our collection of filters must catch every last ingredient.
Ingredients must be combine-able. Smoothies can be separated and re-combined without issue (A cookie? Not so much. Who wants crumbs?). The ingredients, when separated and combined in any order, must behave the same.
Seeing The World As Cycles
The Fourier Transform takes a specific viewpoint: What if any signal could be filtered into a bunch of circular paths?
Whoa. This concept is mind-blowing, and poor Joseph Fourier had his idea rejected at first. (Really Joe, even a staircase pattern can be made from circles?)
And despite decades of debate in the math community, we expect students to internalize the idea without issue. Ugh. Let's walk through the intuition.
The Fourier Transform finds the recipe for a signal, like our smoothie process:
Start with a time-based signal
Apply filters to measure each possible "circular ingredient"
Collect the full recipe, listing the amount of each "circular ingredient"
Stop. Here's where most tutorials excitedly throw engineering applications at your face. Don't get scared; think of the examples as "Wow, we're finally seeing the source code (DNA) behind previously confusing ideas".
If earthquake vibrations can be separated into "ingredients" (vibrations of different speeds & strengths), buildings can be designed to avoid interacting with the strongest ones.
If sound waves can be separated into ingredients (bass and treble frequencies), we can boost the parts we care about, and hide the ones we don't. The crackle of random noise can be removed. Maybe similar "sound recipes" can be compared (music recognition services compare recipes, not the raw audio clips).
If computer data can be represented with oscillating patterns, perhaps the least-important ones can be ignored. This "lossy compression" can drastically shrink file sizes (which is why JPEG and MP3 files are much smaller than raw .bmp or .wav files).
If a radio wave is our signal, we can use filters to listen to a particular channel. In the smoothie world, imagine each person paid attention to a different ingredient: Adam looks for apples, Bob looks for bananas, and Charlie gets cauliflower (sorry bud).
The Fourier Transform is useful in engineering, sure, but it's a metaphor about finding the root causes behind an observed effect.
Think With Circles, Not Just Sinusoids
One of my giant confusions was separating the definitions of "sinusoid" and "circle".
A "sinusoid" is a specific back-and-forth pattern (a sine or cosine wave), and 99% of the time, it refers to motion in one dimension
A "circle" is a round, 2d pattern you probably know. If you enjoy using 10-dollar words to describe 10-cent ideas, you might call a circular path a "complex sinusoid".
Labeling a circular path as a "complex sinusoid" is like describing a word as a "multi-letter". You zoomed into the wrong level of detail. Words are about concepts, not the letters they can be split into!
The Fourier Transform is about circular paths (not 1-d sinusoids) and Euler's formula is a clever way to generate one:
Must we use imaginary exponents to move in a circle? Nope. But it's convenient and compact. We can separate the path into real and imaginary parts, but don't forget the big picture: we move in circles.
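To watch Euler's formula, e^(ix) = cos(x) + i*sin(x), trace a circle, here's a minimal Python sketch. The helper name point_on_circle is made up for illustration; the only library calls are from Python's standard cmath and math modules.

```python
import cmath
import math


def point_on_circle(angle_radians):
    """Euler's formula: e^(ix) lands on the unit circle at angle x."""
    return cmath.exp(1j * angle_radians)


for degrees in (0, 45, 90, 180, 270):
    x = math.radians(degrees)
    z = point_on_circle(x)
    # Real part matches cos(x), imaginary part matches sin(x), radius stays 1.
    print(degrees, round(z.real, 3), round(z.imag, 3), round(abs(z), 3))
```

The real and imaginary columns are the two 1-d sinusoids; taken together they are one point moving around a circle.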
Following Circular Paths
Let's say we're chatting on the phone and, like usual, I want us to draw the same circular path simultaneously (You promised!). What should I say?
How big is the circle? (Amplitude, i.e. size of radius)
How fast do we draw it? (Frequency. 1 circle/second is a frequency of 1 Hertz (Hz) or 2*pi radians/sec)
Where do we start? (Phase angle, where 0 degrees is the x-axis)
I could say "2-inch radius, start at 45 degrees, 1 circle per second, go!". After half a second we should be at the same spot: starting point + amount traveled = 45 + 180 = 225 degrees (on a 2-inch circle).
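The phone-call instructions can be sketched in Python. The function name and parameters below are hypothetical, chosen to mirror the three questions (size, speed, starting angle).

```python
import cmath
import math


def position_on_path(radius, frequency_hz, phase_degrees, t_seconds):
    """Position on a circular path at time t, as a complex number (x + iy)."""
    angle = math.radians(phase_degrees) + 2 * math.pi * frequency_hz * t_seconds
    return radius * cmath.exp(1j * angle)


# "2-inch radius, start at 45 degrees, 1 circle per second, go!"
p = position_on_path(radius=2, frequency_hz=1, phase_degrees=45, t_seconds=0.5)
angle_now = math.degrees(cmath.phase(p)) % 360   # wrap into 0..360
print(round(angle_now), round(abs(p), 3))        # 225 degrees, on a 2-inch circle
```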
Every circular path needs a size, speed, and starting angle (amplitude/frequency/phase). We can even combine paths: imagine tiny motorcars, driving in circles at different speeds.
The combined position of all the cycles is our signal, just like the combined flavor of all the ingredients is our smoothie.
The magnitude of each cycle is listed in order, starting at 0Hz. Cycles [0 1] means
0 strength for the 0Hz cycle (0Hz = a constant cycle, stuck on the x-axis at zero degrees)
1 strength for the 1Hz cycle (completes 1 cycle per time interval)
Now the tricky part:
The blue graph measures the real part of the cycle. Another lovely math confusion: the real axis of the circle, which is usually horizontal, has its magnitude shown on the vertical axis. You can mentally rotate the circle 90 degrees if you like.
The time points are spaced at the fastest frequency. A 1Hz signal needs 2 time points for a start and stop (a single data point doesn't have a frequency). The time values [1 -1] show the amplitude at these equally-spaced intervals.
With me? [0 1] is a pure 1Hz cycle.
Now let's add a 2Hz cycle to the mix. [0 1 1] means "Nothing at 0Hz, 1Hz of strength 1, 2Hz of strength 1":
Whoa. The little motorcars are getting wild: the green lines are the 1Hz and 2Hz cycles, and the blue line is the combined result. Try toggling the green checkbox to see the final result clearly. The combined "flavor" is a sway that starts at the max and dips low for the rest of the interval.
The yellow dots are when we actually measure the signal. With 3 cycles defined (0Hz, 1Hz, 2Hz), each dot is 1/3 of the way through the signal. In this case, cycles [0 1 1] generate the time values [2 -1 -1], which starts at the max (2) and dips low (-1).
Oh! We can't forget phase, the starting angle! Use magnitude:angle to set the phase. So [0 1:45] is a 1Hz cycle that starts at 45 degrees:
This is a shifted version of [0 1]. On the time side we get [.7 -.7] instead of [1 -1], because our cycle isn't exactly lined up with our measuring intervals, which are still at the halfway point (this could be desired!).
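If the simulator isn't handy, the cycle-to-time direction can be approximated in Python. This is a sketch, not the simulator's actual code: cycles_to_time is a made-up name, each cycle is written as a (magnitude, phase in degrees) pair, and only the real part is kept, as in the blue graph.

```python
import cmath
import math


def cycles_to_time(cycles):
    """cycles[f] = (magnitude, phase_degrees) for the f Hz cycle.
    Returns the real part of the combined position at each equally spaced time point."""
    n_points = len(cycles)
    samples = []
    for t in range(n_points):
        total = 0j
        for freq, (magnitude, phase_deg) in enumerate(cycles):
            angle = math.radians(phase_deg) + 2 * math.pi * freq * t / n_points
            total += magnitude * cmath.exp(1j * angle)
        samples.append(round(total.real, 3))
    return samples


print(cycles_to_time([(0, 0), (1, 0)]))          # [0 1]    -> [1.0, -1.0]
print(cycles_to_time([(0, 0), (1, 0), (1, 0)]))  # [0 1 1]  -> [2.0, -1.0, -1.0]
print(cycles_to_time([(0, 0), (1, 45)]))         # [0 1:45] -> [0.707, -0.707]
```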
The Fourier Transform finds the set of cycle speeds, strengths and phases to match any time signal.
Our signal becomes an abstract notion that we consider as "observations in the time domain" or "ingredients in the frequency domain".
Enough talk: try it out! In the simulator, type any time or cycle pattern you'd like to see. If it's time points, you'll get a collection of cycles (that combine into a "wave") that matches your desired points.
But… doesn't the combined wave have strange values between the yellow time intervals? Sure. But who's to say whether a signal travels in straight lines, or curves, or zips into other dimensions when we aren't measuring it? It behaves exactly as we need at the equally-spaced moments we asked for.
Making A Spike In Time
Can we make a spike in time, like (4 0 0 0), using cycles? (I'll use parens for time points)
Although the spike seems boring to us time-dwellers (that's it?), think about the complexity in the cycle world. Our cycle ingredients must start aligned (at the max value, 4) and then "explode outwards", each cycle with partners that cancel it in the future. Every remaining point is zero, which is a tricky balance with multiple cycles running around (we can't just "turn them off").
Let's walk through each time point:
At time 0, the first instant, every cycle ingredient is at its max. Ignoring the other time points, (4 ? ? ?) can be made from 4 cycles (0Hz 1Hz 2Hz 3Hz), each with a magnitude of 1 and phase of 0 (i.e., 1 + 1 + 1 + 1 = 4).
At every future point (t = 1, 2, 3), the sum of all cycles must cancel.
Here's the trick: when two cycles are on opposite sides of the circle (North & South, East & West, etc.) their combined position is zero (3 cycles can cancel if they're spread evenly at 0, 120, and 240 degrees).
Imagine a constellation of points moving around the circle. Here's the position of each cycle at every instant:
Notice how the 3Hz cycle starts at 0, gets to position 3, then position "6" (with only 4 positions, 6 modulo 4 = 2), then position "9" (9 modulo 4 = 1).
When our cycle is 4 units long, cycle speeds a half-cycle apart (2 units) will either be lined up (difference of 0, 4, 8…) or on opposite sides (difference of 2, 6, 10…).
OK. Let's drill into each time point:
Time 0: All cycles at their max (total of 4)
Time 1: 1Hz and 3Hz cancel (positions 1 & 3 are opposites), 0Hz and 2Hz cancel as well. The net is 0.
Time 2: 0Hz and 2Hz line up at position 0, while 1Hz and 3Hz line up at position 2 (the opposite side). The total is still 0.
Time 3: 0Hz and 2Hz cancel. 1Hz and 3Hz cancel.
Time 4 (repeat of t=0): All cycles line up.
The trick is having individual speeds cancel (0Hz vs 2Hz, 1Hz vs 3Hz), or having the lined-up pairs cancel (0Hz + 2Hz vs 1Hz + 3Hz).
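The time-by-time walkthrough can be replayed in Python. This is a sketch under the same setup as above (4 cycles, each with strength 1 and phase 0); positions_at and net_at are made-up helper names, and positions count quarter-turns around the circle.

```python
import cmath
import math

N = 4  # four time points and four cycles (0Hz..3Hz), each with strength 1 and phase 0


def positions_at(t, n=N):
    """Position of each cycle on an n-position circle at time t (0 = starting point)."""
    return [(freq * t) % n for freq in range(n)]


def net_at(t, n=N):
    """Real part of the combined position of all cycles at time t."""
    total = sum(cmath.exp(2j * math.pi * p / n) for p in positions_at(t, n))
    return round(total.real, 6) + 0.0   # "+ 0.0" turns -0.0 into 0.0 for tidier printing


for t in range(N + 1):
    print(t, positions_at(t), net_at(t))
# 0 [0, 0, 0, 0] 4.0
# 1 [0, 1, 2, 3] 0.0
# 2 [0, 2, 0, 2] 0.0
# 3 [0, 3, 2, 1] 0.0
# 4 [0, 0, 0, 0] 4.0
```

Each row lists the 0Hz, 1Hz, 2Hz and 3Hz positions in that order, so the opposite pairs (0 vs 2, 1 vs 3) are easy to spot.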
When every cycle has equal power and 0 phase, we start aligned and cancel afterwards. (I don't have a nice proof yet -- any takers? -- but you can see it yourself. Try [1 1], [1 1 1], [1 1 1 1] and notice the time spikes: (2 0), (3 0 0), (4 0 0 0)).
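One candidate argument, offered as a sketch: with N cycles of strength 1 and phase 0, the value at time point n is a geometric series with ratio r = e^{i 2\pi n / N}, so

$$
\sum_{k=0}^{N-1} e^{i 2\pi k n / N} =
\begin{cases}
N, & n = 0 \\[4pt]
\dfrac{1 - e^{i 2\pi n}}{1 - e^{i 2\pi n / N}} = 0, & n = 1, \dots, N-1
\end{cases}
$$

Since n is a whole number, e^{i 2\pi n} = 1 and the numerator vanishes, while the denominator is nonzero for 0 < n < N. That gives the spike (N 0 0 ... 0).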
Here's how I visualize the initial alignment, followed by a net cancellation:
Moving The Time Spike
Not everything happens at t=0. Can we change our spike to (0 4 0 0)?
It seems the cycle ingredients should be similar to (4 0 0 0), but the cycles must align at t=1 (one second in the future). Here's where phase comes in.
Imagine a race with 4 runners. Normal races have everyone lined up at the starting line, the (4 0 0 0) time pattern. Boring.
What if we want everyone to finish at the same time? Easy. Just move people forward or backwards by the appropriate distance. Maybe granny can start 2 feet in front of the finish line, Usain Bolt can start 100m back, and they can cross the tape holding hands.
Phase shifts, the starting angle, are delays in the cycle universe. Here's how we adjust the starting position to delay every cycle 1 second:
A 0Hz cycle doesn't move, so it's already aligned
A 1Hz cycle goes 1 revolution in the entire 4 seconds, so a 1-second delay is a quarter-turn. Phase shift it 90 degrees backwards (-90) and it gets to phase=0, the max value, at t=1.
A 2Hz cycle is twice as fast, so give it twice the angle to cover (-180 or 180 phase shift -- it's across the circle, either way).
A 3Hz cycle is 3x as fast, so give it 3x the distance to move (-270 or +90 phase shift)
If time points (4 0 0 0) are made from cycles [1 1 1 1], then time points (0 4 0 0) are made from [1 1:-90 1:180 1:90]. (Note: I'm using "1Hz", but I mean "1 cycle over the entire time period").
Whoa -- we're working out the cycles in our head!
The interference visualization is similar, except the alignment is at t=1.
Test your intuition: Can you make (0 0 4 0), i.e. a 2-second delay? 0Hz has no phase. 1Hz has 180 degrees, 2Hz has 360 (aka 0), and 3Hz has 540 (aka 180), so it's [1 1:180 1 1:180].
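The "twice as fast, twice the angle" rule can be sketched in Python and checked by blending the cycles back into time points. Both function names are made up for illustration, and "1Hz" again means one cycle over the whole time period.

```python
import cmath
import math


def delayed_spike_recipe(n_points, delay):
    """(magnitude, phase_degrees) per frequency for a spike of height n_points at time `delay`."""
    recipe = []
    for freq in range(n_points):
        phase = -360.0 * freq * delay / n_points     # faster cycles need a bigger head start
        wrapped = (phase + 180.0) % 360.0 - 180.0    # wrap into [-180, 180)
        recipe.append((1.0, wrapped))
    return recipe


def cycles_to_time(cycles):
    """Blend the cycles: real part of the combined position at each time point."""
    n_points = len(cycles)
    samples = []
    for t in range(n_points):
        total = 0j
        for freq, (magnitude, phase_deg) in enumerate(cycles):
            angle = math.radians(phase_deg) + 2 * math.pi * freq * t / n_points
            total += magnitude * cmath.exp(1j * angle)
        samples.append(round(total.real, 3))
    return samples


one_second = delayed_spike_recipe(4, 1)
two_seconds = delayed_spike_recipe(4, 2)
print([phase for _, phase in one_second])    # [0.0, -90.0, -180.0, 90.0]
print([phase for _, phase in two_seconds])   # [0.0, -180.0, 0.0, -180.0]
```

The sketch reports -180 where the text writes 180; they are the same spot across the circle. Passing either recipe to cycles_to_time should give back (0 4 0 0) and (0 0 4 0).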
Discovering The Full Transform
The big insight: our signal is just a bunch of time spikes! If we merge the recipes for each time spike, we should get the recipe for the full signal.
The Fourier Transform builds the recipe frequency-by-frequency:
Separate the full signal (a b c d) into "time spikes": (a 0 0 0) (0 b 0 0) (0 0 c 0) (0 0 0 d)
For any frequency (like 2Hz), the tentative recipe is "a/4 + b/4 + c/4 + d/4" (the strength of each spike is split among all frequencies)
Wait! We need to offset each spike with a phase delay (the angle for a "1 second delay" depends on the frequency).
Actual recipe for a frequency = a/4 (no offset) + b/4 (1 second offset) + c/4 (2 second offset) + d/4 (3 second offset).
We can then loop through every frequency to get the full transform.
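The steps above can be sketched as a short Python function. It's an illustration of the recipe-merging idea, not an optimized transform: the name transform is made up, and the 1/N split is applied on the way in, matching the "a/4 + b/4 + c/4 + d/4" description.

```python
import cmath
import math


def transform(signal):
    """Merge the recipes of every time spike, one frequency at a time."""
    n_points = len(signal)
    recipe = []
    for freq in range(n_points):                  # loop through every frequency
        total = 0j
        for t, value in enumerate(signal):        # one time spike per sample
            # Phase delay for a spike t steps in the future, at this frequency.
            delay_angle = -2 * math.pi * freq * t / n_points
            total += (value / n_points) * cmath.exp(1j * delay_angle)
        recipe.append(total)
    return recipe


# Magnitude and angle (degrees) of each cycle for the spike (0 4 0 0).
for cycle in transform([0, 4, 0, 0]):
    print(round(abs(cycle), 3), round(math.degrees(cmath.phase(cycle))))
```

For (0 4 0 0) every magnitude should come out as 1, with angles equivalent to 0, -90, 180 and 90 degrees (an angle may print as -180 instead of 180, which is the same position).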
Here's the conversion from "math English" to full math:
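In LaTeX, using the symbols defined in the notes that follow and assuming the 1/N factor sits in the forward transform (the preference stated in those notes), the transform pair can be written as:

$$
X_k = \frac{1}{N} \sum_{n=0}^{N-1} x_n \, e^{-i 2\pi k n / N}
\qquad\qquad
x_n = \sum_{k=0}^{N-1} X_k \, e^{i 2\pi k n / N}
$$

The first expression goes from time points to cycles; the second blends the cycles back into time points.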
A few notes:
N = number of time samples we have
n = current sample we're considering (0 .. N-1)
x_n = value of the signal at time n
k = current frequency we're considering (0 Hertz up to N-1 Hertz)
X_k = amount of frequency k in the signal (amplitude and phase, a complex number)
The 1/N factor is usually moved to the reverse transform (going from frequencies back to time). This is allowed, though I prefer 1/N in the forward transform since it gives the actual sizes for the time spikes. You can get wild and even use 1/sqrt(N) on both transforms (going forward and back still has the 1/N factor).
n/N is the percent of the time we've gone through. 2 * pi * k is our speed in radians / sec. e^-ix is our backwards-moving circular path. The combination is how far we've moved, for this speed and time.
The raw equations for the Fourier Transform just say "add the complex numbers". Many programming languages cannot handle complex numbers directly, so you convert everything to rectangular coordinates and add those.
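Here's a sketch of that rectangular-coordinates approach in Python, using only real arithmetic. It assumes a real-valued signal and keeps the 1/N factor in the forward transform; transform_rectangular is a made-up name.

```python
import math


def transform_rectangular(signal):
    """Forward transform without complex numbers.
    Each result is an (x, y) pair: the real and imaginary parts of one frequency."""
    n_points = len(signal)
    result = []
    for k in range(n_points):
        real_sum = 0.0
        imag_sum = 0.0
        for n, value in enumerate(signal):
            angle = 2 * math.pi * k * n / n_points
            real_sum += value * math.cos(angle)   # e^(-ix) = cos(x) - i*sin(x)
            imag_sum -= value * math.sin(angle)   # minus sign: the backwards-moving path
        result.append((real_sum / n_points, imag_sum / n_points))
    return result


for x, y in transform_rectangular([0, 4, 0, 0]):
    magnitude = math.hypot(x, y)
    angle_deg = math.degrees(math.atan2(y, x))
    print(round(magnitude, 3), round(angle_deg))
```

Converting each (x, y) pair back to magnitude and angle with hypot and atan2 recovers the magnitude:angle notation used earlier.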
This was my most challenging article yet. The Fourier Transform has several flavors (discrete/continuous/finite/infinite), covers deep math (Dirac delta functions), and it's easy to get lost in details. I was constantly bumping into the edge of my knowledge.
But there are always simple analogies out there -- I refuse to think otherwise. Whether it's a smoothie or Usain Bolt & Granny crossing the finish line, take a simple understanding and refine it. The analogy is flawed, and that's ok: it's a raft to use, and leave behind once we cross the river.
I realized how feeble my own understanding was when I couldn't work out the transform of (1 0 0 0) in my head. For me, it was like saying I "knew" addition but, gee whiz, I'm not sure what "1 + 1 + 1 + 1" would be. Why not? Shouldn't we have an intuition for the simplest of operations?
That discomfort led me around the web to build my intuition. In addition to the references in the article, I'd like to thank:
Imagine spinning your signal in a centrifuge and checking for a bias. I have a correction: we must spin backwards (the exponent in the equation above needs a negative sign). You already know why: we need a phase delay so spikes appear in the future.
The Fourier Transform is about cycles added to cycles added to cycles. Try making a "time spike" by setting a strength of 1 for every component (press Enter after inputting each number). Fun fact: with enough terms, you can draw any shape, even Homer Simpson.