Intuitive Guide to Hyperbolic Functions

If the exponential function $e^x$ is water, the hyperbolic functions ($\cosh$ and $\sinh$) are hydrogen and oxygen. They're the technical, rarely-discussed parts that combine into a famous whole.

Admittedly, the hyperbolic functions were tucked into a dark part of my attic. They were defined with strained motivations ("Need yet another way to build a hyperbola?") then crammed into tables of integrals, soon to be forgotten. I couldn't think with them.

After much struggle, I found their purpose:

What are the hyperbolic functions ($\cosh$ and $\sinh$)? The even/odd parts of the exponential function ($e^x$) that, funny enough, can build a hyperbola.
Why are parts of the exponential called hyperbolic? That's the modern name. These functions are so darn good at making hyperbolas that they're typecast for that role. (Similarly, sine isn't just about circles, and we shouldn't name it "circular sine"!)
Why are hyperbolic functions useful? A better framing is: Why are parts of $e^x$ useful? We now have "mini logarithms" and "mini exponentials", with partial versions of $e$'s famous properties.
I can handle it: how do hyperbolas connect to exponentials? Hyperbolas come from inversions ($xy = 1$ or $y = \frac{1}{x}$). The area under an inversion grows logarithmically, and the corresponding coordinates grow exponentially. If we rotate the hyperbola, we rotate the formula to $(x-y)(x+y) = x^2 - y^2 = 1$. The area/coordinates now follow modified logarithms/exponentials: the hyperbolic functions.
Actually, I couldn't handle it. That's ok. We'll build up to it. These functions took many years to be discovered, and their behavior is hardly obvious.

This post is fairly technical: we're studying hydrogen, not water. If hyperbolic functions appear in class, you don't have much choice, and may as well get an intuition. If you're studying for fun, don't sweat the details, that's what calculus students are for.

As a prerequisite, have these insights in mind:

$e^x$ is the process of continuous, 100% growth
natural log is the time for $e^x$ to grow to a given value

Let's dive in.

Part 1: Exponential Viewpoint

Background: Odd/Even Functions

Chemical compounds can be separated into constituent atoms; math objects are similar.

The number 13 can be split into an even part (12) and odd part (1). They combine to the whole: 13 = 12 + 1.

This even/odd split works for functions, too:

$\displaystyle{f(x) = f_\text{even}(x) + f_\text{odd}(x)}$

Functions are tricky to separate because they have multiple values. The separation we look for is between the future values ($x > 0$) and the past ones ($x < 0$).

To see what the future and past have in common, take their average:

$\displaystyle{f_\text{even}(x) = \text{avg}[\text{future} + \text{past}] = \frac{f(x) + f(-x)}{2}}$

To see how the future and past differ, average their gap:

$\displaystyle{f_\text{odd}(x) = \text{avg}[\text{future} - \text{past}] = \frac{f(x) - f(-x)}{2}}$

Can we combine these parts to get the original?

$f_\text{even}(x) + f_\text{odd}(x) = \frac{f(x) + f(-x)}{2} + \frac{f(x) - f(-x)}{2} = \frac{f(x) + f(-x) + f(x) - f(-x)}{2} = \frac{2f(x)}{2} = f(x)$

Neat trick. We can split any pattern into its even and odd parts.
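To make the split concrete, here's a minimal sketch in Python. The helper names (`even_part`, `odd_part`) and the sample polynomial are my own assumptions for illustration, not anything standard:

```python
def even_part(f):
    """Even part of f: the average of the future f(x) and the past f(-x)."""
    return lambda x: (f(x) + f(-x)) / 2

def odd_part(f):
    """Odd part of f: half the gap between the future f(x) and the past f(-x)."""
    return lambda x: (f(x) - f(-x)) / 2

# Hypothetical sample function: f(x) = x^3 + 2x^2 + 1
def sample(x):
    return x**3 + 2 * x**2 + 1

sample_even = even_part(sample)  # should behave like 2x^2 + 1
sample_odd = odd_part(sample)    # should behave like x^3

x = 1.5
print(sample_even(x), sample_odd(x), sample_even(x) + sample_odd(x), sample(x))
```

The last two printed values should agree: the parts recombine into the original.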

Taylor series

The Taylor Series (Math DNA) expresses a function as a polynomial:

$\displaystyle{f(x) = c_0 + c_1 x + c_2 x^2 + c_3x^3 + \cdots}$

The even exponents ($x^0, x^2, ...$) are symmetric in the past and future (for example: $x^2 = (-x)^2$), and the odd exponents are anti-symmetric ($x^3 = - (-x)^3$).

We can quickly extract the even/odd parts by separating the function's Taylor series into even/odd exponents:

$\begin{align*} f(x) &= f_\text{even}(x) + f_\text{odd}(x) \\ f_\text{even}(x) &= c_0 + c_2 x^2 + c_4x^4 + \cdots \\ f_\text{odd}(x) &= c_1 x + c_3x^3 + c_5x^5 + \cdots \end{align*}$

Even and Odd Exponentials

Ok, we have our trick. Why not try to split up the famous exponential function?

$\begin{align*} e^x_{\text{even}} &= \text{avg}[e^x, e^{-x}] = \frac{e^x + e^{-x}}{2} = \cosh(x) \\ e^x_\text{odd} &= \text{avg}[e^x, -e^{-x}] = \frac{e^x - e^{-x}}{2} = \sinh(x) \\ e^x &= e^x_{\text{even}} + e^x_{\text{odd}} = \cosh(x) + \sinh(x) \end{align*}$

Instead of the awkward $e^x_{\text{even}}$ and $e^x_{\text{odd}}$, we call the even part $\cosh$ (hyperbolic cosine) and the odd part $\sinh$ (hyperbolic sine). (The pronunciation varies.)
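A quick sanity check, sketched in Python: build the even and odd parts of $e^x$ by hand and compare them to the library's `math.cosh` and `math.sinh`. The names `exp_even` and `exp_odd` are just labels I made up for this check:

```python
import math

def exp_even(x):
    """Even part of e^x: average of e^x and e^-x."""
    return (math.exp(x) + math.exp(-x)) / 2

def exp_odd(x):
    """Odd part of e^x: half the gap between e^x and e^-x."""
    return (math.exp(x) - math.exp(-x)) / 2

for x in [-2.0, 0.0, 0.5, 3.0]:
    # Columns: x, hand-built even part, cosh, hand-built odd part, sinh
    print(x, exp_even(x), math.cosh(x), exp_odd(x), math.sinh(x))
```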

Now, why the adjective "hyperbolic"? Euler used the quantity $(e^{x} + e^{-x})$ without giving it a special name. Lambert later called them "transcendental logarithmic functions", and even later the ability to build hyperbolas was seen. That use case has stuck.

Call me old-fashioned, but parts of $e^x$ are interesting for that reason alone. I don't need a hyperbola to justify their utility.

Part 2: What do hyperbolic functions look like?

Let's graph $\cosh$ and $\sinh$ with their parent:

$e^x$ is our standard exponential, with Taylor series: $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + ...$
$\cosh(x)$ is a gentle bowl. It's roughly parabolic, then expands exponentially, with Taylor series: $\cosh(x) = 1 + \frac{x^2}{2!} + \frac{x^4}{4!} + ...$
$\sinh(x)$ looks linear at first, then grows exponentially also: $\sinh(x) = x + \frac{x^3}{3!}...$

Now for a usually-confusing question: What does the parameter $x$ in $\cosh(x)$ really mean?

It's nothing special! It's the same $x$ we feed to any exponential, usually the time to grow: $e^x = e^{rt} = e^{100\% t}$.

We can see the hyperbolic trig functions as:

$\cosh(x)$: What is the even part of $e^x$ after $x$ units of time?
$\sinh(x)$: What is the odd part of $e^x$ after $x$ units of time?

A few gut checks:

What's $\cosh(0)$? At $x=0$ (aka $\text{time} = 0$), we haven't moved from 1.0 (the exponential starting point). The average is 1, and there's no separation, so $\sinh(0) = 0$.
As time goes on, the future grows and the past ($e^{-x}$) vanishes. The average value becomes $\frac{e^x + 0}{2}$, and the average difference becomes $\frac{e^x - 0}{2}$. For large $x$, we'd expect $\cosh(x) \sim \sinh(x) \sim 0.5 e^x$.

Now, how about the inverse hyperbolic functions like $\text{acosh}$ (also called $\text{arccosh}$ or $\text{arcosh}$)? "Inverse hyperbolic cosine" sounds scary, but think of it like this:

$\ln(a)$: How long until $e^x$ reaches value $a$?
$\text{acosh(a)}$: How long until the even part of $e^x$ reaches $a$? This must take longer than $\ln(a)$, since we're only using the even powers in our exponential growth Taylor series.
$\text{asinh(a)}$: How long until the odd part of $e^x$ reaches $a$? Similarly, this must require more time than $\ln(a)$.

In the graphs above, $\text{acosh}$ and $\text{asinh}$ require more time than (i.e., are above) the natural log.

Intuitively, I see the various hyperbolic functions as modified exponentials and logarithms:

$\cosh$ and $\sinh$ are partial/delayed exponential curves (with new behavior near zero, where the past still has influence).
$\text{acosh}$ and $\text{asinh}$ are slower versions of the natural log. It takes $\ln(3) \approx 1.10$ units for $e^x$ to grow from 1 to 3, but it takes $\text{acosh}(3) \approx 1.76$ units for just the even part of $e^x$ to do the same.
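Here's a small Python sketch of the "how long until we reach 3?" comparison, using the standard library's `math.log`, `math.acosh` and `math.asinh`. The variable names are only for illustration:

```python
import math

target = 3.0

t_full = math.log(target)    # time for all of e^x to reach 3
t_even = math.acosh(target)  # time for just the even part to reach 3
t_odd = math.asinh(target)   # time for just the odd part to reach 3

print(round(t_full, 2), round(t_even, 2), round(t_odd, 2))

# Going forward in time again should land back on the target
print(math.exp(t_full), math.cosh(t_even), math.sinh(t_odd))
```

Both partial versions need more time than the full exponential, as the intuition suggests.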

Part 3: Applications of the Exponential Interpretation

Since $\cosh$ and $\sinh$ are mini exponentials (those little cherubs!), we'd guess they still have interesting properties.

Function equal to its own second derivative

$e^x$ is the function equal to its own derivative ($f' = f$).

$\displaystyle{e^x \xrightarrow{\text{d/dx}} e^x \xrightarrow{\text{d/dx}} e^x \dots }$

Ok. And sine and cosine are functions equal to their own fourth derivatives ($f'''' = f$).

$\displaystyle{\sin(x) \xrightarrow{\text{d/dx}} \cos(x) \xrightarrow{\text{d/dx}} -\sin(x) \xrightarrow{\text{d/dx}} -\cos(x) \xrightarrow{\text{d/dx}} \sin(x) }$

Repeating after one derivative, repeating after four derivatives... that's a big gap. Anything in-between?

You bet. The hyperbolic functions equal their own second derivative ($f'' = f$):

$\displaystyle{\frac{d}{dx} \sinh(x) = \frac{d}{dx} \frac{e^x - e^{-x}}{2} = \frac{e^x + e^{-x}}{2} = \cosh(x)}$

$\displaystyle{\frac{d}{dx} \cosh(x) = \frac{d}{dx} \frac{e^x + e^{-x}}{2} = \frac{e^x - e^{-x}}{2} = \sinh(x)}$

Unlike $\sin$ and $\cos$, there are no awkward negative signs in the cycle, just a toggle:

$\displaystyle{\sinh(x) \xrightarrow{\text{d/dx}} \cosh(x) \xrightarrow{\text{d/dx}} \sinh(x) \xrightarrow{\text{d/dx}} \cosh(x) \xrightarrow{\text{d/dx}} \dots }$

Neat. The hyperbolic functions are like "half exponentials" because it takes two derivatives to complete the cycle. This is why they're useful in calculus -- not because we care about the coordinates on a hyperbola!
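If you'd rather see the toggle numerically, here's a rough sketch using finite differences. The helpers `derivative` and `second_derivative` and their step sizes are assumptions I picked for this check, so expect agreement only to several decimal places:

```python
import math

def derivative(f, x, h=1e-5):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-4):
    """Central-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

x = 0.7
print(derivative(math.sinh, x), math.cosh(x))         # sinh' should match cosh
print(derivative(math.cosh, x), math.sinh(x))         # cosh' should match sinh
print(second_derivative(math.cosh, x), math.cosh(x))  # cosh'' should match cosh
print(second_derivative(math.sinh, x), math.sinh(x))  # sinh'' should match sinh
```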

Understanding Integral Tables

You'll see hyperbolic functions in tables of tricky integrals and derivatives:

Ignore the specifics. Let's see the general pattern without getting lost in the details.

We have our standard relationship:

$\displaystyle{\frac{d}{dx} \ln(x) = \frac{1}{x}}$

Since the hyperbolic functions are variations of the exponentials, we'd expect $\frac{d}{dx}\text{asinh}$ to resemble $\frac{1}{x}$. From the table:

$\displaystyle{\frac{d}{dx} \text{asinh}(x) = \frac{1}{\sqrt{x^2 + 1}}}$

When $x$ is large, the "$+ 1$" doesn't matter much, and the derivative becomes:

$\displaystyle{\frac{d}{dx} \text{asinh}(x) = \frac{1}{\sqrt{x^2 + 1}} \sim \frac{1}{\sqrt{x^2 }} \sim \frac{1}{x}}$

Ah! The function $\text{asinh}(x)$ behaves like $\ln(x)$, except for small $x$ where the $+1$ term still matters. Without this exponential perspective, I'd have no clue what the derivative of $\text{asinh}$ should resemble. (You want the rate of change of the inverse function that determines the y-coordinate in a hyperbola? What?)
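A few numbers make the "resembles $\frac{1}{x}$" claim tangible. This is a minimal sketch; `asinh_slope` is my own name for the table formula above:

```python
import math

def asinh_slope(x):
    """Derivative of asinh from the integral table: 1 / sqrt(x^2 + 1)."""
    return 1 / math.sqrt(x * x + 1)

for x in [0.1, 1, 10, 100]:
    # Columns: x, slope of asinh, slope of ln
    print(x, asinh_slope(x), 1 / x)
```

For small $x$ the two slopes are far apart (the $+1$ dominates); by $x = 100$ they should agree to several digits.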

New descriptions of Sine and Cosine

Remember, we can now split the exponential function whenever we want:

$\displaystyle{e^x = \cosh(x) + \sinh(x)}$

So, what if we plug in $ix$? We'd get

$\displaystyle{e^{ix} = \cosh(ix) + \sinh(ix)}$

But Euler's formula says:

$\displaystyle{e^{ix} = \cos(x) + i\sin(x)}$

Whoa. The even/odd parts of each function must be the same, so:

$\displaystyle{\cosh(ix) = \cos(x)}$

$\displaystyle{\sinh(ix) = i \sin(x)}$

If we feed the imaginary axis to our everyday exponential function, we get the trig functions which live on a circle. The rotation is happening via the parameter $ix$, vs. in the function $e^{ix}$. (Instead of rotating the steering wheel, we're rotating the engine, so to speak.)

In sine's case, we have an awkward imaginary term to shuffle around:

$\displaystyle{\sin(x) = \frac{\sinh(ix)}{i} = -i \sinh(ix)}$

These connections are more useful in complex analysis (still learning...); normally you'd prefer to pass real parameters into functions.
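Python's `cmath` module accepts complex inputs, so we can poke at these identities directly. A small sketch, with an arbitrary test angle:

```python
import cmath
import math

x = 0.9  # arbitrary real angle, in radians

even_side = cmath.cosh(1j * x)  # should match cos(x)
odd_side = cmath.sinh(1j * x)   # should match i * sin(x)
sine_recovered = -1j * odd_side # shuffle the i back out to get sin(x)

print(even_side, math.cos(x))
print(odd_side, 1j * math.sin(x))
print(sine_recovered, math.sin(x))
```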

New Trig Identities

A few new identities don't hurt. With some quick algebra, we can turn regular trig identities into their hyperbolic versions. Starting with:

$\displaystyle{\cos^2(x) + \sin^2(x) = 1}$

We swap in $\cos(x) = \cosh(ix)$ and $\sin(x) = -i \sinh(ix)$ to get:

$\begin{align*} \cosh^2(ix) + [-i\sinh(ix)]^2 &= 1 \\ \cosh^2(ix) + i^2\sinh^2(ix) &= 1 \\ \cosh^2(ix) -\sinh^2(ix) &= 1 \end{align*}$

Now, the term $ix$ is just a parameter. To simplify things, just say $z = ix$ and write:

$\displaystyle{\cosh(z)^2 -\sinh(z)^2 = 1}$

We can leave out the specific parameter and write:

$\displaystyle{\cosh^2 -\sinh^2 = 1}$

This pattern works in general. To convert a regular trig formula to its hyperbolic equivalent (Osborne's Rule), swap:

$\cos^2 \Rightarrow \cosh^2$
$\sin^2 \Rightarrow -\sinh^2$ (due to the $i$ when converting $\sin$ into $\sinh$)
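As a quick check of the swap, here's a sketch that applies it to two identities: the Pythagorean one above, and the double-angle formula $\cos(2x) = \cos^2(x) - \sin^2(x)$, which I picked as an extra example (the $-\sin^2$ flips to $+\sinh^2$):

```python
import math

x = 0.8  # arbitrary test value
c, s = math.cosh(x), math.sinh(x)

# cos^2 + sin^2 = 1   becomes   cosh^2 - sinh^2 = 1
pythagorean_gap = c * c - s * s - 1

# cos(2x) = cos^2 - sin^2   becomes   cosh(2x) = cosh^2 + sinh^2
double_angle_gap = math.cosh(2 * x) - (c * c + s * s)

print(pythagorean_gap, double_angle_gap)  # both should be ~0, up to rounding
```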

Machine Learning Functions

While looking for applications of $\cosh$, I ran across this function, used in Machine Learning:

$\displaystyle{\text{logcosh(x)} = \ln(\cosh(x))}$

A direct interpretation is confusing: "Take the natural log of the x-coordinate in a hyperbola". Huh? What does that represent?

The exponential perspective makes it simpler:

We don't want to take the natural log of the regular exponential, since $\ln(e^x) = x$. A function like $x$ (or rather, the absolute value $|x|$) is too pointy, and doesn't have a clean derivative at zero.
However, $\cosh(x)$ only resembles the regular exponential function. Its natural log will only resemble $f(x) = x$. The resulting function $\ln(\cosh(x))$ looks parabolic near zero, and linear as we grow:

How does this work? For small $x$:

$\cosh(x) \approx 1 + \frac{x^2}{2}$. These are the first few terms of its Taylor series.
$\ln(1 + x) \approx x$. The time to grow from 1.0 to 1 + .01 at 100% interest is only .01 units (not enough time for compounding)
Combining the approximations, we get: $\ln(\cosh(x)) \approx \ln(1 + \frac{x^2}{2}) \approx \frac{x^2}{2}$.

As $x$ increases, we approach an offset line:

$\displaystyle{\text{logcosh(x)} = \ln(\cosh(x)) \approx \ln(\frac{e^x + 0}{2}) = \ln(e^x) - \ln(2) = x - \ln(2)}$

Cool! We have a parabola-line hybrid. No need for hyperbolas, we're dealing with variations of the exponential function.
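Here's a minimal sketch comparing $\ln(\cosh(x))$ against its two approximations. The function name `logcosh` follows the formula above; the test points are arbitrary:

```python
import math

def logcosh(x):
    """Natural log of the even part of e^x."""
    return math.log(math.cosh(x))

small, large = 0.01, 20.0

# Near zero: should look like the parabola x^2 / 2
print(logcosh(small), small**2 / 2)

# Far from zero: should look like the offset line x - ln(2)
print(logcosh(large), large - math.log(2))
```

(For very large inputs `math.cosh` overflows, so a production version would need a different formulation; this sketch ignores that.)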

Bonus: Lining Things Up

Maybe we can get $\text{logcosh}$ to turn into the line $y = x$, instead of the offset line $y = x - \ln(2)$. We can add a term to undo the $-\ln(2)$:

$\displaystyle{y\ =\ln\left(\cosh\left(x\right)\right)+\ln\left(2\right)\left(1-e^{-0.1x^{2}}\right)}$

Notes:

Our new function (purple) still looks like a parabola for small $x$.
The $x^2$ in $e^{-0.1x^2}$ makes the function symmetric.
Adjusting the constant $0.1$ changes how fast we approach the line $y=x$.

A short while back, I'd have no idea how to make a parabola gently transition into a line. But seeing $\cosh$ as "parabolic short term, exponential long term" gives us a clue: use the natural log to undo the "exponential long term" behavior, giving us "parabolic short term, linear long term".
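The adjusted formula is easy to try out. A sketch, where `lined_up` and the `rate` parameter are names I chose (`rate` is the $0.1$ constant from the formula):

```python
import math

def lined_up(x, rate=0.1):
    """ln(cosh(x)) plus a term that fades in to cancel the -ln(2) offset."""
    return math.log(math.cosh(x)) + math.log(2) * (1 - math.exp(-rate * x * x))

for x in [0.0, 0.5, 2.0, 5.0, 20.0]:
    print(x, lined_up(x))  # should drift toward y = x as x grows
```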

Hyperbolic Tangent

The hyperbolic tangent is used as a machine learning activation function:

$\displaystyle{\tanh(x) = \frac{\sinh(x)}{\cosh(x)} = \frac{\text{odd part of } e^x}{\text{even part of } e^x}}$

What's its meaning? It's the ratio between the odd and even parts of the exponential function after a given amount of time.

At $x=0$, the even part dominates (full symmetry between past and future)
As $x$ grows, the odd part catches up, and the ratio approaches 1 (equal parts symmetry and anti-symmetry)

The function $\tanh$ is nicely centered at 0 and smoothly varies between -1 and 1. As $x$ increases, $\cosh(x) \sim \sinh(x) \sim 0.5e^x$ and $\frac{0.5e^x}{0.5e^x} = 1$.
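The ratio view is a one-liner in code. A small sketch, with `tanh_from_parts` as a made-up name to contrast with the built-in `math.tanh`:

```python
import math

def tanh_from_parts(x):
    """Ratio of the odd part of e^x to the even part."""
    return math.sinh(x) / math.cosh(x)

for x in [-10.0, -1.0, 0.0, 1.0, 10.0]:
    print(x, tanh_from_parts(x), math.tanh(x))
```

The output should sit at 0 for $x = 0$ and crowd toward $\pm 1$ at the ends.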

Part 4: Geometric Viewpoint

Ok. We've gone pretty far without talking about hyperbolas. Why?

Well, just look at them. They aren't an everyday shape. What's their purpose? Catenaries, orbital mechanics? That's an application but not the reason they exist.

Time to build an intuition.

The Inverse Function Hyperbola (xy=1)

Hyperbolas are built from inverse functions, like $y = \frac{1}{x}$. It's a simple, useful relationship: what's the inverse of $x=2$? $y=1/2$. Multiply them and we undo all scaling: $xy=1$.

A burning math question might be: sure, we have this inverse relationship, but what's the area underneath?

In calculus terms, this is:

$\displaystyle{\text{area} = \int_{1}^{x} \frac{1}{x} dx}$

Where

$\int_1^x$ specifies the boundaries. We start at $x=1$ and go to some upper value for $x$. (We can't start trapping area at $x=0$, since it shoots up to infinity.)
$\frac{1}{x} dx$ is the area we collect at each x-coordinate as we march along

This integral is hard. Thankfully, one definition of the natural log is:

$\displaystyle{\ln(x) = \int_{1}^{x} \frac{1}{x} dx}$

so we have a ready-made solution. Starting from $x=1$, what upper x-coordinate will trap 5 units of area? We want to solve:

$\displaystyle{5 = \int_{1}^{x} \frac{1}{x} dx = \ln(x)}$

Which means

$\begin{align*} 5 &= \ln(x) \\ e^5 &= e^{\ln(x)} \\ 148.41 &= x \end{align*}$

Wow! That's a large coordinate to trap 5 measly units of area.

In general, we have

$\displaystyle{e^{\text{area}} = \text{x-coordinate}}$

And the y-coordinate (inverse) at that position is $y = \frac{1}{x} = \frac{1}{e^{\text{area}}} = e^{-\text{area}}$.
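We can test the "5 units of area" claim by brute force: chop the region into thin rectangles and add them up. This is a rough sketch using the midpoint rule; the function name and step count are my own choices:

```python
import math

def area_under_inverse(upper, steps=200_000):
    """Approximate the area under y = 1/x from x = 1 to x = upper (midpoint rule)."""
    width = (upper - 1) / steps
    return sum(width / (1 + (i + 0.5) * width) for i in range(steps))

upper = math.exp(5)  # about 148.41
print(upper, area_under_inverse(upper))  # area should come out close to 5
```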

The Rotated Hyperbola

There are multiple ways to make a hyperbolic curve. If we rotate 45 degrees, we get something like this:

How do we rotate the equation $xy=1$? The standard way is with a rotation matrix, but let's do the rotation with complex numbers.

Let's treat points as complex numbers: $(x, y) \rightarrow x + yi$. A sample point $(a + bi)$ is on the rotated hyperbola if, after undoing the rotation, we see our original $xy = 1$ relationship.

Candidate point: ($a + bi$)
45-degree counterclockwise rotation: $\frac{(1 + i)}{\sqrt{2}}$

$\displaystyle{\text{original point} = \text{candidate} \cdot \text{rotation} }$

$\displaystyle{= (a + bi)\frac{(1 + i)}{\sqrt{2}} = \frac{1}{\sqrt{2}}[(a + bi^2) + (ai + bi)] = \frac{(a - b)}{\sqrt{2}} + \frac{(a + b)}{\sqrt{2}}i}$

Ok, we turned our candidate back to its original (pre-rotation) point, which is at

$\displaystyle{(x, y) = \left( \frac{(a - b)}{\sqrt{2}}, \frac{(a + b)}{\sqrt{2}} \right) }$

When does it have our $xy=1$ relationship?

$\begin{align*} \frac{(a-b)}{\sqrt{2}}\frac{(a+b)}{\sqrt{2}} &= 1 \\ (a - b)(a + b) &= 2 \\ a^2 - b^2 &= 2 \end{align*}$

Ok! Our candidate is on the rotated hyperbola if: $a^2 - b^2 = 2$ or $\text{real magnitude}^2 - \text{imaginary magnitude}^2 = 2$. We can express this requirement in $(x, y)$ notation as:

$\displaystyle{x^2 - y^2 = 2}$

Almost there. Our original hyperbola ($xy = 1$) contains the point $(1, 1)$, which is a distance of $\sqrt{x^2 + y^2} = \sqrt{1^2 + 1^2} = \sqrt{2}$ from the origin. The constraint equation is really $x^2 - y^2 = r^2$.

If we set the radius to 1, we get a formula for the unit hyperbola:

$\displaystyle{x^2 - y^2 = 1}$

Tada! A nice, clean equation, but I also see it as $(x - y)(x + y) = 1$, which hints at the inverse relationship.
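Python has complex numbers built in, so the "undo the rotation" step can be checked directly. A sketch, where `undo_rotation` and the sample candidate are assumptions of mine:

```python
import math

# 45-degree counterclockwise rotation as a unit complex number
rotation = complex(1, 1) / math.sqrt(2)

def undo_rotation(a, b):
    """Multiply the candidate a + bi by the rotation; return the (x, y) it came from."""
    original = complex(a, b) * rotation
    return original.real, original.imag

# Hypothetical candidate with a^2 - b^2 = 2 (here 9 - 7)
a, b = 3.0, math.sqrt(7)
x, y = undo_rotation(a, b)
print(x, y, x * y)  # the product should be ~1, so the point came from xy = 1

# The tip of the rotated curve, (sqrt(2), 0), should map back to (1, 1)
print(undo_rotation(math.sqrt(2), 0.0))
```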

Ok. Time for the scary diagram you'll see in most textbooks:

We have our rotated hyperbola, and want to trap $a/2$ units of area in red. (The full $a$ units would include the area under the x-axis.) What functions determine the coordinates that trap this area?

The solution turns out to be the even and odd parts of the exponential, $\cosh$ and $\sinh$. There's a 1950s pamphlet, "Hyperbolic Functions" by V. G. Shervatov, that goes through the derivation. The key intuition is realizing that hyperbolas (generally speaking) trap area logarithmically, so the necessary coordinates grow exponentially:

$xy = 1$ hyperbola (trap area under curve)
- $\text{x-coordinate} = e^{\text{area}}$
- $\text{y-coordinate} = \frac{1}{e^{\text{area}}} =e^{-\text{area}} $
$x^2 - y^2 = 1$ hyperbola (trap $\frac{\text{area}}{2}$ per diagram)
- $\text{x-coordinate} = \frac{e^{\text{area}} + e^{-\text{area}}}{2} = \cosh(\text{area})$
- $\text{y-coordinate} = \frac{e^{\text{area}} - e^{-\text{area}}}{2} = \sinh(\text{area})$

In case it needs to be said: it's not obvious that the even/odd parts of the exponential function determine the coordinates that trap area in a rotated hyperbola.
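Since it's not obvious, here's a numeric spot check. The sketch below measures the region swept out from the origin along $x^2 - y^2 = 1$, starting at $(1, 0)$, using the standard swept-area formula $\frac{1}{2}\int (x \, dy - y \, dx)$. That formula and the names `sector_area` and `a` are my choices for this check, not the pamphlet's derivation:

```python
import math

def sector_area(y_top, steps=100_000):
    """Area swept from the origin along x^2 - y^2 = 1, from (1, 0) up to height y_top."""
    dy = y_top / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * dy          # midpoint of this thin slice
        x = math.sqrt(1 + y * y)    # stay on the hyperbola
        dx_dy = y / x               # slope of x with respect to y along the curve
        total += 0.5 * (x - y * dx_dy) * dy
    return total

a = 1.2  # hypothetical parameter
# The point (cosh(a), sinh(a)) should trap a/2 units above the x-axis
print(sector_area(math.sinh(a)), a / 2)
```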

(Aside: Hyperbolas can be defined in terms of distance to fixed points or a conic section, but this gives no intuition for why exponentials are involved.)

The Secant/Tangent hyperbola

One more confusion is why we need new functions to parameterize the hyperbola, when existing trig functions do the trick:

Start with a circle
For any angle $t$, we have coordinates $(x, y) = (\cos(t), \sin(t))$
If we invert the x coordinate (hey, it's what hyperbolas do), we get $x = \frac{1}{\cos(t)} = \sec(t)$
If we scale the y coordinate by that same inversion, we get $y = \sin(t) \cdot \frac{1}{\cos(t)} = \tan(t)$

Does this really make a hyperbola? It meets the requirements:

$x^2 - y^2 = 1$ (relationship needed for rotated hyperbola)
$\sec^2(t) - \tan^2(t) = 1$ (rearrangement of the famous $\sec^2(t) = 1 + \tan^2(t)$)

This video shows how the various parameterizations behave (open the calculator):

Our familiar trig functions $(\sec(t), \tan(t))$ trace the same hyperbola as the fancy new $(\cosh(t), \sinh(t))$. They just go at a different speed. And the parameter $t$ is just an everyday angle we plug into trig functions.

Although there are multiple parameterizations for the hyperbola, $\cosh$ and $\sinh$ are defined with exponentials and are the analog of $\sin$ and $\cos$ in Euler's Formula. They can wear the Official Hyperbolic Parameterization crown.
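To see "same curve, different speed" in numbers, here's a small sketch. It takes one angle, builds the secant/tangent point, then finds the hyperbolic parameter (called `u` here, my own label) that lands on the same spot:

```python
import math

t = 0.7  # an everyday angle in radians, between -pi/2 and pi/2

# Trig parameterization of the hyperbola
x_trig, y_trig = 1 / math.cos(t), math.tan(t)
trig_gap = x_trig**2 - y_trig**2 - 1  # should be ~0

# The hyperbolic parameter that reaches the same height
u = math.asinh(y_trig)
x_hyp, y_hyp = math.cosh(u), math.sinh(u)

print(trig_gap)
print((x_trig, y_trig), (x_hyp, y_hyp))  # same point on the curve
print(t, u)                              # reached with different parameter values
```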

Revisiting Inverse Functions

The inverse hyperbolic functions go from coordinates back to area. Let's say I'm on the unit hyperbola, with an x-coordinate of 5. How much area have I trapped? $\text{acosh}(5) = 2.29$.

The inverse functions are sometimes called $\text{arcosh}$ ("area hyperbolic cosine"). This forces us to think about the coordinate-to-area conversion. I prefer to think about exponentials, and use $\text{acosh}$ ("inverse hyperbolic cosine"). Area is one interpretation, don't force me into it.

We can derive the formulas for the inverse functions by solving $x = \cosh(y)$ and $x = \sinh(y)$:

$\displaystyle{\text{acosh(x)} = \ln(x + \sqrt{x^2 - 1})}$

$\displaystyle{\text{asinh(x)} = \ln(x + \sqrt{x^2 + 1})}$

As expected, these look like modified logarithms. As $x$ grows, they approach $\ln(x + x) = \ln(2x) = \ln(x) + \ln(2)$, or the natural logarithm with an offset.
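The closed forms are easy to compare against the library versions. A minimal sketch, with `acosh_formula` and `asinh_formula` as illustrative names:

```python
import math

def acosh_formula(x):
    """Inverse of cosh for x >= 1, written as a modified logarithm."""
    return math.log(x + math.sqrt(x * x - 1))

def asinh_formula(x):
    """Inverse of sinh, written as a modified logarithm."""
    return math.log(x + math.sqrt(x * x + 1))

print(acosh_formula(5), math.acosh(5))
print(asinh_formula(5), math.asinh(5))

# For large x, both should approach ln(x) + ln(2)
big = 1000.0
print(acosh_formula(big), asinh_formula(big), math.log(big) + math.log(2))
```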

Part 5: Applications of the Geometric Interpretation

The Catenary

The main application of the geometric view is that $\cosh(x)$ is the shape a rope takes when hanging between two fixed points. It's not quite a parabola, it's a catenary curve, with the St. Louis Arch as a famous example. Here are a few more curves (source) that follow $\cosh(x)$:

The process to build this curve is fairly subtle:

First, create a rotated hyperbola with $x^2 - y^2 = 1$
Instead of using the hyperbola, make a graph of just the x-coordinate.
This graph of just the x-coordinate makes a new curve, which models how the rope hangs

This convoluted process isn't how $\cosh$ was discovered. There's a differential equation that models the forces inside a hanging rope:

$\displaystyle{\frac{dy}{dx} = \frac{s}{a}}$

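Here, $s$ is the length of rope measured from its lowest point, and $a$ is a constant set by the rope's tension and weight. To solve the differential equation, we need the convenient exponential properties of $\cosh$, and wind up with: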

$\displaystyle{y = a \cosh(\frac{x}{a})}$

It's cute that $\cosh$ parameterizes a hyperbola, but that interpretation has nothing to do with why it's the solution. I think "the catenary follows the even part of the exponential function" not "the catenary follows the x-coordinate of the hyperbola".
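We can at least check that $y = a \cosh(\frac{x}{a})$ satisfies the rope equation, reading $s$ as the length of rope measured from the lowest point. This is a rough numeric sketch; the value of `a`, the helper names, and the step sizes are all assumptions:

```python
import math

a = 2.0  # hypothetical rope constant

def rope_height(x):
    return a * math.cosh(x / a)

def slope(x, h=1e-6):
    """Central-difference estimate of dy/dx."""
    return (rope_height(x + h) - rope_height(x - h)) / (2 * h)

def rope_length(x_end, steps=100_000):
    """Length of rope from the lowest point (x = 0) to x_end, adding short straight segments."""
    dx = x_end / steps
    total = 0.0
    for i in range(steps):
        x0, x1 = i * dx, (i + 1) * dx
        total += math.hypot(dx, rope_height(x1) - rope_height(x0))
    return total

x = 1.5
print(slope(x), rope_length(x) / a)  # dy/dx and s/a should agree
```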

Shape Where Arc Length = Area

The area under the exponential $e^x$ equals the current value (plus a constant). Consider the region from $x=0$ to $x=2$:

Area under curve: $\int_0^2 e^x dx = e^2 - e^0 = 7.389 - 1 = 6.389$
Current value: $e^2 = 7.389$
Pattern: $\text{current value} = \text{area under curve} + 1$

A pretty clean connection, right? (Don't forget that $+C$)

Now how about $\cosh$?

Current value: $\cosh(x)$
Area under curve: $\int \cosh(x) dx = \sinh(x)$
Arc length of curve: $\int \sqrt{1 + (\cosh'(x))^2} dx = \int \sqrt{1 + \sinh^2(x)} dx = \int \cosh(x) dx = \sinh(x)$
Pattern: $\text{area under curve} = \text{arc length} = \sqrt{\text{current value}^2 - 1}$

The current value of $\cosh$ can be swapped in using the identity $\sqrt{\cosh^2 - 1} = \sinh$.

For large $x$, the $-1$ is negligible and $\sqrt{\text{current value}^2 - 1} \sim \text{current value}$. So, for large $x$, we get equality between area, arc length, and current value (imagine the green rope hanging down and just touching the x-axis). It's more connected than regular $e^x$, not bad!

(Intuition for another day: Math deals with unitless quantities. $13 \ \text{cm}$ is not directly comparable with $13 \ \text{cm}^2$. Yet in math class, we can solve $x = 1 + x^2$ and nobody cares that constant, linear and squared terms are used in conjunction.)

Hyperbolic Geometry

The shape of the universe may be a hyperbola, and hyperbolic geometry is used in special relativity (beyond my pay grade). If we do live in a giant hyperbola, I, uh, may be forced to recant my "exponentials first" stance.

Summary

The hyperbolic functions can be seen as exponential functions (relating time and growth) or geometric functions (relating area and coordinates). Hyperbolas, generally speaking, have logarithmic area and exponential coordinates.

It's been a long journey, but these functions don't haunt my attic any more.

Happy math.


Intuitive Guide to Convolution

Like making engineering students squirm? Have them explain convolution and (if you're barbarous) the convolution theorem. They'll mutter something about sliding windows as they try to escape through one.

Convolution is usually introduced with its formal definition:

$\displaystyle{ (f * g )(t) = \int_{-\infty}^\infty f(\tau) g(t - \tau) d\tau }$

Yikes. Let's start without calculus: Convolution is fancy multiplication.

Part 1: Hospital Analogy

Imagine you manage a hospital treating patients with a single disease. You have:

A treatment plan: [3] Every patient gets 3 units of the cure on their first day.
A list of patients: [1 2 3 4 5] Your patient count for the week (1 person Monday, 2 people on Tuesday, etc.).

Question: How much medicine do you use each day? Well, that's just a quick multiplication:

Plan  *  Patients       = Daily Usage
[3]   *  [1 2 3 4 5]    = [3 6 9 12 15]

Multiplying the plan by the patient list gives the usage for the upcoming days: [3 6 9 12 15]. Everyday multiplication (3 x 4) means using the plan with a single day of patients: [3] * [4] = [12].

Intuition For Convolution

Let's say the disease mutates and requires a multi-day treatment. You create a new plan: Plan: [3 2 1]

That means 3 units of the cure on the first day, 2 on the second, and 1 on the third. Ok. Given the same patient schedule [1 2 3 4 5], what's our medicine usage each day?

Uh... shoot. It's not a quick multiplication:

On Monday, 1 patient comes in. It's her first day, so she gets 3 units.
On Tuesday, the Monday gal gets 2 units (her second day), but two new patients arrive, who get 3 each (2 * 3 = 6). The total is 2 + (2 * 3) = 8 units.
On Wednesday, it's trickier: The Monday gal finishes (1 unit, her last day), the Tuesday people get 2 units each (2 * 2 = 4), and there are 3 new Wednesday people... argh.

The patients are overlapping and it's hard to track. How can we organize this calculation?

An idea: imagine flipping the patient list, so the first patient is on the right:

           Start of line
 5 4 3 2 1

Next, imagine we have 3 separate rooms where we apply the proper dose:

 Rooms     3 2 1

On your first day, you walk into the first room and get 3 units of medicine. The next day, you walk into room #2 and get 2 units. On the last day, you walk into room #3 and get 1 unit. There are no rooms afterwards, and your treatment is done.

To calculate the total medicine usage, line up the patients and walk them through the rooms:

 Monday
 ----------------------------
 Rooms                  3 2 1                                      
 Patients       5 4 3 2 1

 Usage                  3

On Monday (our first day), we have a single patient in the first room. She gets 3 units, for a total usage of 3. Makes sense, right?

On Tuesday, everyone takes a step forward:

 Tuesday
 ----------------------------
 Rooms                  3 2 1              
 Patients ->      5 4 3 2 1

 Usage                  6 2      = 8

The first patient is now in the second room, and there are 2 new patients in the first room. We multiply each room's dose by the patient count, then combine.

Every day we just walk the list forward:

 Wednesday
 ----------------------------
 Rooms                  3 2 1              
 Patients ->        5 4 3 2 1

 Usage                  9 4 1    = 14


 Thursday
 -----------------------------
 Rooms                  3 2 1              
 Patients ->          5 4 3 2 1

 Usage                 12 6 2    = 20


 Friday
 -----------------------------
 Rooms                  3 2 1              
 Patients ->            5 4 3 2 1

 Usage                 15 8 3    = 26

Whoa! It's intricate, but we figured it out, right? We can find the usage for any day by reversing the list, sliding it to the desired day, and combining the doses.

The total day-by-day usage looks like this (don't forget Sat and Sun, since some patients began on Friday):

Plan      *  Patient List   = Total Daily Usage

[3 2 1]   *  [1 2 3 4 5]    = [3 8 14 20 26 14 5]
              M T W T F        M T W  T  F  S  S

This calculation is the convolution of the plan and patient list. It's a fancy multiplication between a list of input numbers and a "program".
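Here's a minimal Python sketch of the same bookkeeping. Instead of flipping and sliding, it takes the equivalent "follow each patient" view: every arrival day adds its doses onto the days that follow. The function and variable names are my own:

```python
def convolve(plan, patients):
    """Daily usage when each day's new patients follow the multi-day plan."""
    usage = [0] * (len(plan) + len(patients) - 1)
    for day, count in enumerate(patients):
        for offset, dose in enumerate(plan):
            # Patients who arrived on `day` need `dose` units `offset` days later
            usage[day + offset] += count * dose
    return usage

print(convolve([3], [1, 2, 3, 4, 5]))        # single-day plan: plain multiplication
print(convolve([3, 2, 1], [1, 2, 3, 4, 5]))  # multi-day plan: Monday through Sunday
```

The second line should reproduce the Monday-through-Sunday totals worked out by hand above.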

Interactive Demo

Here's a live demo. Try changing F (the plan) or G (the patient list). The convolution $c(t)$ matches our manual calculation above.

(We define functions $f(x)$ and $g(x)$ to pad each list with zero, and adjust for the list index starting at 1.)

You can do a quick convolution with Wolfram Alpha:

ListConvolve[{3, 2, 1}, {1, 2, 3, 4, 5}, {1, -1}, 0]
{3, 8, 14, 20, 26, 14, 5}

(The extra {1, -1}, 0 aligns the lists and pads with zero.)

Application: COVID Ventilator Usage

I started this article 5 years ago (intuition takes a while...), but unfortunately the analogy is relevant today.

Let's use convolution to estimate ventilator usage for incoming patients.

Set $f(x)$ as the percent of patients needing ventilators. For example, [.05 .03 .01] means 5% of patients need ventilators the first week, 3% the second week, and 1% the third week.
Set $g(x)$ as the weekly incoming patients, in thousands.
The convolution $c(t) = f * g$ shows how many ventilators are needed each week (in thousands). $c(5)$ is how many ventilators are needed 5 weeks from now.

Let's try it out:

F = [.05, .03, .01] is the ventilator use percentage by week
G = [10, 20, 30, 20, 10, 10, 10] is the incoming hospitalized patients. It starts at 10k per week, rises to 30k, then decays to 10k.

With these numbers, we expect a max ventilator use of 2.2k in 2 weeks:

The convolution drops to 0 after 9 weeks because the patient list has run out. In this example, we're interested in the peak value the convolution hits, not the long-term total.
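The same numbers can be run through a few lines of Python. This is a sketch with a hand-rolled `convolve` helper (my own, not a library call); weeks are counted from 0, so index 2 means two weeks after the first arrivals:

```python
def convolve(plan, arrivals):
    """Weekly totals when each week's arrivals follow the multi-week plan."""
    usage = [0.0] * (len(plan) + len(arrivals) - 1)
    for week, count in enumerate(arrivals):
        for offset, rate in enumerate(plan):
            usage[week + offset] += count * rate
    return usage

F = [0.05, 0.03, 0.01]            # share of patients on ventilators, by week of stay
G = [10, 20, 30, 20, 10, 10, 10]  # incoming patients per week, in thousands

ventilators = convolve(F, G)
peak = max(ventilators)
peak_week = ventilators.index(peak)

print([round(v, 2) for v in ventilators])
print(round(peak, 2), peak_week)
```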

Other plans to convolve may be drug doses, vaccine appointments (one today, another a month from now), reinfections, and other complex interactions.

The hospital analogy is the mental model I wish I had when learning. Now that we've tried it with actual numbers, let's add the Math Juice and convert the analogy to calculus.

Part 2: The Calculus Definition

So, what happened in our example? We had a list of patients and a plan. If the plan were simple (single day [3]), regular multiplication would have worked. Because the plan was complex, we had to "convolve" it.

Time for some Fun Facts™:

Convolution is written $f * g$, with an asterisk. Yes, an asterisk usually indicates multiplication, but in advanced calculus class, it indicates a convolution. Regular multiplication is just implied ($fg$).
The result of a convolution is a new function that gives the total usage for any day ("What was the total usage on day $t=3$?"). We can graph the convolution over time to see the day-by-day totals.

Now the big aha: Convolution reverses one of the lists! Here's why.

Let's call our treatment plan $f(x)$. In our example, we used [3 2 1].

The list of patients (inputs) is $g(x)$. However, we need to reverse this list as we slide it, so the earliest patient (Monday) enters the hospital first (first in, first out). This means we need to use $g(-x)$, the horizontal reflection of $g(x)$. [1 2 3 4 5] becomes [5 4 3 2 1].

Now that we have the reversed list, pick a day to compute ($t = 0, 1, 2...$). To slide our patient list by this much, we use: $g(-x + t)$. That is, we reverse the list ($-x$) and jump to the correct day ($+t$).

We have our scenario:

$f(x)$ is the plan to use
$g(-x + t)$ is the list of inputs (flipped and slid to the right day).

To get the total usage on day $t$, we multiply each patient with the plan, and sum the results (an integral). To account for any possible length, we go from -infinity to +infinity.

Now we can describe convolution formally using calculus:

$\displaystyle{(f * g)(t) = \int_{-\infty}^{\infty} f(\tau) g(t - \tau) \, d\tau}$

(Like colorized math? There's more.)

Phew! That's quite a few symbols. Some notes:

We use a dummy variable $\tau$ (tau) for the intermediate computation. Imagine $\tau$ as knocking on each room ($\tau = 0, 1, 2, 3...$), finding the dosage [$f(\tau)$], the number of patients [$g(t - \tau)$], multiplying them, and totaling things in the integral. Yowza. The so-called "dummy" variable $\tau$ is like i in a for loop: it's temporary, but does the work. (By analogy, $t$ is a global variable that has a fixed value during the loop: it's the day we're calculating the usage for, such as t = Day 5).
In the official definition, you'll see $g(t - \tau)$ instead of $g(- \tau+ t)$. The second version shows the flip ($-\tau$) and slide ($+t$). Writing $g(t - \tau)$ makes it seem like we're interested in the difference between the variables, which confused me.
The treatment plan (program to run) is called the kernel: you convolve a kernel with an input.
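
To make the notes above concrete, here's a minimal Python sketch of the discrete version of the definition, where the integral becomes a sum. The function name `convolve` and the choice to treat anything outside a list as 0 are my assumptions for the sketch, not part of the formal definition.

```python
def convolve(f, g):
    # f is the plan (kernel), g is the list of inputs (patients per day).
    # Assumption: anything outside either list counts as 0.
    total_days = len(f) + len(g) - 1
    usage = []
    for t in range(total_days):        # t: the day we're computing, fixed during the inner loop
        total = 0
        for tau in range(len(f)):      # tau: the "dummy" variable knocking on each room
            if 0 <= t - tau < len(g):  # g(t - tau): the flipped-and-slid patient list
                total += f[tau] * g[t - tau]
        usage.append(total)
    return usage

plan = [3, 2, 1]
patients = [1, 2, 3, 4, 5]
usage = convolve(plan, patients)
print(usage)  # [3, 8, 14, 20, 26, 14, 5]
```

Notice that `t` stays put while `tau` does the legwork, just like the loop analogy.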

Not too bad, right? The equation is a formal description of the analogy.

Part 3: Mathematical Properties of Convolution

We can't discover a new math operation without taking it for a spin. Let's see how it behaves.

Convolution is commutative: f * g = g * f

In our computation, we flipped the patient list and kept the plan the same. Could we have flipped the plan instead?

You bet. Imagine the patients are immobile, and stay in their rooms: [1 2 3 4 5]. To deliver the medicine, we have 3 medical carts that go to each room and deliver the dose. Each day, they slide forward one position.

  Carts ->
  1 2 3

      1 2 3 4 5
      Patients

As before, though our plan is written [3 2 1] (3 units on the first day), we flip the order of the carts to [1 2 3]. That way, a patient gets 3 units on their first day, as we expect. Checking with Wolfram Alpha, the calculation is the same.

ListConvolve[{1, 2, 3, 4, 5}, {3, 2, 1}, {1, -1}, 0]
{3, 8, 14, 20, 26, 14, 5}

Cool! It looks like convolution is commutative:

$\displaystyle{f * g = g * f}$

and we can decide to flip either $f$ or $g$ when calculating the integral. Surprising, right?
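
If you'd rather check this outside Wolfram Alpha, here's a small Python sketch. The helper `convolve` is my own hypothetical function (a direct sum with zeros assumed outside each list); swapping the argument order swaps which list gets flipped.

```python
def convolve(f, g):
    # Direct discrete convolution; values outside either list are assumed to be 0.
    out = [0] * (len(f) + len(g) - 1)
    for t in range(len(out)):
        for tau in range(len(f)):
            if 0 <= t - tau < len(g):
                out[t] += f[tau] * g[t - tau]
    return out

plan = [3, 2, 1]
patients = [1, 2, 3, 4, 5]

flip_patients = convolve(plan, patients)  # patients slide past a fixed plan
flip_plan = convolve(patients, plan)      # carts slide past fixed patients

print(flip_patients)               # [3, 8, 14, 20, 26, 14, 5]
print(flip_patients == flip_plan)  # True
```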

The integral of the convolution

When all treatments are finished, what was the total medicine usage? This is the integral of the convolution. (A few minutes ago, that phrase would have you jumping out of a window.)

But it's a simple calculation. Our plan gives each patient sum([3 2 1]) = 6 units of medicine. And we have sum([1 2 3 4 5]) = 15 patients. The total usage is just 6 x 15 = 90 units.

Wow, that was easy: the usage for the entire convolution is just the product of the subtotals!

$\displaystyle{\int (f * g) = \int f \cdot \int g }$

I hope this clicks intuitively. Note that this trick works for convolution, but not integrals in general. For example:

$\displaystyle{ \int (x \cdot x) \ne \int x \cdot \int x }$

If we separate $x \cdot x$ into two integrals we get:

$ \int (x \cdot x) = \int x^2 = \frac{1}{3} x^3 $
$\int x \cdot \int x = \frac{1}{2}x^2 \cdot \frac{1}{2}x^2 = \frac{1}{4}x^4$

and those aren't the same. (Calculus would be much easier if we could split integrals like this.) It's strange, but $\int (f * g)$ is probably easier to solve than $\int (fg)$.
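
Here's a quick numeric check of the hospital version, as a Python sketch. The `convolve` helper is a hypothetical direct-sum implementation that assumes zeros outside each list.

```python
def convolve(f, g):
    # Direct discrete convolution; values outside either list are assumed to be 0.
    out = [0] * (len(f) + len(g) - 1)
    for t in range(len(out)):
        for tau in range(len(f)):
            if 0 <= t - tau < len(g):
                out[t] += f[tau] * g[t - tau]
    return out

plan = [3, 2, 1]
patients = [1, 2, 3, 4, 5]

total_usage = sum(convolve(plan, patients))    # "integral" of the convolution
product_of_totals = sum(plan) * sum(patients)  # 6 units per patient x 15 patients

print(total_usage, product_of_totals)  # 90 90
```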

Impulse Response

What happens if we send a single patient through the hospital? The convolution would just be that day's plan.

Plan    * Patients = Convolution
[3 2 1] * [1]      = [3 2 1]

In other words, convolving with [1] gives us the original plan.

In calculus terms, a spike of [1] (and 0 otherwise) is the Dirac Delta Function. In terms of convolutions, this function acts like the number 1 and returns the original function:

$\displaystyle{f(t) * \delta(t) = f(t)}$

We can delay the delta function by T, which delays the resulting convolution function too. Imagine our single patient shows up a week late ($\delta(t - T)$), so our medicine usage gets delayed for a week too:

$\displaystyle{f(t) * \delta(t - T) = f(t - T)}$
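
In discrete terms, the delta function is just a list with a single 1. Here's a Python sketch of both identities; the `convolve` helper is hypothetical, and the two-slot delay is an arbitrary stand-in for "a week late".

```python
def convolve(f, g):
    # Direct discrete convolution; values outside either list are assumed to be 0.
    out = [0] * (len(f) + len(g) - 1)
    for t in range(len(out)):
        for tau in range(len(f)):
            if 0 <= t - tau < len(g):
                out[t] += f[tau] * g[t - tau]
    return out

plan = [3, 2, 1]

on_time = convolve(plan, [1])        # single patient, arriving immediately
delayed = convolve(plan, [0, 0, 1])  # single patient, arriving 2 slots late

print(on_time)  # [3, 2, 1]
print(delayed)  # [0, 0, 3, 2, 1]
```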

Part 4: Convolution Theorem & The Fourier Transform

The Fourier Transform (written with a fancy $\mathscr{F}$) converts a function $f(t)$ into a list of cyclical ingredients $F(s)$:

$\displaystyle{f(t) \xrightarrow{\mathscr{F}} F(s)}$

As an operator, this can be written $\mathscr{F}\lbrace f \rbrace = F$.

In our analogy, we convolved the plan and patient list with a fancy multiplication. Since the Fourier Transform gives us lists of ingredients, could we get the same result by mixing the ingredient lists?

Yep, we can: Fancy multiplication in the regular world is regular multiplication in the fancy world.

In math terms, "Convolution in the time domain is multiplication in the frequency (Fourier) domain."

Mathematically, this is written:

$\displaystyle{f * g \xrightarrow{\mathscr{F}} FG}$

$\displaystyle{\mathscr{F}\lbrace f * g \rbrace = FG}$

where $f(x)$ and $g(x)$ are functions to convolve, with transforms $F(s)$ and $G(s)$.

We can prove this theorem with advanced calculus, using theorems I don't quite understand, but let's think through the meaning.

Because $F(s)$ is the Fourier Transform of $f(t)$, we can ask for a specific frequency ($s = 2\text{Hz}$) and get the combined interaction of every data point with that frequency. Let's suppose:

$\displaystyle{F(2) = 3 + i}$

That means after every data point has been multiplied against the 2Hz cycle, the result is $3 + i$. But we could have kept each interaction separate:

$\displaystyle{F(2) = 3 + i = c_0 + c_1 + c_2 + c_3 ... + c_t}$

Where $c_t$ is the contribution to the 2Hz frequency from datapoint $t$. Similarly, we can expand $G(s)$ into a list of interactions with the 2Hz ingredient. Let's suppose $G(2) = 7 - i$:

$\displaystyle{G(2) = 7 - i = d_0 + d_1 + d_2 + d_3 ... + d_t}$

The Convolution Theorem is really saying:

$\displaystyle{f * g \xrightarrow{\mathscr{F}} FG = (c_0 + c_1 + c_2 + ...)(d_0 + d_1 + d_2 + ...)}$

Our convolution in the regular domain involves a lot of cross-multiplications. In the fancy frequency domain, we still have a bunch of interactions, but $F(s)$ and $G(s)$ have consolidated them. We can just multiply $F(2)G(2) = (3 + i)(7-i)$ to find the 2Hz ingredient in the convolved result.
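
Working out that product with the supposed values (remember $F(2) = 3 + i$ and $G(2) = 7 - i$ were made up for illustration):

$\displaystyle{F(2)G(2) = (3 + i)(7 - i) = 21 - 3i + 7i - i^2 = 22 + 4i}$

So in this hypothetical, the 2Hz ingredient of the convolved result would be $22 + 4i$.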

By analogy, suppose you want to calculate:

$\displaystyle{(1 + 2 + 3 + 4)(5 + 6 + 7 + 8) = ?}$

It's a lot of work to cross-multiply every term: $(1 \cdot 5) + (1\cdot 6) + (1\cdot 7) + ...$

It's better to consolidate the groups into $(1 + 2 + 3 + 4) = 10$ and $(5 + 6 + 7 + 8) = 26$, and then multiply to get $10 \cdot 26 = 260$.
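
Both routes give the same answer, which a few lines of Python can confirm (the variable names are just for this sketch):

```python
a = [1, 2, 3, 4]
b = [5, 6, 7, 8]

# The long way: cross-multiply every pair of terms (16 multiplications).
cross_multiplied = sum(x * y for x in a for y in b)

# The short way: consolidate each group first, then multiply once.
consolidated = sum(a) * sum(b)

print(cross_multiplied, consolidated)  # 260 260
```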

This nuance caused me a lot of confusion. It seems like $FG$ is a single multiplication, while $f * g$ involves a bunch of intermediate terms. I forgot that $F$ already did the work of merging a bunch of entries into a single one.

Now, we aren't quite done.

$\displaystyle{f * g \xrightarrow{\mathscr{F}} FG \xrightarrow{\mathscr{F^{-1}}} \ ?}$

We can convert $f * g$ in the time domain into $FG$ in the frequency domain, but we probably need it back in the time domain for a usable result:

$\displaystyle{f * g = \mathscr{F}^{-1} \lbrace FG \rbrace}$

You have a riddle in English ($f * g$), you translate it to French ($FG$), get your smart French friend to work out that calculation, then convert it back to English ($\mathscr{F}^{-1}$).
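
Here's the round trip as a Python sketch, using the hospital numbers. I'm assuming the discrete Fourier Transform (written out directly and slowly) as the stand-in for $\mathscr{F}$, and padding both lists with zeros to the full output length so nothing wraps around. The helper names `dft` and `idft` are mine.

```python
import cmath

def dft(x):
    # Direct O(n^2) discrete Fourier Transform.
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(X):
    # Inverse transform, including the 1/n normalization.
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]

plan = [3, 2, 1]
patients = [1, 2, 3, 4, 5]
n = len(plan) + len(patients) - 1          # 7 output slots

f = plan + [0] * (n - len(plan))           # zero-pad to avoid wraparound
g = patients + [0] * (n - len(patients))

FG = [a * b for a, b in zip(dft(f), dft(g))]  # regular multiplication in the fancy world
result = [round(v.real) for v in idft(FG)]    # translate back to the time domain

print(result)  # [3, 8, 14, 20, 26, 14, 5]
```

The rounding at the end only cleans up floating-point noise; it works here because the inputs are integers.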

And in reverse...

The convolution theorem works this way too:

$\displaystyle{fg \xrightarrow{\mathscr{F}} F * G}$

Regular multiplication in the regular world is fancy multiplication in the fancy world.

Cool, eh? Instead of multiplying two functions like some cave dweller, put on your monocle, convolve the Fourier Transforms, and convert to the time domain:

$\displaystyle{fg = \mathscr{F}^{-1} \lbrace F*G \rbrace}$

I'm not saying this is fun, just that it's possible. If your French friend has a gnarly calculation they're struggling with, it might look like arithmetic to you.
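
For the curious, here's a Python sketch of the reverse direction using the discrete Fourier Transform. Two caveats that come from the discrete convention I'm assuming: the convolution of the transforms wraps around (circular), and a factor of $1/N$ shows up. The lists and helper names are made up for the example.

```python
import cmath

def dft(x):
    # Direct O(n^2) discrete Fourier Transform.
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def circular_convolve(a, b):
    # Convolution where indexes wrap around the end of the list.
    n = len(a)
    return [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]

f = [1, 2, 3, 4]
g = [2, 0, 1, 5]
n = len(f)

# Regular multiplication in the regular world, then transform...
left = dft([a * b for a, b in zip(f, g)])

# ...matches fancy multiplication in the fancy world (scaled by 1/N).
right = [v / n for v in circular_convolve(dft(f), dft(g))]

matches = all(abs(l - r) < 1e-9 for l, r in zip(left, right))
print(matches)  # True
```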

Mini proof

Remember how we said the integral of a convolution was a multiplication of the individual integrals?

$\displaystyle{\int (f * g) = \int f \cdot \int g }$

Well, the Fourier Transform is just a very specific integral, right?

$\displaystyle{\int \rightarrow \mathscr{F}}$

So (handwaving), it seems we could swap the general-purpose integral $\int$ for $\mathscr{F}$ and get

$\displaystyle{\mathscr{F} (f * g) = \mathscr{F} (f) \cdot \mathscr{F} (g) }$

which is the convolution theorem. I need a deeper intuition for the proof, but this helps things click.
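
One way to see the connection, assuming the common convention $F(s) = \int f(t) e^{-2 \pi i s t} \, dt$: at the zero frequency ($s = 0$) the exponential is just 1, so the transform collapses into a plain integral.

$\displaystyle{F(0) = \int f(t) e^{0} \, dt = \int f}$

Reading the convolution theorem at $s = 0$ then gives

$\displaystyle{\int (f * g) = \mathscr{F}\lbrace f * g \rbrace (0) = F(0) G(0) = \int f \cdot \int g}$

So the integral rule is the zero-frequency special case of the convolution theorem. This isn't a proof of the theorem itself, just a sanity check that the two results agree.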

Part 5: Applications

The trick with convolution is finding a useful "program" (kernel) to apply to your input. Here's a few examples.

Moving averages

Let's say you want a moving average between neighboring items in a list. That is half of each element, added together:

$\displaystyle{\frac{a + b}{2} = \frac{a}{2} + \frac{b}{2}}$

This is a "multiplication program" of [0.5 0.5] convolved with our list:

ListConvolve[{1, 4, 9, 16, 25}, {0.5, 0.5}, {1, -1}, 0] 
{0.5, 2.5, 6.5, 12.5, 20.5, 12.5}

We can perform a moving average with a single operation. Neat!

A 3-element moving average would be [.33 .33 .33], a weighted average could be [.5 .25 .25].
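
The same moving average in Python, as a sketch. The `convolve` helper is hypothetical (a direct sum that assumes zeros outside each list), which is why the first and last entries are averaged against a padded 0.

```python
def convolve(f, g):
    # Direct discrete convolution; values outside either list are assumed to be 0.
    out = [0] * (len(f) + len(g) - 1)
    for t in range(len(out)):
        for tau in range(len(f)):
            if 0 <= t - tau < len(g):
                out[t] += f[tau] * g[t - tau]
    return out

data = [1, 4, 9, 16, 25]
averages = convolve(data, [0.5, 0.5])
print(averages)  # [0.5, 2.5, 6.5, 12.5, 20.5, 12.5]

# Dropping the boundary items leaves the averages of true neighbors.
interior = averages[1:-1]
print(interior)  # [2.5, 6.5, 12.5, 20.5]
```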

Derivatives

The derivative finds the difference between neighboring values. Here's the plan: [1 -1]

ListConvolve[{1, 2, 3, 4, 5}, {1, -1}, {1, -1}, 0] 
{1, 1, 1, 1, 1, -5}                         // -5 since we ran out of entries

ListConvolve[{1, 4, 9, 16, 25}, {1, -1}, {1, -1}, 0] 
{1, 3, 5, 7, 9, -25}                        // discrete derivative is 2x + 1

With a simple kernel, we can find a useful math property on a discrete list. And to get a second derivative, just apply the derivative convolution twice:

F * [1 -1] * [1 -1]

As a shortcut, we can precompute the final convolutions ([1 -1] * [1 -1] ) and get:

ListConvolve[{1, -1}, {1,-1}, {1, -1}, 0] 
{1, -2, 1}

Now we have a single kernel [1, -2, 1] that gets the second derivative of a list:

ListConvolve[{1, 4, 9, 16, 25}, {1, -2, 1}, {1, -1}, 0] 
{1, 2, 2, 2, 2, -34, 25}

Excluding the boundary items, we get the expected second derivative:

$\displaystyle{x^2 \xrightarrow{d/dx} 2x \xrightarrow{d/dx} 2}$
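
Here's a Python sketch confirming that the precomputed kernel matches applying the derivative twice. The `convolve` helper is a hypothetical direct-sum implementation with zeros assumed outside each list.

```python
def convolve(f, g):
    # Direct discrete convolution; values outside either list are assumed to be 0.
    out = [0] * (len(f) + len(g) - 1)
    for t in range(len(out)):
        for tau in range(len(f)):
            if 0 <= t - tau < len(g):
                out[t] += f[tau] * g[t - tau]
    return out

squares = [1, 4, 9, 16, 25]
first_derivative = [1, -1]

second_derivative = convolve(first_derivative, first_derivative)
print(second_derivative)  # [1, -2, 1]

one_shot = convolve(squares, second_derivative)
two_steps = convolve(convolve(squares, first_derivative), first_derivative)

print(one_shot)               # [1, 2, 2, 2, 2, -34, 25]
print(one_shot == two_steps)  # True
```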

Blurring / unblurring images

An image blur is essentially a convolution of your image with some "blurring kernel":

$\displaystyle{\text{blurred} = \text{image} * \text{blur}}$

The blur of our 2D image requires a 2D average:

Can we undo the blur? Yep! With our friend the Convolution Theorem, we can do:

$\displaystyle{\text{blurred} = \text{image} * \text{blur}}$

$\displaystyle{\mathscr{F} \lbrace \text{blurred} \rbrace = \mathscr{F} \lbrace \text{image} * \text{blur} \rbrace}$

$\displaystyle{\mathscr{F} \lbrace \text{blurred} \rbrace = \mathscr{F} \lbrace \text{image} \rbrace \mathscr{F} \lbrace \text{blur} \rbrace}$

$\displaystyle{\frac{ \mathscr{F} \lbrace \text{blurred} \rbrace }{\mathscr{F} \lbrace \text{blur} \rbrace} = \mathscr{F} \lbrace \text{image} \rbrace }$

$\displaystyle{\mathscr{F}^{-1} \lbrace \frac{ \mathscr{F} \lbrace \text{blurred} \rbrace }{\mathscr{F} \lbrace \text{blur} \rbrace} \rbrace = \text{image}}$

Whoa! We can recover the original image by dividing out the blur. Convolution is a simple multiplication in the frequency domain, and deconvolution is a simple division in the frequency domain.

A short while back, the concept of "deblurring by dividing Fourier Transforms" was gibberish to me. While it can be daunting mathematically, it's getting simpler conceptually.
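
To see the division trick work, here's a 1D Python sketch (a single row of made-up "pixels" instead of a real image). I'm assuming a discrete Fourier Transform, a known blur kernel, and no noise. Real photos are messier: if the blur's transform is zero or tiny at some frequency, the division blows up.

```python
import cmath

def dft(x):
    # Direct O(n^2) discrete Fourier Transform.
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(X):
    # Inverse transform, including the 1/n normalization.
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]

image = [1, 4, 9, 16, 25, 0, 0]   # hypothetical row of pixels, zero-padded
blur = [0.6, 0.4, 0, 0, 0, 0, 0]  # hypothetical blur kernel, padded to the same length

F_blur = dft(blur)

# Blur: multiply in the frequency domain, then come back.
blurred = [v.real for v in idft([a * b for a, b in zip(dft(image), F_blur)])]

# Unblur: divide out the blur in the frequency domain, then come back.
recovered = [v.real for v in idft([a / b for a, b in zip(dft(blurred), F_blur)])]

is_recovered = all(abs(r - o) < 1e-9 for r, o in zip(recovered, image))
print(is_recovered)  # True
```

I picked [0.6 0.4] rather than [0.5 0.5] because its transform never hits zero, so the division is always safe.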

Algorithm Trick: Multiplication

What is a number? A list of digits:

1234 = 1000 + 200 + 30 + 4 = [1000 200 30 4]
5678 = 5000 + 600 + 70 + 8 = [5000 600 70 8]

And what is regular, grade-school multiplication? A digit-by-digit convolution! We sweep one list of digits by the other, multiplying and adding as we go:

We can perform the calculation by convolving the lists of digits (wolfram alpha):

ListConvolve[{1000, 200, 30, 4}, {8, 70, 600, 5000}, {1, -1}, 0]
{8000, 71600, 614240, 5122132, 1018280, 152400, 20000}

sum {8000, 71600, 614240, 5122132, 1018280, 152400, 20000}
7006652

Note that we pre-flip one of the lists (it gets swapped in the convolution later), and the intermediate calculations are a bit different. But, combining the subtotals gives the expected result.
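
Here's a variation on the same idea in Python. Instead of the pre-scaled values above, this sketch convolves the bare digits and applies the powers of ten afterwards. The `convolve` helper is hypothetical and handles the flip internally, so no pre-flipping is needed.

```python
def convolve(f, g):
    # Direct discrete convolution; values outside either list are assumed to be 0.
    out = [0] * (len(f) + len(g) - 1)
    for t in range(len(out)):
        for tau in range(len(f)):
            if 0 <= t - tau < len(g):
                out[t] += f[tau] * g[t - tau]
    return out

# Digits of 1234 and 5678, most significant first.
coefficients = convolve([1, 2, 3, 4], [5, 6, 7, 8])
print(coefficients)  # [5, 16, 34, 60, 61, 52, 32]

# Each slot is a power of ten; the leftmost is 10^6. Combining them does the "carrying".
product = 0
for c in coefficients:
    product = product * 10 + c

print(product)      # 7006652
print(1234 * 5678)  # 7006652
```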

Faster Convolutions

Why convolve instead of doing a regular digit-by-digit multiplication? Well, the convolution theorem lets us substitute convolution with Fourier Transforms:

$\displaystyle{f * g = \mathscr{F}^{-1} \lbrace FG \rbrace}$

The convolution ($f * g$) has complexity $O(n^2)$. We have $n$ positions to process, with $n$ intermediate multiplications at each position.

The right side involves:

Two Fourier Transforms, which are normally $O(n^2)$. However, the Fast Fourier Transform (a divide-and-conquer approach) makes them $O(n\log(n))$.
Pointwise multiplication of the final result of the transforms ($a_n \cdot b_n$ for each $n$), which is $O(n)$
An inverse transform, which is $O(n\log(n))$

And the total complexity is: $O(n\log(n)) + O(n\log(n)) + O(n) + O(n\log(n)) = O(n\log(n))$

Regular multiplication in the fancy domain is faster than a fancy multiplication in the regular domain. Our French friend is no slouch. (More)
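
The steps above can be sketched in Python. This is a bare-bones recursive radix-2 FFT, one of several FFT variants, written for readability rather than speed. The function names are mine, the lengths are padded to a power of two, and the final rounding assumes integer inputs.

```python
import cmath

def fft(x, invert=False):
    # Recursive radix-2 FFT; len(x) must be a power of two.
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], invert)  # divide: even-indexed entries
    odd = fft(x[1::2], invert)   # divide: odd-indexed entries
    sign = 1 if invert else -1
    out = [0] * n
    for k in range(n // 2):      # conquer: combine the two halves
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fft_convolve(f, g):
    size = len(f) + len(g) - 1
    n = 1
    while n < size:              # pad up to the next power of two
        n *= 2
    F = fft(f + [0] * (n - len(f)))
    G = fft(g + [0] * (n - len(g)))
    product = fft([a * b for a, b in zip(F, G)], invert=True)
    # The inverse above is unnormalized, so divide by n here.
    return [round(v.real / n) for v in product[:size]]

hospital = fft_convolve([3, 2, 1], [1, 2, 3, 4, 5])
digits = fft_convolve([1, 2, 3, 4], [5, 6, 7, 8])
print(hospital)  # [3, 8, 14, 20, 26, 14, 5]
print(digits)    # [5, 16, 34, 60, 61, 52, 32]
```

For lists this short the direct approach is faster in practice; the $O(n\log(n))$ payoff shows up when $n$ gets large.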

Convolutional Neural Nets (CNN)

Machine learning is about discovering the math functions that transform input data into a desired result (a prediction, classification, etc.).

Starting with an input signal, we could convolve it with a bunch of kernels:

$\displaystyle{\text{input} * k_1 * k_2 * k_3 ... = \text{result}}$

Given that convolution can do complex math (moving averages, blurs, derivatives...), it seems some combination of kernels should turn our input into something useful, right?

Convolutional Neural Nets (CNNs) process an input with layers of kernels, optimizing their weights (plans) to reach a goal. Imagine tweaking the treatment plan to keep medicine usage below some threshold.

CNNs are often used with image classifiers, but 1D data sets work just fine.

Nice writeup: https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
Digit classifier demo: https://cs.stanford.edu/people/karpathy/convnetjs/demo/mnist.html

LTI System Behavior

A linear, time-invariant system means:

Linear: Scaling and combining inputs scales and combines outputs by the same amount
Time invariant: Outputs depend on relative time, not absolute time. You get 3 units on your first day, and it doesn't matter if it's Wednesday or Thursday.

A fancy phrase is "An LTI system is characterized by its impulse response". Translation: If we send a single patient through the hospital [1], we'll discover the treatment plan. Then we can predict the usage for any sequence of patients by convolving it with the plan.

$\displaystyle{\text{system response} = \text{impulse response} * \text{inputs}}$

If the system isn't LTI, we can't extrapolate based on a single person's experience. Scaling the inputs may not scale the outputs, and the actual calendar day, not relative day, may impact the result (imagine fewer rooms available on weekends).
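
Here's a Python sketch checking both properties for our hospital, which is LTI by construction. The `convolve` helper and the patient lists are made up for the example.

```python
def convolve(f, g):
    # Direct discrete convolution; values outside either list are assumed to be 0.
    out = [0] * (len(f) + len(g) - 1)
    for t in range(len(out)):
        for tau in range(len(f)):
            if 0 <= t - tau < len(g):
                out[t] += f[tau] * g[t - tau]
    return out

plan = [3, 2, 1]
group_a = [1, 2, 3]
group_b = [4, 0, 1]

# Linear: doubling group A and adding group B doubles/adds the usage.
mixed_inputs = [2 * a + b for a, b in zip(group_a, group_b)]
mixed_outputs = [2 * a + b for a, b in
                 zip(convolve(plan, group_a), convolve(plan, group_b))]
is_linear = convolve(plan, mixed_inputs) == mixed_outputs

# Time invariant: arriving 2 slots later delays the usage by 2 slots.
is_time_invariant = convolve(plan, [0, 0] + group_a) == [0, 0] + convolve(plan, group_a)

print(is_linear, is_time_invariant)  # True True
```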

Engineering Analogies

From David Greenspan: "Suppose you have a special laser pointer that makes a star shape on the wall. You tape together a bunch of these laser pointers in the shape of a square. The pattern on the wall now is the convolution of a star with a square."

Regular multiplication gives you a single scaled copy of an input. Convolution creates multiple overlapping copies that follow a pattern you've specified.

Real-world systems have squishy, not instantaneous, behavior: they ramp up, peak, and drop down. The convolution lets us model systems that echo, reverb and overlap.

Now it's time for the famous sliding window example. Think of a pulse of inputs (red) sliding through a system (blue), and having a combined effect (yellow): the convolution.

Summary

Convolution has an advanced technical definition, but the basics can be understood with the right analogy.

Quick rant: I study math for fun, yet it took years to find a satisfying intuition for:

Why is one function reversed?
Why is convolution commutative?
Why does the integral of the convolution = product of integrals?
Why are the Fourier Transforms multiplied point-by-point, and not overlapped?

Why'd it take so long? Imagine learning multiplication with $f \times g = z$ instead of $3 \times 5 = 15$. Without an example I can explore in my head, I could only memorize results, not intuit them. Hopefully this analogy can save you years of struggle.

Happy math.

Join 450k Monthly Readers

Enjoy the article? There's plenty more to help you build a lasting, intuitive understanding of math. Join the newsletter for bonus content and the latest updates.

Imaginary Multiplication vs. Imaginary Exponents

Imaginary numbers perform rotations. So what's the difference between $2 i$ and $2^i$?

Imaginary multiplication directly rotates our position
Imaginary exponents rotate the direction of our exponential growth; we compute our position after the sideways growth is complete

I think of imaginary multiplication as turning your map 90 degrees. East becomes North; no matter how long you drove East, now you're going North.

An imaginary exponent is like turning just the steering wheel. Where will you end up? Depends how long you drive!

That's the intuition, let's work through the details.

Multiplication World and Exponent World

The math is straightforward when multiplying by $i$. We perform a 90-degree rotation around the origin, so 1 becomes $1i$, 2 becomes $2i$, and so on:

This is Multiplication World, where numbers are plopped on the number line then slid around (added), stretched (multiplied), shrunk (divided), and so on. Rotating a number is new, but not overly strange.

Unfortunately, the Multiplication World perspective isn't great for exponents. If we see exponents as "repeated multiplication" we're stuck when we try to count $i$ times. It's the wrong model.

Nope -- to use $i$ in the exponent, we need to enter Exponent World.

Here, numbers are grown, not simply plopped down on the number line. Every number starts at 1.0, then we run an exponential growth engine at 100% for some period of time:

e^0 = 1.0 (no fuel, you stay the same)
e^1 = 2.71828... (1 unit of fuel, with continuously compounding interest)
e^2 = 7.389... (2 units of fuel, with continuously compounding interest)

In Exponent World, a familiar number like "2" is just 1.0, grown for .693 seconds at a 100% continuous interest rate. In other words:

$\displaystyle{2 = e^{\ln(2)} = e^{.693 \cdot 100\%}}$

And in general:

$\displaystyle{x = e^{\ln(x) \cdot 100\%}}$

$e^x$ is a rocketship that pushes our numbers ever further from our starting point of 1. At t=3 we're around 20, and at t=10 we're over 20,000.
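
These numbers are easy to check with Python's standard `math` module. A quick sketch (the variable names are mine):

```python
import math

fuel_for_2 = math.log(2)           # how long to grow from 1.0 to 2
print(round(fuel_for_2, 3))        # 0.693

print(math.exp(0))                 # 1.0
print(round(math.exp(1), 5))       # 2.71828
print(round(math.exp(2), 3))       # 7.389

rebuilt_2 = math.exp(fuel_for_2)   # 1.0, grown for .693 units of time
print(round(rebuilt_2, 9))         # 2.0

print(round(math.exp(3), 1))       # 20.1
print(round(math.exp(10)))         # 22026
```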

So, what happens if we drop an imaginary number into the exponent ($e^{1} \rightarrow e^{1i}$)? We keep the same amount of fuel, but rotate our engine sideways:

With regular exponential growth, we expect to speed along the real dimension. With our sideways engine, we'll need to compute what will happen.

Euler's Formula gives us the answer: constant force in a perpendicular direction creates an orbit:

$\displaystyle{e^{1i} = \cos(1) + i \sin(1)}$

and in general:

$\displaystyle{e^{ix} = \cos(x) + i \sin(x)}$

A few notes/gotchas:

We always start at 1.0. When seeing $e = 2.718...$, it's tempting to think growth starts from 2.718. But no -- when we write $e^x$ we still begin the growth process at 1.0 ($e^0 = 1$ implies no change from 1.0).

I try to remember we can swap $e$ for its official definition at any time:

$\displaystyle{e = \lim_{n \to \infty} \left( 1 + \frac{1}{n} \right)^n}$

The only starting point we see in the definition is 1.

Every number orbits at a radius of 1.0. In exponent world, every number is grown from 1.0, just with varying amounts of fuel. When we put the engine sideways, the orbits are at 1.0, for varying distances around the circle.
The orbit doesn't get faster. Regular exponential growth has runaway compounding because our changes accumulate in the same direction. With sideways growth, changes don't accumulate (always in a new direction) and we spin at a constant speed. In other words, $e^{10}$ is thousands of times larger than $e^1$, but $e^{10i}$ has only traveled ten times farther around the circle than $e^i$.
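
Here's a Python sketch of those last two notes, using the standard `cmath` module. The fuel amounts are arbitrary picks for illustration.

```python
import cmath
import math

fuels = [0, 1, math.log(2), 10]

# Every sideways-grown number sits on the unit circle, and matches Euler's Formula.
on_unit_circle = all(abs(abs(cmath.exp(1j * x)) - 1) < 1e-12 for x in fuels)
matches_euler = all(
    abs(cmath.exp(1j * x) - complex(math.cos(x), math.sin(x))) < 1e-12
    for x in fuels
)
print(on_unit_circle, matches_euler)  # True True

# Regular growth compounds; sideways growth just travels farther along the circle.
growth_ratio = math.exp(10) / math.exp(1)  # thousands of times larger
distance_ratio = 10 / 1                    # only ten times the arc length
print(round(growth_ratio), distance_ratio)  # 8103 10.0
```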

Example: Compute 2^i

So, a setup like $2^i$ tells us to use .693 units of fuel in a sideways direction:

$\displaystyle{2^i = e^{\ln(2)i}}$

To get the coordinates for our final position, we see how far ln(2) = .693 units of fuel takes us around the unit circle:

$\displaystyle{2^i = \cos(.693) + i \sin(.693) = .769 + .639i}$

Phew! Working out $2^i$ (rotated exponential growth) is much trickier than $2i$ (a simple rotation).
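
Python handles complex exponents natively (`1j` is its spelling of $i$), so we can compare its answer with the by-hand route. A small sketch:

```python
import math

direct = 2 ** 1j                 # let Python compute 2^i
fuel = math.log(2)               # .693 units of fuel, pointed sideways
by_hand = complex(math.cos(fuel), math.sin(fuel))

print(round(direct.real, 3), round(direct.imag, 3))  # 0.769 0.639
print(abs(direct - by_hand) < 1e-12)                 # True

rotated = 2 * 1j                 # compare: 2i is just a rotation of 2
print(rotated)                   # 2j
```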

More trickiness: i^i

Now let's get tricky. What is $i^i$?

Remember, we're in Exponent World, and even $i$ is something we had to grow to! In other words, we start at 1.0 and orbit a quarter of the way around the circle (90 degrees, or $\frac{\pi}{2}$ radians).

$\displaystyle{i = e^{\frac{\pi}{2} i}}$

Whoa. Don't like how $i$ appears in its own exponential definition? It must also bother you that every word in the dictionary is defined by other words.

Coming back to $i^i$, we have two operations.

The bottom $i$ (in the base) is shorthand for running our engine sideways, with $\frac{\pi}{2}$ units of fuel
The top $i$ (in the exponent) says "Nah, spin that engine one more time"
The engine is facing 180 degrees (backwards on the real axis) with $\frac{\pi}{2}$ units of fuel

Ah. The result is

$\displaystyle{i^i = [e^{\frac{\pi}{2} i}]^i = e^{\frac{\pi}{2} i \cdot i} = e^{-\frac{\pi}{2}} = .207...}$

And we end up with a real number. Intuitively, we can roughly predict this because we start at 1.0 and point the engine backwards on the real axis. Once you can mentally estimate the direction $i^i$ goes, in seconds, you've got a solid intuition on imaginary exponents.
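
A quick Python sketch to double-check (Python returns the principal value, which is the one we computed):

```python
import math

i_to_the_i = 1j ** 1j
backwards_growth = math.exp(-math.pi / 2)  # pi/2 units of fuel, engine pointing backwards

print(round(i_to_the_i.real, 4))     # 0.2079
print(abs(i_to_the_i.imag) < 1e-12)  # True
print(round(backwards_growth, 4))    # 0.2079
```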

Happy math.

Appendix: i * i^i

Just for fun, what about $i \cdot i^i$?

We know that $i^i$ is a purely real number smaller than 1.0. The first $i$ (doing the multiplication) will just rotate us, so now we have a purely imaginary number smaller than $1i$.

Not bad! Work through the exponents, then rotate the final position.
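
Written out, using the value of $i^i$ from earlier:

$\displaystyle{i \cdot i^i = i \cdot e^{-\frac{\pi}{2}} = .207...i}$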

Appendix: Physics Interpretation

From a physics perspective, if $f(x) = e^{ix}$ is our position, then $f'(x)$ is our velocity. Working this out we get:

$\displaystyle{f'(x) = \frac{d}{dx} e^{ix} = i e^{ix} = i \cdot f(x)}$

In other words, our velocity is perpendicular to our position. Taking the derivative of $e^{ix}$ might seem weird, but treat $i$ like any other constant: $\frac{d}{dx} e^{ax} = ae^{ax}$.
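
Here's a numeric sanity check in Python, as a sketch. I estimate the velocity with a small finite difference (the step size `h` and the sample point `x` are arbitrary choices), then confirm it matches $i \cdot f(x)$ and is perpendicular to the position.

```python
import cmath

def f(x):
    return cmath.exp(1j * x)

x = 0.693   # arbitrary point on the orbit
h = 1e-6    # small step for the finite difference

position = f(x)
velocity = (f(x + h) - f(x - h)) / (2 * h)  # central-difference estimate of f'(x)

matches_formula = abs(velocity - 1j * position) < 1e-6

# Perpendicular vectors have a dot product of 0.
dot = position.real * velocity.real + position.imag * velocity.imag
is_perpendicular = abs(dot) < 1e-6

print(matches_formula, is_perpendicular)  # True True
```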

A Programmer’s Intuition for Matrix Multiplication

What does matrix multiplication mean? Here's a few common intuitions:

1) Matrix multiplication scales/rotates/skews a geometric plane.

This is useful when first learning about vectors: vectors go in, new ones come out. Unfortunately, this can lead to an over-reliance on geometric visualization.

If 20 families are coming to your BBQ, how do you estimate the hotdogs you need? (Hrm… 20 families, call it 3 people per family, 2 hotdogs each… about 20 * 3 * 2 = 120 hotdogs.)

You probably don't think "Oh, I need the volume of an invitation-familysize-hunger prism!". With large matrices I don't think about 500-dimensional vectors, just data to be modified.

2) Matrix multiplication composes linear operations.

This is the technically accurate definition: yes, matrix multiplication results in a new matrix that composes the original functions. However, sometimes the matrix being operated on is not a linear operation, but a set of vectors or data points. We need another intuition for what's happening.

I'll put a programmer's viewpoint into the ring:

3) Matrix multiplication is about information flow, converting data to code and back.

I think of linear algebra as "math spreadsheets" (if you're new to linear algebra, read this intro):

We store information in various spreadsheets ("matrices")
Some of the data are seen as functions to apply, others as data points to use
We can swap between the vector and function interpretation as needed

Sometimes I'll think of data as geometric vectors, and sometimes I'll see a matrix as composing functions. But mostly I think about information flowing through a system. (Some purists cringe at reducing beautiful algebraic structures into frumpy spreadsheets; I sleep OK at night.)

Programmer's Intuition: Code is Data is Code

Take your favorite recipe. If you interpret the words as instructions, you'll end up with a pie, muffin, cake, etc.

If you interpret the words as data, the text is prose that can be tweaked:

Convert measurements to metric units
Swap ingredients due to allergies
Adjust for altitude or different equipment

The result is a new recipe, which can be further tweaked, or executed as instructions to make a different pie, muffin, cake, etc. (Compilers treat a program as text, modify it, and eventually output "instructions" — which could be text for another layer.)

That's Linear Algebra. We take raw information like "3 4 5" and treat it as a vector or function, depending on how it's written:

By convention, a vertical column is usually a vector, and a horizontal row is typically a function:

[3; 4; 5] means x = (3, 4, 5). Here, x is a vector of data (I'm using ; to separate each row).
[3 4 5] means f(a, b, c) = 3a + 4b + 5c. This is a function taking three inputs and returning a single result.

And the aha! moment: data is code, code is data!

The row containing a horizontal function could really be three data points (each with a single element). The vertical column of data could really be three distinct functions, each taking a single parameter.
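
Here's one way to play with the idea in Python. This is a sketch of the intuition, not how linear algebra libraries work internally, and `as_function` is a hypothetical helper I made up.

```python
def as_function(coefficients):
    # Interpret a row of numbers as code: f(a, b, c) = 3a + 4b + 5c for [3, 4, 5].
    return lambda *inputs: sum(c * v for c, v in zip(coefficients, inputs))

numbers = [3, 4, 5]

# Same numbers, read as one function of three inputs...
f = as_function(numbers)
print(f(1, 1, 1))  # 12
print(f(3, 4, 5))  # 50

# ...or as three separate functions, each taking a single input.
three_functions = [as_function([c]) for c in numbers]
print([g(2) for g in three_functions])  # [6, 8, 10]
```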

Ah. This is getting neat: depending on the desired outcome, we can combine data and code in a different order.

The Matrix Transpose

The matrix transpose swaps rows and columns. Here's what it means in practice.

If x was a column vector with 3 entries ([3; 4; 5]), then x' is:

A function taking 3 arguments ([3 4 5])
x' can still remain a data vector, but as three separate entries. The transpose "split it up".

Similarly, if f = [3 4 5] is our row vector, then f' can mean:

A single data vector, in a vertical column.
f' is separated into three functions (each taking a single input).

Let's use this in practice.

When we see x' * x we mean: x' (as a single function) is working on x (a single vector). The result is the dot product (read more). In other words, we've applied the data to itself.

When we see x * x' we mean x (as a set of functions) is working on x' (a set of individual data points). The result is a grid where we've applied each function to each data point. Here, we've mixed the data with itself in every possible permutation.

I think of xx' as x(x'). It's the "function x" working on the "vector x'". (This helps compute the covariance matrix, a measure of self-similarity in the data.)
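
Both combinations, sketched in plain Python lists (no matrix library, just the arithmetic):

```python
x = [3, 4, 5]

# x' * x: one function applied to one vector gives a single value (the dot product).
single_value = sum(a * b for a, b in zip(x, x))
print(single_value)  # 50

# x * x': three one-input functions applied to three data points gives a grid.
grid = [[a * b for b in x] for a in x]
for row in grid:
    print(row)
# [9, 12, 15]
# [12, 16, 20]
# [15, 20, 25]
```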

Putting The Intuition To Use

Phew! How does this help us? When we see an equation like this (from the Machine Learning class):

$\displaystyle{h_{\theta}(x)=\theta^Tx}$

I now have an instant feel of what's happening. In the first equation, we're treating $\theta$ (which is normally a set of data parameters) as a function, and passing in $x$ as an argument. This should give us a single value.

More complex derivations like this:

$\displaystyle{\theta=(X^TX)^{-1}X^Ty}$

can be worked through. In some cases it gets tricky because we store the data as rows (not columns) in the matrix, but now I have much better tools to follow along. You can start estimating when you'll get a single value, or when you'll get a "permutation grid" as a result.
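
To watch the shapes flow, here's a tiny worked example in plain Python. Everything in it is an assumption for illustration: the data is made up (it follows y = 1 + 2 * feature exactly), the helpers are mine, and the hand-written inverse only handles the 2x2 case.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    # Each row of a acts as a function on each column of b.
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

X = [[1, 1], [1, 2], [1, 3]]  # data stored as rows: [1 (intercept), feature]
y = [[3], [5], [7]]           # made-up targets: y = 1 + 2 * feature

Xt = transpose(X)
XtX = matmul(Xt, X)           # 2x2 "permutation grid" of the columns with themselves
theta_column = matmul(matmul(inverse_2x2(XtX), Xt), y)
theta = [round(row[0], 6) for row in theta_column]
print(theta)  # [1.0, 2.0]

# h = theta' * x: theta acts as a function, x is the argument, out comes a single value.
h = sum(t * v for t, v in zip(theta, [1, 4]))
print(h)  # 9.0
```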

Geometric scaling and linear composition have their place, but here I want to think about information. "The information in x is becoming a function, and we're passing itself as the parameter."

Long story short, don't get locked into a single intuition. Multiplication evolved from repeated addition, to scaling (decimals), to rotations (imaginary numbers), to "applying" one number to another (integrals), and so on. Why not the same for matrix multiplication?

Happy math.

Appendix: What about the other combinations?

You may be curious why we can't use the other combinations, like x x or x' x'. Simply put, the parameters don't line up: we'd have functions expecting 3 inputs only being passed a single parameter, or functions expecting single inputs getting passed 3.

Appendix: Javascript Interpretation

The dot product x' * x could be seen as the following javascript command:

((x, y, z) => x*3 + y*4 + z*5)(3, 4, 5)

We define an anonymous function of 3 arguments, and immediately pass it 3 parameters. This returns 50 (the dot product: 3*3 + 4*4 + 5*5 = 50).

The math notation is super-compact, so we can simply write (in Octave/Matlab):

octave:2> [3 4 5] * [3 4 5]'
ans = 50

Remember that [3 4 5] is the function and [3; 4; 5] or [3 4 5]' is how we'd write the data vector.

Appendix: ADEPT Method

This article came about from a TODO in my machine learning class notes that use the ADEPT Method:

I wanted to explain to myself — in plain English — why we wanted x' x and not the reverse. Now, in plain English: We're treating the information as a function, and passing the same info as the parameter.

An Interactive Guide To The Fourier Transform

The Fourier Transform is one of the deepest insights ever made. Unfortunately, the meaning is buried within dense equations:

$\displaystyle{X_k = \sum_{n=0}^{N-1} x_n \cdot e^{-i 2 \pi k n / N}}$

$\displaystyle{x_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k \cdot e^{i 2 \pi k n / N}}$

Yikes. Rather than jumping into the symbols, let's experience the key idea firsthand. Here's a plain-English metaphor:

What does the Fourier Transform do? Given a smoothie, it finds the recipe.
How? Run the smoothie through filters to extract each ingredient.
Why? Recipes are easier to analyze, compare, and modify than the smoothie itself.
How do we get the smoothie back? Blend the ingredients.

Here's the "math English" version of the above:

The Fourier Transform takes a time-based pattern, measures every possible cycle, and returns the overall "cycle recipe" (the amplitude, offset, & rotation speed for every cycle that was found).

Time for the equations? No! Let's get our hands dirty and experience how any pattern can be built with cycles, with live simulations.

If all goes well, we'll have an aha! moment and intuitively realize why the Fourier Transform is possible. We'll save the detailed math analysis for the follow-up.

This isn't a forced march through the equations, it's the casual stroll I wish I had. Onward!

From Smoothie to Recipe

A math transformation is a change of perspective. We change our notion of quantity from "single items" (lines in the sand, tally system) to "groups of 10" (decimal) depending on what we're counting. Scoring a game? Tally it up. Multiplying? Decimals, please.

The Fourier Transform changes our perspective from consumer to producer, turning What do I have? into How was it made?

In other words: given a smoothie, let's find the recipe.

Why? Well, recipes are great descriptions of drinks. You wouldn't share a drop-by-drop analysis, you'd say "I had an orange/banana smoothie". A recipe is more easily categorized, compared, and modified than the object itself.

So... given a smoothie, how do we find the recipe?

Well, imagine you had a few filters lying around:

Pour through the "banana" filter. 1 oz of bananas are extracted.
Pour through the "orange" filter. 2 oz of oranges.
Pour through the "milk" filter. 3 oz of milk.
Pour through the "water" filter. 3 oz of water.

We can reverse-engineer the recipe by filtering each ingredient. The catch?

Filters must be independent. The banana filter needs to capture bananas, and nothing else. Adding more oranges should never affect the banana reading.
Filters must be complete. We won't get the real recipe if we leave out a filter ("There were mangoes too!"). Our collection of filters must catch every possible ingredient.
Ingredients must be combine-able. Smoothies can be separated and re-combined without issue (A cookie? Not so much. Who wants crumbs?). The ingredients, when separated and combined in any order, must make the same result.

See The World As Cycles

The Fourier Transform takes a specific viewpoint: What if any signal could be filtered into a bunch of circular paths?

Whoa. This concept is mind-blowing, and poor Joseph Fourier had his idea rejected at first. (Really Joe, even a staircase pattern can be made from circles?)

And despite decades of debate in the math community, we expect students to internalize the idea without issue. Ugh. Let's walk through the intuition.

The Fourier Transform finds the recipe for a signal, like our smoothie process:

Start with a time-based signal
Apply filters to measure each possible "circular ingredient"
Collect the full recipe, listing the amount of each "circular ingredient"

Stop. Here's where most tutorials excitedly throw engineering applications at your face. Don't get scared; think of the examples as "Wow, we're finally seeing the source code (DNA) behind previously confusing ideas".

If earthquake vibrations can be separated into "ingredients" (vibrations of different speeds & amplitudes), buildings can be designed to avoid interacting with the strongest ones.
If sound waves can be separated into ingredients (bass and treble frequencies), we can boost the parts we care about, and hide the ones we don't. The crackle of random noise can be removed. Maybe similar "sound recipes" can be compared (music recognition services compare recipes, not the raw audio clips).
If computer data can be represented with oscillating patterns, perhaps the least-important ones can be ignored. This "lossy compression" can drastically shrink file sizes (and why JPEG and MP3 files are much smaller than raw .bmp or .wav files).
If a radio wave is our signal, we can use filters to listen to a particular channel. In the smoothie world, imagine each person paid attention to a different ingredient: Adam looks for apples, Bob looks for bananas, and Charlie gets cauliflower (sorry bud).

The Fourier Transform is useful in engineering, sure, but it's a metaphor about finding the root causes behind an observed effect.

Think With Circles, Not Just Sinusoids

One of my giant confusions was separating the definitions of "sinusoid" and "circle".

A "sinusoid" is a specific back-and-forth pattern (a sine or cosine wave), and 99% of the time, it refers to motion in one dimension.
A "circle" is a round, 2d pattern you probably know. If you enjoy using 10-dollar words to describe 10-cent ideas, you might call a circular path a "complex sinusoid".

Labeling a circular path as a "complex sinusoid" is like describing a word as a "multi-letter". You zoomed into the wrong level of detail. Words are about concepts, not the letters they can be split into!

The Fourier Transform is about circular paths (not 1-d sinusoids) and Euler's formula is a clever way to generate one:

$\displaystyle{e^{ix} = \cos(x) + i \sin(x)}$

Must we use imaginary exponents to move in a circle? Nope. But it's convenient and compact. And sure, we can describe our path as coordinated motion in two dimensions (real and imaginary), but don't forget the big picture: we're just moving in a circle.
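To see the "just moving in a circle" idea in action, here's a minimal Python sketch. The helper name `point_on_circle` is made up for illustration; it only assumes Python's built-in complex numbers and the standard `cmath` module.

```python
import cmath
import math

def point_on_circle(angle_degrees, radius=1.0):
    """Return the complex position radius * e^(i * angle)."""
    return radius * cmath.exp(1j * math.radians(angle_degrees))

# Walk around the unit circle in quarter turns: the real part is the
# horizontal position, the imaginary part is the vertical position.
for angle in (0, 90, 180, 270):
    z = point_on_circle(angle)
    print(angle, round(z.real, 3), round(z.imag, 3))
```

Whatever the angle, the distance from the origin stays fixed at the radius: that's the circle.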

Following Circular Paths

Let's say we're chatting on the phone and, like usual, I want us to draw the same circle simultaneously. (You promised!) What should I say?

How big is the circle? (Amplitude, i.e. size of radius)
How fast do we draw it? (Frequency. 1 circle/second is a frequency of 1 Hertz (Hz) or 2*pi radians/sec)
Where do we start? (Phase angle, where 0 degrees is the x-axis)

I could say "2-inch radius, start at 45 degrees, 1 circle per second, go!". After half a second, we should each be pointing to: starting point + amount traveled = 45 + 180 = 225 degrees (on a 2-inch circle).
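The phone-call arithmetic can be sketched in a few lines of Python. The function name `angle_after` is hypothetical, and angles are kept in degrees to match the example above.

```python
def angle_after(start_degrees, cycles_per_second, seconds):
    """Angle (in degrees, wrapped to 0-360) reached on a circular path."""
    traveled = 360 * cycles_per_second * seconds  # one full circle = 360 degrees
    return (start_degrees + traveled) % 360

# Start at 45 degrees, 1 circle per second, wait half a second.
print(angle_after(45, 1, 0.5))
```

The radius (amplitude) never enters the angle calculation; it only sets how far from the center we are pointing.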

Every circular path needs a size, speed, and starting angle (amplitude/frequency/phase). We can even combine paths: imagine tiny motorcars, driving in circles at different speeds.

The combined position of all the cycles is our signal, just like the combined flavor of all the ingredients is our smoothie.

Here's a simulation of a basic circular path:

(Based on this animation, here's the source code. Modern browser required. Click the graph to pause/unpause.)

The magnitude of each cycle is listed in order, starting at 0Hz. Cycles [0 1] means

0 amplitude for the 0Hz cycle (0Hz = a constant cycle, stuck on the x-axis at zero degrees)
1 amplitude for the 1Hz cycle (completes 1 cycle per time interval)

Now the tricky part:

The blue graph measures the real part of the cycle. Another lovely math confusion: the real axis of the circle, which is usually horizontal, has its magnitude shown on the vertical axis. You can mentally rotate the circle 90 degrees if you like.
The time points are spaced at the fastest frequency. A 1Hz signal needs 2 time points for a start and stop (a single data point doesn't have a frequency). The time values [1 -1] show the amplitude at these equally-spaced intervals.

With me? [0 1] is a pure 1Hz cycle.

Now let's add a 2Hz cycle to the mix. [0 1 1] means "Nothing at 0Hz, 1Hz of amplitude 1, 2Hz of amplitude 1":

Whoa. The little motorcars are getting wild: the green lines are the 1Hz and 2Hz cycles, and the blue line is the combined result. Try toggling the green checkbox to see the final result clearly. The combined "flavor" is a sway that starts at the max and dips low for the rest of the interval.

The yellow dots are when we actually measure the signal. With 3 cycles defined (0Hz, 1Hz, 2Hz), each dot is 1/3 of the way through the signal. In this case, cycles [0 1 1] generate the time values [2 -1 -1], which starts at the max (2) and dips low (-1).

Oh! We can't forget phase, the starting angle! Use magnitude:angle to set the phase. So [0 1:45] is a 1Hz cycle that starts at 45 degrees:

This is a shifted version of [0 1]. On the time side we get [.7 -.7] instead of [1 -1], because our cycle isn't exactly lined up with our measuring intervals, which are still at the halfway point (this could be desired!).
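If the simulator isn't handy, the same cycle-to-time conversion can be approximated in Python. This is a sketch under my own assumptions, not the simulator's actual source: `time_values` is a made-up name, each cycle is written as an (amplitude, phase in degrees) pair, and only the real part of the combined position is reported (the blue graph).

```python
import cmath
import math

def time_values(cycles):
    """Combine cycles into time points.

    `cycles` lists (amplitude, phase_degrees) pairs, one per frequency,
    starting at 0Hz. Returns the real part of the combined position at N
    equally spaced instants, where N is the number of cycles.
    """
    n_points = len(cycles)
    values = []
    for n in range(n_points):
        total = 0
        for k, (amplitude, phase) in enumerate(cycles):
            angle = 2 * math.pi * k * n / n_points + math.radians(phase)
            total += amplitude * cmath.exp(1j * angle)
        values.append(round(total.real, 3))
    return values

print(time_values([(0, 0), (1, 0)]))          # cycles [0 1]
print(time_values([(0, 0), (1, 0), (1, 0)]))  # cycles [0 1 1]
print(time_values([(0, 0), (1, 45)]))         # cycles [0 1:45]
```

The three calls correspond to the three recipes discussed so far, so the printed values can be compared against the time values quoted in the text.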

The Fourier Transform finds the set of cycle speeds, amplitudes and phases to match any time signal.

Our signal becomes an abstract notion that we consider as "observations in the time domain" or "ingredients in the frequency domain".

Enough talk: try it out! In the simulator, type any time or cycle pattern you'd like to see. If it's time points, you'll get a collection of cycles (that combine into a "wave") that matches your desired points.

But… doesn't the combined wave have strange values between the yellow time intervals? Sure. But who's to say whether a signal travels in straight lines, or curves, or zips into other dimensions when we aren't measuring it? It behaves exactly as we need at the equally-spaced moments we asked for.

Making A Spike In Time

Can we make a spike in time, like (4 0 0 0), using cycles? I'll use parentheses () for a sequence of time points, and brackets [] for a sequence of cycles.

Although the spike seems boring to us time-dwellers (one data point, that's it?), think about the complexity in the cycle world. Our cycle ingredients must start aligned (at the max value, 4) and then "explode outwards", each cycle with partners that cancel it in the future. Every remaining point is zero, which is a tricky balance with multiple cycles running around (we can't just "turn them off").

Let's walk through each time point:

At time 0, the first instant, every cycle ingredient is at its max. Ignoring the other time points, (4 ? ? ?) can be made from 4 cycles (0Hz 1Hz 2Hz 3Hz), each with a magnitude of 1 and phase of 0 (i.e., 1 + 1 + 1 + 1 = 4).
At every future point (t = 1, 2, 3), the sum of all cycles must cancel.

Here's the trick: when two cycles are on opposite sides of the circle (North & South, East & West, etc.) their combined position is zero (3 cycles can cancel if they're spread evenly at 0, 120, and 240 degrees).

Imagine a constellation of points moving around the circle. Here's the position of each cycle at every instant:

Time 0 1 2 3 
------------
0Hz: 0 0 0 0 
1Hz: 0 1 2 3
2Hz: 0 2 0 2
3Hz: 0 3 2 1

Notice how the 3Hz cycle starts at 0, gets to position 3, then position "6" (with only 4 positions, 6 modulo 4 = 2), then position "9" (9 modulo 4 = 1).
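The position table is just "speed times time, wrapped around the circle". A short Python sketch (the function name `cycle_positions` is my own) rebuilds it for any number of positions:

```python
def cycle_positions(n_points):
    """Row k lists the position (0 .. n_points-1) of the k Hz cycle at each time step."""
    return [[(k * t) % n_points for t in range(n_points)]
            for k in range(n_points)]

for k, row in enumerate(cycle_positions(4)):
    print(f"{k}Hz:", *row)
```

Try `cycle_positions(3)` or `cycle_positions(5)` to see how the constellation changes when the circle has a different number of positions.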

When our cycle is 4 units long, cycle speeds a half-cycle apart (2 units) will either be lined up (difference of 0, 4, 8…) or on opposite sides (difference of 2, 6, 10…).

OK. Let's drill into each time point:

Time 0: All cycles at their max (total of 4)
Time 1: 1Hz and 3Hz cancel (positions 1 & 3 are opposites), 0Hz and 2Hz cancel as well. The net is 0.
Time 2: 0Hz and 2Hz line up at position 0, while 1Hz and 3Hz line up at position 2 (the opposite side). The total is still 0.
Time 3: 0Hz and 2Hz cancel. 1Hz and 3Hz cancel.
Time 4 (repeat of t=0): All cycles line up.

The trick is having individual speeds cancel (0Hz vs 2Hz, 1Hz vs 3Hz), or having the lined-up pairs cancel (0Hz + 2Hz vs 1Hz + 3Hz).

When every cycle has equal power and 0 phase, we start aligned and cancel afterwards. (I don't have a nice proof yet -- any takers? -- but you can see it yourself. Try [1 1], [1 1 1], [1 1 1 1] and notice the signals we generate: (2 0), (3 0 0), (4 0 0 0)).
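One candidate argument uses the standard geometric series formula. At time point $n$, the $k$ Hz cycle sits at $e^{i 2 \pi k n / N}$, so the combined position is a geometric series with ratio $r = e^{i 2 \pi n / N}$. For $n = 1, \dots, N-1$ the ratio isn't 1, so the usual formula applies:

$\displaystyle{\sum_{k=0}^{N-1} e^{i 2 \pi k n / N} = \sum_{k=0}^{N-1} r^k = \frac{1 - r^N}{1 - r} = \frac{1 - e^{i 2 \pi n}}{1 - e^{i 2 \pi n / N}} = 0}$

The numerator vanishes because $e^{i 2 \pi n} = 1$ for any whole number $n$ (a whole number of full turns). At $n = 0$ every term is 1, so the sum is $N$: aligned at the start, cancelled afterwards.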

In my head, I label these signals as "time spikes": they have a value for a single instant, and are zero otherwise (the fancy name is a delta function.)

Here's how I visualize the initial alignment, followed by a net cancellation:

Moving The Time Spike

Not everything happens at t=0. Can we change our spike to (0 4 0 0)?

It seems the cycle ingredients should be similar to (4 0 0 0), but the cycles must align at t=1 (one second in the future). Here's where phase comes in.

Imagine a race with 4 runners. Normal races have everyone lined up at the starting line, the (4 0 0 0) time pattern. Boring.

What if we want everyone to finish at the same time? Easy. Just move people forward or backwards by the appropriate distance. Maybe granny can start 2 feet in front of the finish line, Usain Bolt can start 100m back, and they can cross the tape holding hands.

Phase shifts, the starting angle, are delays in the cycle universe. Here's how we adjust the starting position to delay every cycle 1 second:

A 0Hz cycle doesn't move, so it's already aligned
A 1Hz cycle goes 1 revolution in the entire 4 seconds, so a 1-second delay is a quarter-turn. Phase shift it 90 degrees backwards (-90) and it gets to phase=0, the max value, at t=1.
A 2Hz cycle is twice as fast, so give it twice the angle to cover (-180 or 180 phase shift -- it's across the circle, either way).
A 3Hz cycle is 3x as fast, so give it 3x the distance to move (-270 or +90 phase shift)

If time points (4 0 0 0) are made from cycles [1 1 1 1], then time points (0 4 0 0) are made from [1 1:-90 1:180 1:90]. (Note: I'm using "1Hz", but I mean "1 cycle over the entire time period").

Whoa -- we're working out the cycles in our head!

The interference visualization is similar, except the alignment is at t=1.

Test your intuition: Can you make (0 0 4 0), i.e. a 2-second delay? 0Hz has no phase. 1Hz has 180 degrees, 2Hz has 360 (aka 0), and 3Hz has 540 (aka 180), so it's [1 1:180 1 1:180].
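The phase-shift rule (delay a cycle by turning it backwards in proportion to its speed) can be sketched in Python. The name `delayed_spike_recipe` and the (amplitude, phase) pair format are my own conventions for illustration; phases are reported between -180 and 180 degrees.

```python
def delayed_spike_recipe(n_points, delay):
    """Cycle recipe whose cycles all line up `delay` time steps in the future.

    Returns one (amplitude, phase_degrees) pair per frequency, starting at 0Hz.
    """
    recipe = []
    for k in range(n_points):
        # A k Hz cycle covers k * 360 degrees over the whole interval,
        # so a delay of `delay` steps needs this much of a head start backwards.
        phase = (-360 * k * delay / n_points) % 360
        if phase > 180:
            phase -= 360  # report 270 as -90, etc.
        recipe.append((1, phase))
    return recipe

print(delayed_spike_recipe(4, 1))  # cycles for the time pattern (0 4 0 0)
print(delayed_spike_recipe(4, 2))  # cycles for the time pattern (0 0 4 0)
```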

Discovering The Full Transform

The big insight: our signal is just a bunch of time spikes! If we merge the recipes for each time spike, we should get the recipe for the full signal.

The Fourier Transform builds the recipe frequency-by-frequency:

Separate the full signal (a b c d) into "time spikes": (a 0 0 0) (0 b 0 0) (0 0 c 0) (0 0 0 d)
For any frequency (like 2Hz), the tentative recipe is "a/4 + b/4 + c/4 + d/4" (the amplitude of each spike is split among all frequencies)
Wait! We need to offset each spike with a phase delay (the angle for a "1 second delay" depends on the frequency).
Actual recipe for a frequency = a/4 (no offset) + b/4 (1 second offset) + c/4 (2 second offset) + d/4 (3 second offset).

We can then loop through every frequency to get the full transform.
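The steps above can be sketched as a short Python loop. This is an illustration, not a production FFT: the name `fourier_recipe` is made up, and it assumes the 1/N factor lives in the forward transform, so each spike's amplitude is split among all frequencies.

```python
import cmath
import math

def fourier_recipe(signal):
    """Return one complex number per frequency: the size and phase of that cycle."""
    n_points = len(signal)
    recipe = []
    for k in range(n_points):                # loop through every frequency
        total = 0
        for n, value in enumerate(signal):   # each time spike, offset by n steps
            total += value * cmath.exp(-2j * math.pi * k * n / n_points)
        recipe.append(total / n_points)      # split each spike among all frequencies
    return recipe

for k, amount in enumerate(fourier_recipe([0, 4, 0, 0])):
    size, angle = cmath.polar(amount)
    print(f"{k}Hz: size {size:.2f}, phase {math.degrees(angle):.0f} degrees")
```

A phase printed as -180 and one printed as 180 describe the same spot across the circle, so either may show up for the 2Hz cycle.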

Here's the conversion from "math English" to full math:

A few notes:

N = number of time samples we have
n = current sample we're considering (0 .. N-1)
x_n = value of the signal at time n
k = current frequency we're considering (0 Hertz up to N-1 Hertz)
X_k = amount of frequency k in the signal (amplitude and phase, a complex number)
The 1/N factor is usually moved to the reverse transform (going from frequencies back to time). This is allowed, though I prefer 1/N in the forward transform since it gives the actual sizes for the time spikes. You can get wild and even use $1/\sqrt{N}$ on both transforms (going forward and back creates the 1/N factor).
n/N is the percent of the time we've gone through. 2 * pi * k is our speed in radians / sec. e^-ix is our backwards-moving circular path. The combination is how far we've moved, for this speed and time.
The raw equations for the Fourier Transform just say "add the complex numbers". Many programming languages cannot handle complex numbers directly, so you convert everything to rectangular coordinates and add those.
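For a language without complex numbers, the rectangular-coordinate approach in the last note might look like the sketch below. It assumes a real-valued signal and keeps the 1/N factor in the forward transform; `fourier_recipe_rect` is a hypothetical name.

```python
import math

def fourier_recipe_rect(signal):
    """Fourier recipe using only real arithmetic.

    Each frequency's amount is returned as a (real, imaginary) pair.
    """
    n_points = len(signal)
    recipe = []
    for k in range(n_points):
        real = 0.0
        imag = 0.0
        for n, value in enumerate(signal):
            angle = -2 * math.pi * k * n / n_points  # backwards-moving path
            real += value * math.cos(angle)          # horizontal part
            imag += value * math.sin(angle)          # vertical part
        recipe.append((real / n_points, imag / n_points))
    return recipe

print(fourier_recipe_rect([0, 4, 0, 0]))
```

The size and phase of each cycle can be recovered from a pair with `math.hypot(real, imag)` and `math.atan2(imag, real)`.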

Onward

This was my most challenging article yet. The Fourier Transform has several flavors (discrete/continuous/finite/infinite), covers deep math (Dirac delta functions), and it's easy to get lost in details. I was constantly bumping into the edge of my knowledge.

But there are always simple analogies out there -- I refuse to think otherwise. Whether it's a smoothie or Usain Bolt & Granny crossing the finish line, take a simple understanding and refine it. The analogy is flawed, and that's ok: it's a raft to use, and leave behind once we cross the river.

I realized how feeble my own understanding was when I couldn't work out the transform of (1 0 0 0) in my head. For me, it was like saying I knew addition but, gee whiz, I'm not sure what "1 + 1 + 1 + 1" would be. Why not? Shouldn't we have an intuition for the simplest of operations?

That discomfort led me around the web to build my intuition. In addition to the references in the article, I'd like to thank:

Scott Young, for the initial impetus for this post
Shaheen Gandhi, Roger Cheng, and Brit Cruise for kicking around ideas & refining the analogy
Steve Lehar for great examples of the Fourier Transform on images
Charan Langton for her detailed walkthrough
Julius Smith for a fantastic walkthrough of the Discrete Fourier Transform (what we covered today)
Bret Victor for his techniques on visualizing learning

Today's goal was to experience the Fourier Transform. We'll save the advanced analysis for next time.

Happy math.

Appendix: Projecting Onto Cycles

Stuart Riffle has a great interpretation of the Fourier Transform:

Imagine spinning your signal in a centrifuge and checking for a bias. I have a correction: we must spin backwards (the exponent in the equation above should be $e^{-i 2 \pi...}$). You already know why: we need a phase delay so spikes appear in the future.

Appendix: Another Awesome Visualization

Lucas Vieira, author of excellent Wikipedia animations, was inspired to make this interactive animation:

Fourier Toy - Click to download, requires flash

(Detailed list of control options)

The Fourier Transform is about cycles added to cycles added to cycles. Try making a "time spike" by setting an amplitude of 1 for every component (press Enter after inputting each number). Fun fact: with enough terms, you can draw any shape, even Homer Simpson.

Check out http://www.jezzamon.com/fourier/ for a great tool to draw any shape using epicycles.

Appendix: Article with R code samples

João Neto made a great writeup, with technical (R) code samples here:

http://www.di.fc.ul.pt/~jpn/r/fourier/fourier.html

Appendix: Using the code

All the code and examples are open source (MIT licensed, do what you like).

Interactive example (view source)
Github gist
Reddit discussion on details of the computation, I'm pb_zeppelin

An Intuitive Guide to Linear Algebra

Despite two linear algebra classes, my knowledge consisted of “Matrices, determinants, eigen something something”.

Why? Well, let’s try this course format:

Name the course Linear Algebra but focus on things called matrices and vectors
Teach concepts like Row/Column order with mnemonics instead of explaining the reasoning
Favor abstract examples (2d vectors! 3d vectors!) and avoid real-world topics until the final week

The survivors are physicists, graphics programmers and other masochists. We missed the key insight:

Linear algebra gives you mini-spreadsheets for your math equations.

We can take a table of data (a matrix) and create updated tables from the original. It’s the power of a spreadsheet written as an equation.

Here’s the linear algebra introduction I wish I had, with a real-world stock market example.

What’s in a name?

“Algebra” means, roughly, “relationships”. Grade-school algebra explores the relationship between unknown numbers. Without knowing x and y, we can still work out that $(x + y)^2 = x^2 + 2xy + y^2$.

“Linear Algebra” means, roughly, “line-like relationships”. Let’s clarify a bit.

Straight lines are predictable. Imagine a rooftop: move forward 3 horizontal feet (relative to the ground) and you might rise 1 foot in elevation (The slope! Rise/run = 1/3). Move forward 6 feet, and you’d expect a rise of 2 feet. Contrast this with climbing a dome: each horizontal foot forward raises you a different amount.

Lines are nice and predictable:

If 3 feet forward has a 1-foot rise, then going 10x as far should give a 10x rise (30 feet forward is a 10-foot rise)
If 3 feet forward has a 1-foot rise, and 6 feet has a 2-foot rise, then (3 + 6) feet should have a (1 + 2) foot rise

In math terms, an operation F is linear if scaling inputs scales the output, and adding inputs adds the outputs:

$\begin{aligned} F(ax) &= a \cdot F(x) \\ F(x + y) &= F(x) + F(y) \end{aligned}$

In our example, $F(x)$ calculates the rise when moving forward x feet, and the properties hold:

$\displaystyle{F(10 \cdot 3) = 10 \cdot F(3) = 10}$

$\displaystyle{F(3+6) = F(3) + F(6) = 3}$

Linear Operations

An operation is a calculation based on some inputs. Which operations are linear and predictable? Multiplication, it seems.

Exponents ($F(x) = x^2$) aren’t predictable: $10^2$ is 100, but $20^2$ is 400. We doubled the input but quadrupled the output.

Surprisingly, regular addition isn’t linear either. Consider the “add three” function $F(x) = x + 3$:

$\begin{aligned} F(10) &= 13 \\ F(20) &= 23 \end{aligned}$

We doubled the input and did not double the output. (Yes, $F(x) = x + 3$ happens to be the equation for an offset line, but it’s still not “linear” because $F(10) \neq 10 \cdot F(1)$. Fun.)

So, what types of functions are actually linear? Plain-old scaling by a constant, or functions that look like: $F(x) = ax$. In our roof example, $a = 1/3$.
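Here's a quick way to spot-check the two linearity rules in Python. The helper `looks_linear` is a made-up name, and it only tests a handful of sample values: passing is evidence, not proof, but a single failure is enough to rule linearity out. `Fraction` keeps the 1/3 slope exact.

```python
from fractions import Fraction

def looks_linear(f, samples=(1, 2, 3, 10, 20), scale=10):
    """Spot-check F(a*x) == a*F(x) and F(x + y) == F(x) + F(y)."""
    for x in samples:
        if f(scale * x) != scale * f(x):
            return False
        for y in samples:
            if f(x + y) != f(x) + f(y):
                return False
    return True

def roof(x):
    return Fraction(1, 3) * x   # rise when moving x feet forward

def square(x):
    return x ** 2

def add_three(x):
    return x + 3

for f in (roof, square, add_three):
    print(f.__name__, looks_linear(f))
```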

But life isn’t too boring. We can still combine multiple linear functions ($A(x) = ax, B(x) = bx, C(x)=cx$) into a larger one, $G$:

$\displaystyle{G(x,y,z) = A(x) + B(y) + C(z) = ax + by + cz }$

$G$ is still linear, since doubling the input continues to double the output:

$\displaystyle{G(2x, 2y, 2z) = a(2x) + b(2y) + c(2z) = 2(ax + by + cz) = 2 \cdot G(x, y, z)}$

We have “mini arithmetic”: multiply inputs by a constant, and add the results. It’s actually useful because we can split inputs apart, analyze them individually, and combine the results:

$\displaystyle{G(x,y,z) = G(x,0,0) + G(0,y,0) + G(0,0,z)}$

If we allowed non-linear operations (like $x^2$) we couldn’t split our work and combine the results, since $(a+b)^2 \neq a^2 + b^2$. Limiting ourselves to linear operations has its advantages.
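A tiny Python sketch makes the split-and-combine property concrete. The weights 3, 4, 5 and the helper names are sample choices of mine, not part of the argument.

```python
def G(x, y, z):
    return 3 * x + 4 * y + 5 * z      # sample weights a, b, c = 3, 4, 5

def squared_sum(x, y, z):
    return (x + y + z) ** 2           # a non-linear operation

def split_and_combine(f, x, y, z):
    """Analyze each input on its own, then add the results."""
    return f(x, 0, 0) + f(0, y, 0) + f(0, 0, z)

print(G(1, 2, 3), split_and_combine(G, 1, 2, 3))
print(squared_sum(1, 2, 3), split_and_combine(squared_sum, 1, 2, 3))
```

The linear function gives the same answer either way; the squared version does not.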

Organizing Inputs and Operations

Most courses hit you in the face with the details of a matrix. “Ok kids, let’s learn to speak. Select a subject, verb and object. Next, conjugate the verb. Then, add the prepositions…”

No! Grammar is not the focus. What’s the key idea?

We have a bunch of inputs to track
We have predictable, linear operations to perform (our “mini-arithmetic”)
We generate a result, perhaps transforming it again

Ok. First, how should we track a bunch of inputs? How about a list:

x
y
z

Not bad. We could write it (x, y, z) too — hang onto that thought.

Next, how should we track our operations? Remember, we only have “mini arithmetic”: multiplications by a constant, with a final addition. If our operation $F$ behaves like this:

$\displaystyle{F(x, y, z) = 3x + 4y + 5z}$

We could abbreviate the entire function as (3, 4, 5). We know to multiply the first input by the first value, the second input by the second value, the third input by the third value, and add the results.

Only need the first input?

$\displaystyle{G(x, y, z) = 3x + 0y + 0z = (3, 0, 0)}$

Let’s spice it up: how should we handle multiple sets of inputs? Let’s say we want to run operation F on both (a, b, c) and (x, y, z). We could try this:

$\displaystyle{F(a, b, c, x, y, z) = ?}$

But it won’t work: F expects 3 inputs, not 6. We should separate the inputs into groups:

1st Input  2nd Input
--------------------
a          x
b          y
c          z

Much neater.

And how could we run the same input through several operations? Have a row for each operation:

F: 3 4 5
G: 3 0 0

Neat. We’re getting organized: inputs in vertical columns, operations in horizontal rows.

Visualizing The Matrix

Words aren’t enough. Here’s how I visualize inputs, operations, and outputs:

Imagine “pouring” each input through each operation:

As an input passes an operation, it creates an output item. In our example, the input (a, b, c) goes against operation F and outputs 3a + 4b + 5c. It goes against operation G and outputs 3a + 0 + 0.

Time for the red pill. A matrix is a shorthand for our diagrams:

$\text{Inputs} = A = \begin{bmatrix} \text{input1} & \text{input2}\end{bmatrix} = \begin{bmatrix}a & x\\b & y\\c & z\end{bmatrix}$

$\text{Operations} = M = \begin{bmatrix}\text{operation1}\\ \text{operation2}\end{bmatrix} = \begin{bmatrix}3 & 4 & 5\\3 & 0 & 0\end{bmatrix}$

A matrix is a single variable representing a spreadsheet of inputs or operations.

Trickiness #1: The reading order

Instead of an input => matrix => output flow, we use function notation, like y = f(x) or f(x) = y. We usually write a matrix with a capital letter (F), and a single input column with lowercase (x). Because we have several inputs (A) and outputs (B), they’re considered matrices too:

$\displaystyle{MA = B}$

$\begin{bmatrix}3 & 4 & 5\\3 & 0 & 0\end{bmatrix} \begin{bmatrix}a & x\\b & y\\c & z\end{bmatrix} = \begin{bmatrix}3a + 4b + 5c & 3x + 4y + 5z\\ 3a & 3x\end{bmatrix}$

Trickiness #2: The numbering

Matrix size is measured as RxC: row count, then column count, and abbreviated “m x n” (I hear ya, “r x c” would be easier to remember). Items in the matrix are referenced the same way: a_ij is the ith row and jth column (I hear ya, “i” and “j” are easily confused on a chalkboard). Mnemonics are ok with context, and here’s what I use:

RC, like Roman Centurion or RC Cola
Use an “L” shape. Count down the L, then across

Why does RC ordering make sense? Our operations matrix is 2×3 and our input matrix is 3×2. Writing them together:

[Operation Matrix] [Input Matrix]
[operation count x operation size] [input size x input count]
[m x n] [p x q] = [m x q]
[2 x 3] [3 x 2] = [2 x 2]

Notice the matrices touch at the “size of operation” and “size of input” (n = p). They should match! If our inputs have 3 components, our operations should expect 3 items. In fact, we can only multiply matrices when n = p.

The output matrix has m operation rows for each input, and q inputs, giving a “m x q” matrix.
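The multiplication MA = B and the "n = p" rule can be sketched together in plain Python, with matrices as lists of rows. The function name `matmul` and the sample numbers standing in for a, b, c and x, y, z are my own choices.

```python
def matmul(operations, inputs):
    """Multiply an (m x n) operations matrix by an (n x q) inputs matrix."""
    n = len(inputs)  # size of each input column
    if any(len(row) != n for row in operations):
        raise ValueError("each operation must expect one value per input row")
    q = len(inputs[0])  # number of input columns
    return [
        [sum(row[i] * inputs[i][j] for i in range(n)) for j in range(q)]
        for row in operations
    ]

M = [[3, 4, 5],
     [3, 0, 0]]
A = [[1, 10],   # a, x
     [2, 20],   # b, y
     [3, 30]]   # c, z
B = matmul(M, A)
print(B)  # 2 operations x 2 inputs
```

Each column of the result is one input poured through every operation, matching the [2 x 3] [3 x 2] = [2 x 2] bookkeeping.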

Fancier Operations

Let’s get comfortable with operations. Assuming 3 inputs, we can whip up a few 1-operation matrices:

Adder: [1 1 1]
Averager: [1/3 1/3 1/3]

The “Adder” is just a + b + c. The “Averager” is similar: (a + b + c)/3 = a/3 + b/3 + c/3.

Try these 1-liners:

First-input only: [1 0 0]
Second-input only: [0 1 0]
Third-input only: [0 0 1]

And if we merge them into a single matrix:

[1 0 0]
[0 1 0]
[0 0 1]

Whoa — it’s the “identity matrix”, which copies 3 inputs to 3 outputs, unchanged. How about this guy?

[1 0 0]
[0 0 1]
[0 1 0]

He reorders the inputs: (x, y, z) becomes (x, z, y).

And this one?

[2 0 0]
[0 2 0]
[0 0 2]

He’s an input doubler. We could rewrite him to 2*I (the identity matrix) if we were so inclined.
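These small operation matrices are easy to try out. In this Python sketch, `run` is a hypothetical helper that pours one input column through every operation row.

```python
def run(operations, inputs):
    """Pour one input column through every operation row."""
    return [sum(weight * value for weight, value in zip(row, inputs))
            for row in operations]

adder    = [[1, 1, 1]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
reorder  = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
doubler  = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]

inputs = [7, 8, 9]
for name, matrix in [("adder", adder), ("identity", identity),
                     ("reorder", reorder), ("doubler", doubler)]:
    print(name, run(matrix, inputs))
```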

And yes, when we decide to treat inputs as vector coordinates, the operations matrix will transform our vectors. Here’s a few examples:

Scale: make all inputs bigger/smaller
Skew: make certain inputs bigger/smaller
Flip: make inputs negative
Rotate: make new coordinates based on old ones (East becomes North, North becomes West, etc.)

These are geometric interpretations of multiplication, and how to warp a vector space. Just remember that vectors are examples of data to modify.

A Non-Vector Example: Stock Market Portfolios

Let’s practice linear algebra in the real world:

Input data: stock portfolios with dollars in Apple, Google and Microsoft stock
Operations: the changes in company values after a news event
Output: updated portfolios

And a bonus output: let’s make a new portfolio listing the net profit/loss from the event.

Normally, we’d track this in a spreadsheet. Let’s learn to think with linear algebra:

The input vector could be (\$Apple, \$Google, \$Microsoft), showing the dollars in each stock. (Oh! These dollar values could come from another matrix that multiplied the number of shares by their price. Fancy that!)
The 4 output operations should be: Update Apple value, Update Google value, Update Microsoft value, Compute Profit

Visualize the problem. Imagine running through each operation:

The key is understanding why we’re setting up the matrix like this, not blindly crunching numbers.

Got it? Let’s introduce the scenario.

Suppose a secret iDevice is launched: Apple jumps 20%, Google drops 5%, and Microsoft stays the same. We want to adjust each stock value, using something similar to the identity matrix:

New Apple     [1.2  0      0]
New Google    [0    0.95   0]
New Microsoft [0    0      1]

The new Apple value is the original, increased by 20% (Google = 5% decrease, Microsoft = no change).

Oh wait! We need the overall profit:

Total change = (.20 * Apple) + (-.05 * Google) + (0 * Microsoft)

Our final operations matrix:

New Apple       [1.2  0      0]
New Google      [0    0.95   0]
New Microsoft   [0    0      1]
Total Profit    [.20  -.05   0]

Making sense? Three inputs enter, four outputs leave. The first three operations are a “modified copy” and the last brings the changes together.

Now let’s feed in the portfolios for Alice (\$1000, \$1000, \$1000) and Bob (\$500, \$2000, \$500). We can crunch the numbers by hand, or use Wolfram Alpha (calculation):

(Note: Inputs should be in columns, but it’s easier to type rows. The Transpose operation, indicated by t (tau), converts rows to columns.)

The final numbers: Alice has \$1200 in AAPL, \$950 in GOOG, \$1000 in MSFT, with a net profit of \$150. Bob has \$600 in AAPL, \$1900 in GOOG, and \$500 in MSFT, with a net profit of \$0.
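The same calculation can be reproduced in a few lines of Python. This is a sketch of the matrix-times-column idea rather than the Wolfram Alpha query; the names `operations`, `portfolios`, and `update` are mine, and `Fraction` is used so the percentages stay exact.

```python
from fractions import Fraction

operations = [
    [Fraction("1.2"), 0, 0],                    # New Apple
    [0, Fraction("0.95"), 0],                   # New Google
    [0, 0, 1],                                  # New Microsoft
    [Fraction("0.20"), Fraction("-0.05"), 0],   # Total Profit
]

portfolios = {
    "Alice": [1000, 1000, 1000],   # dollars in Apple, Google, Microsoft
    "Bob":   [500, 2000, 500],
}

def update(portfolio):
    """Run one portfolio (an input column) through all four operations."""
    return [sum(weight * dollars for weight, dollars in zip(row, portfolio))
            for row in operations]

results = {name: [float(v) for v in update(p)] for name, p in portfolios.items()}
for name, values in results.items():
    print(name, values)   # Apple, Google, Microsoft, profit
```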

What’s happening? We’re doing math with our own spreadsheet. Linear algebra emerged in the 1800s yet spreadsheets were invented in the 1980s. I blame the gap on poor linear algebra education.

Historical Notes: Solving Simultaneous equations

An early use of tables of numbers (not yet a “matrix”) was bookkeeping for linear systems:

$\begin{aligned} x + 2y + 3z &= 3 \\ 2x + 3y + 1z &= -10 \\ 5x + -y + 2z &= 14 \end{aligned}$

becomes

$\begin{bmatrix}1 & 2 & 3\\2 & 3 & 1\\5 & -1 & 2\end{bmatrix} \begin{bmatrix}x \\y \\ z \end{bmatrix} = \begin{bmatrix}3 \\ -10 \\ 14 \end{bmatrix}$

We can avoid hand cramps by adding/subtracting rows in the matrix and output, vs. rewriting the full equations. As the matrix evolves into the identity matrix, the values of x, y and z are revealed on the output side.

This process, called Gauss-Jordan elimination, saves time. However, linear algebra is mainly about matrix transformations, not solving large sets of equations (it’d be like using Excel for your shopping list).
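For the curious, here's a compact sketch of the row-shuffling process in Python. It assumes a square matrix with exactly one solution and uses `Fraction` for exact arithmetic; `gauss_jordan` is a made-up name, not a library call.

```python
from fractions import Fraction

def gauss_jordan(matrix, outputs):
    """Solve matrix * unknowns = outputs with row operations.

    Assumes a square, invertible matrix.
    """
    size = len(matrix)
    # Glue each output onto its row so both sides change together.
    rows = [[Fraction(v) for v in row] + [Fraction(out)]
            for row, out in zip(matrix, outputs)]
    for col in range(size):
        # Swap up a row with a nonzero entry in this column.
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale so the diagonal entry becomes 1.
        scale = rows[col][col]
        rows[col] = [v / scale for v in rows[col]]
        # Subtract multiples of this row to clear the column elsewhere.
        for r in range(size):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    # The left side is now the identity; the answers sit on the output side.
    return [row[-1] for row in rows]

x, y, z = gauss_jordan([[1, 2, 3], [2, 3, 1], [5, -1, 2]], [3, -10, 14])
print(x, y, z)
```

Plugging the three values back into the original equations is a good sanity check.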

Terminology, Determinants, and Eigenstuff

Words have technical categories to describe their use (nouns, verbs, adjectives). Matrices can be similarly subdivided.

Descriptions like “upper-triangular”, “symmetric”, “diagonal” are the shape of the matrix, and influence their transformations.

The determinant is the “size” of the output transformation. If the input was a unit square or cube (an area or volume of 1), the determinant is the size of the transformed area or volume. A determinant of 0 means the matrix is “destructive” and cannot be reversed (similar to multiplying by zero: information was lost).
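For a 2×2 matrix, the determinant is the familiar ad - bc formula. A quick Python sketch (the name `det2` and the sample matrices are my own):

```python
def det2(matrix):
    """Determinant of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = matrix
    return a * d - b * c

stretch = [[2, 0],
           [0, 3]]   # doubles one direction, triples the other
squash  = [[1, 2],
           [2, 4]]   # second row is twice the first

print(det2(stretch))  # a unit square becomes a 2-by-3 rectangle
print(det2(squash))   # everything lands on a single line: area is lost
```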

The eigenvector and eigenvalue represent the “axes” of the transformation.

Consider spinning a globe: every location faces a new direction, except the poles.

An “eigenvector” is an input that doesn’t change direction when it’s run through the matrix (it points “along the axis”). And although the direction doesn’t change, the size might. The eigenvalue is the amount the eigenvector is scaled up or down when going through the matrix.
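A small Python check shows the idea. The matrix below is a sample of my choosing, and `transform` is a hypothetical helper: two inputs keep their direction and are only scaled, while a third gets turned.

```python
def transform(matrix, vector):
    """Run one input vector through the matrix."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

M = [[2, 1],
     [1, 2]]

print(transform(M, [1, 1]))    # same direction, scaled by 3
print(transform(M, [1, -1]))   # same direction, scaled by 1
print(transform(M, [1, 0]))    # direction changes: not an eigenvector
```

Here (1, 1) and (1, -1) are eigenvectors of this particular matrix, with eigenvalues 3 and 1.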

(My intuition here is weak, and I’d like to explore more. Here’s a nice diagram and video.)

Matrices As Inputs

A funky thought: we can treat the operations matrix as inputs!

Think of a recipe as a list of commands (Add 2 cups of sugar, 3 cups of flour…).

What if we want the metric version? Take the instructions, treat them like text, and convert the units. The recipe is “input” to modify. When we’re done, we can follow the instructions again.

An operations matrix is similar: commands to modify. Applying one operations matrix to another gives a new operations matrix that applies both transformations, in order.

If N is “adjust portfolio for news” and T is “adjust portfolio for taxes” then applying both:

TN = X

means “Create matrix X, which first adjusts for news, and then adjusts for taxes”. Whoa! We didn’t need an input portfolio, we applied one matrix directly to the other.

The beauty of linear algebra is representing an entire spreadsheet calculation with a single letter. Want to apply the same transformation a few times? Use $N^2$ or $N^3$.
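Here's a sketch of combining operations in Python. The news matrix reuses the Apple/Google/Microsoft changes from the portfolio example; the flat 10% tax matrix T is purely hypothetical, invented for illustration, as is the helper name `matmul`.

```python
from fractions import Fraction

def matmul(left, right):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*right)]
            for row in left]

N = [[Fraction("1.2"), 0, 0],      # adjust for news
     [0, Fraction("0.95"), 0],
     [0, 0, 1]]
T = [[Fraction("0.9"), 0, 0],      # hypothetical tax: keep 90% of each holding
     [0, Fraction("0.9"), 0],
     [0, 0, Fraction("0.9")]]

X = matmul(T, N)                   # news first, then taxes
portfolio = [[1000], [1000], [1000]]

# Applying X once matches applying N and then T.
print(matmul(X, portfolio) == matmul(T, matmul(N, portfolio)))

N_squared = matmul(N, N)           # the same news event, twice
print(float(N_squared[0][0]))
```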

Can We Use Regular Addition, Please?

Yes, because you asked nicely. Our “mini arithmetic” seems limiting: multiplications, but no addition? Time to expand our brains.

Imagine adding a dummy entry of 1 to our input: (x, y, z) becomes (x, y, z, 1).

Now our operations matrix has an extra, known value to play with! If we want x + 1 we can write:

[1 0 0 1]

And x + y - 3 would be:

[1 1 0 -3]

Huzzah!

Want the geeky explanation? We’re pretending our input exists in a 1-higher dimension, and put a “1” in that dimension. We skew that higher dimension, which looks like a slide in the current one. For example: take input (x, y, z, 1) and run it through:

[1 0 0 1]
[0 1 0 1]
[0 0 1 1]
[0 0 0 1]

The result is (x + 1, y + 1, z + 1, 1). Ignoring the 4th dimension, every input got a +1. We keep the dummy entry, and can do more slides later.
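Here's a small sketch of the dummy-entry trick in plain Python. The input point (5, 7, 9) is an arbitrary pick, and `apply` is a helper name invented for this example.

```python
def apply(M, v):
    """Run an input (a plain list of numbers) through operations matrix M."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# The "slide" matrix from the text: adds 1 to x, y and z.
SLIDE = [[1, 0, 0, 1],
         [0, 1, 0, 1],
         [0, 0, 1, 1],
         [0, 0, 0, 1]]   # last row keeps the dummy 1 intact

point = [5, 7, 9, 1]                 # (x, y, z) plus the dummy entry
moved = apply(SLIDE, point)          # every coordinate gets +1
moved_again = apply(SLIDE, moved)    # the dummy 1 survived, so we can slide again

# A single row works the same way: [1 1 0 -3] computes x + y - 3.
x_plus_y_minus_3 = sum(m * x for m, x in zip([1, 1, 0, -3], point))

print(moved, moved_again, x_plus_y_minus_3)
```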

Mini-arithmetic isn’t so limited after all.

Onward

I’ve overlooked some linear algebra subtleties, and I’m not too concerned. Why?

These metaphors are helping me think with matrices, more than the classes I “aced”. I can finally respond to “Why is linear algebra useful?” with “Why are spreadsheets useful?”

They’re not, unless you want a tool used to attack nearly every real-world problem. Ask a businessman if they’d rather donate a kidney or be banned from Excel forever. That’s the impact of linear algebra we’ve overlooked: efficient notation to bring spreadsheets into our math equations.

Happy math.


Understanding Why Complex Multiplication Works

Seeing imaginary numbers as rotations was one of my favorite aha moments:

i, the square root of -1, is a number in a different dimension! Once that clicks, we can use multiplication to "combine" rotations of two complex numbers:

Yowza, did that ever blow my mind: add angles without sine or cosine! Unfortunately, I didn't have an intuitive grasp of why this worked. Let's fix that!

The Boring Explanation: How?

Here's the common explanation of why complex multiplication adds the angles. First, write the complex numbers as polar coordinates (radius & angle):

$\displaystyle{ r_1(\cos(a) + i\sin(a)) \cdot r_2 (\cos(b) + i\sin(b)) }$

Next, take the product, group by real/imaginary parts:

$\displaystyle{ = (r_1 \cos(a) \cdot r_2 \cos(b) - r_1 \sin(a) \cdot r_2 \sin(b))}$ $\displaystyle{ + i (r_1 \cos(a) \cdot r_2 \sin(b) + r_2 \cos(b) \cdot r_1 \sin(a)) }$

$\displaystyle{ = r_1 \cdot r_2[(\cos(a) \cdot \cos(b) - \sin(a) \cdot \sin(b))}$ $\displaystyle{ + i (\cos(a) \cdot \sin(b) + \cos(b) \cdot \sin(a))] }$

Lastly, notice how this matches the sine and cosine angle addition formulas:

$\displaystyle{ \cos(a+b) = \cos(a) \cdot \cos(b) - \sin(a) \cdot \sin(b) }$ $\displaystyle{ \sin(a+b) = \cos(a) \cdot \sin(b) + \cos(b) \cdot \sin(a) }$
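If you'd rather see numbers than symbols, here's a quick check with Python's standard `cmath` module. The radii and angles (2 at 30 degrees, 3 at 45 degrees) are arbitrary picks for illustration.

```python
import cmath
import math

# Build two complex numbers from polar coordinates (radius, angle).
z = cmath.rect(2.0, math.radians(30))   # radius 2, angle a = 30 degrees
w = cmath.rect(3.0, math.radians(45))   # radius 3, angle b = 45 degrees

product = z * w
radius, angle = cmath.polar(product)    # back to (radius, angle in radians)
angle_degrees = math.degrees(angle)

print(radius, angle_degrees)   # radii multiply, angles add
```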

And there you have it! What's that? You don't intuitively think in terms of sine and cosine expansions? Too bad, the math checks out!

...

Still here? Good. The problem is we've lost the magic: it's like saying two poems are similar because we analyzed the distribution of letters. Accurate but unsatisfying!

I like sine as much as anyone, but the details come after seeing the relationship click.

The Fun Explanation: Why!

What's our goal again? Oh yes -- to see why we can multiply two complex numbers and add the angles.

First, let's figure out what multiplication does:

Regular multiplication ("times 2") scales up a number (makes it larger or smaller)
Imaginary multiplication ("times i") rotates you by 90 degrees

And what if we combine the effects in a complex number? Multiplying by (2 + i) means "double your number -- oh, add in a perpendicular rotation".

Quick example: $4 \cdot (3+i) = 4 \cdot 3 + 4 \cdot i = 12 + 4i$

That is, take our original (4), make it 3 times larger (4 * 3) and then add the effect of rotation (+4i). Again, if we wanted only rotation, we'd multiply by "i". If we wanted only scaling we'd multiply by plain old 3. A complex number (a + bi) has both effects.

Visualizing Complex Multiplication

That was easy -- a real number (4) times a complex (3+i). What about two complex numbers ("triangles"), like $(3 + 4i) \cdot (2 + 3i)$?

Now we're talking! I see this as "Make a scaled version of our original triangle (times 2) and add a scaled/rotated triangle (times 3i)". The final endpoint is the new complex number.

But... I love alternate explanations! Here's another:

Instead of grouping the multiplication by triangle, we analyze each part of the FOIL (first, outside, inside, last). Adding each component takes us along a path and ends in the same spot!
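Both viewpoints are easy to check with Python's built-in complex numbers (where `j` plays the role of i). This is only a sketch of the arithmetic; the variable names are made up for the example.

```python
from itertools import accumulate

a = 3 + 4j
b = 2 + 3j

# View 1: a scaled copy of the triangle plus a scaled-and-rotated copy.
scaled = a * 2          # "times 2"
rotated = a * 3j        # "times 3i"
by_triangles = scaled + rotated

# View 2: FOIL, one component at a time (first, outside, inside, last).
foil_steps = [3 * 2, 3 * 3j, 4j * 2, 4j * 3j]
path = list(accumulate(foil_steps))   # running total after each step

print(by_triangles, path, a * b)
```

Either way, the walk ends at the same point as `a * b`.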

But What About the Angles?

Ah yes, the angles. It looks like we're adding the angles, but can we be sure?

Captain Geometry to the rescue! Oh, how I've missed you from 9th grade. Is the result (dotted blue line) at the same angle as plopping the triangles on each other?

In the normal case, we start with a triangle (3 + 4i) and plop on the other (2 + 3i) to get the combined angle.

After the multiplication, we start with a scaled triangle (2x) and plop on another scaled triangle (times 3i). Even though it's larger, similar triangles have the same angles -- they're just bigger (but don't ask about its size, ok?).

We scaled up the original triangle (no change in angle) and "plopped on" another scaled triangle (no change in angle), so the result is the same! I love seeing this come together -- we scale up, rotate out, and boom -- we're at the combined angle. This isn't about "imaginary numbers" -- it's a way to combine triangles without trigonometry!

Side Effects May Include Scaling

Notice how we're making larger copies of our original triangle and adding them together. What's the change in size compared to our starting blue triangle?

Well, let's call our original length "x". Whatever it is, we end up getting a new triangle layered on top, with sides of 2x and 3x (ax and bx in general, when multiplying by a + bi). And from Pythagoras (I love that gentleman) the "real" distance is

$\displaystyle{\sqrt{(ax)^2 + (bx)^2} = \sqrt{x^2(a^2 + b^2)} = x \cdot \sqrt{a^2 + b^2}}$

That is, we take our original distance (x) and scale it by the size of the new triangle (size of a + bi).

If the new triangle is size 1 ($a^2 + b^2 = 1$) then the distance won't change!
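A quick numeric check of the scaling, using the running example (3 + 4i times 2 + 3i). The "size 1" multiplier 0.6 + 0.8i is an arbitrary pick that happens to have $a^2 + b^2 = 1$.

```python
import math

start = 3 + 4j          # original distance x = 5
multiplier = 2 + 3j     # size sqrt(2^2 + 3^2) = sqrt(13)

new_distance = abs(start * multiplier)
predicted = abs(start) * math.sqrt(2**2 + 3**2)   # x * sqrt(a^2 + b^2)

# A multiplier of size 1 rotates without changing the distance.
unit = 0.6 + 0.8j
rotated_only = abs(start * unit)

print(new_distance, predicted, rotated_only)
```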

A Few Thoughts

I don't hate rigorous proofs -- I hate pretending they're helpful when they're not. Proofs have two goals:

Show that a result is true. This is for mathematicians presenting results -- students rarely question the validity of facts in math class.
Show why a result is true.

Real, satisfying insight comes from playing with analogies and examples -- not reading distilled, minimalist proofs (especially those which appeal to the sine/cosine addition formulas!).

Polya said it well: “When you have satisfied yourself that the theorem is true, you start proving it.”

Happy math.


Intuitive Understanding of Sine Waves

Sine waves confused me. Yes, I can mumble "SOH CAH TOA" and draw lines within triangles. But what does it mean?

I was stuck thinking sine had to be extracted from other shapes. A quick analogy:

You: Geometry is about shapes, lines, and so on.

Alien: Oh? Can you show me a line?

You (looking around): Uh... see that brick, there? A line is one edge of that brick.

Alien: So lines are part of a shape?

You: Sort of. Yes, most shapes have lines in them. But a line is a basic concept on its own: a beam of light, a route on a map, or even--

Alien: Bricks have lines. Lines come from bricks. Bricks bricks bricks.

Most math classes are exactly this. "Circles have sine. Sine comes from circles. Circles circles circles."

Argh! No - circles are one example of sine. In a sentence: Sine is a natural sway, the epitome of smoothness: it makes circles "circular" in the same way lines make squares "square".

Let's build our intuition by seeing sine as its own shape, and then understand how it fits into circles and the like. Onward!

Sine vs Lines

Remember to separate an idea from an example: squares are examples of lines. Sine clicked when it became its own idea, not "part of a circle."

Let's observe sine in a simulator:

Hubert will give the tour:

Click start. Go, Hubert go! Notice that smooth back and forth motion? That's Hubert, but more importantly (sorry Hubert), that's sine! It's natural, the way springs bounce, pendulums swing, strings vibrate... and many things move.
Change "vertical" to "linear". Big difference -- see how the motion gets constant and robotic, like a game of pong?

Let's explore the differences with video:

Linear motion is constant: we go a set speed and turn around instantly. It's the unnatural motion in the robot dance (notice the linear bounce with no slowdown vs. the strobing effect).
Sine changes its speed: it starts fast, slows down, stops, and speeds up again. It's the enchanting smoothness in liquid dancing (human sine wave and natural bounce).

Unfortunately, textbooks don't show sine with animations or dancing. No, they prefer to introduce sine with a timeline (try setting "horizontal" to "timeline"):


Egads. This is the schematic diagram we've always been shown. Does it give you the feeling of sine? Not any more than a skeleton portrays the agility of a cat. Let's watch sine move and then chart its course.

The Unavoidable Circle

Circles have sine. Yes. But seeing the sine inside a circle is like getting the eggs back out of the omelette. It's all mixed together!

Let's take it slow. In the simulation, set Hubert to vertical:none and horizontal: sine*. See him wiggle sideways? That's the motion of sine. There's a small tweak: normally sine starts the cycle at the neutral midpoint and races to the max. This time, we start at the max and fall towards the midpoint. Sine that "starts at the max" is called cosine, and it's just a version of sine (like a horizontal line is a version of a vertical line).

Ok. Time for both sine waves: put vertical as "sine" and horizontal as "sine*". And... we have a circle!

A horizontal and vertical "spring" combine to give circular motion. Most textbooks draw the circle and try to extract the sine, but I prefer to build up: start with pure horizontal or vertical motion and add in the other.

Quick Q & A

A few insights I missed when first learning sine:

Sine really is 1-dimensional

Sine wiggles in one dimension. Really. We often graph sine over time (so we don't write over ourselves) and sometimes the "thing" doing sine is also moving, but this is optional! A spring in one dimension is a perfectly happy sine wave.

(Source: Wikipedia, try not to get hypnotized.)

Circles are an example of two sine waves

Circles and squares are a combination of basic components (sines and lines). The circle is made from two connected 1-d waves, one moving in the horizontal direction and one in the vertical.

(Source http://1ucasvb.tumblr.com/)

But remember, circles aren't the origin of sines any more than squares are the origin of lines. They're examples of two sine waves working together, not their source.

What do the values of sine mean?

Sine cycles between -1 and 1. It starts at 0, grows to 1.0 (max), dives to -1.0 (min) and returns to neutral. I also see sine like a percentage, from 100% (full steam ahead) to -100% (full retreat).

What is the input 'x' in sin(x)?

Tricky question. Sine is a cycle and x, the input, is how far along we are in the cycle.

Let's look at lines:

You're traveling on a square. Each side takes 10 seconds.
After 1 second, you are 10% complete on that side
After 5 seconds, you are 50% complete
After 10 seconds, you finished the side

Linear motion has few surprises. Now for sine (focusing on the "0 to max" cycle):

We're traveling on a sine wave, from 0 (neutral) to 1.0 (max). This portion takes 10 seconds.
After 5 seconds we are... 70% complete! Sine rockets out of the gate and slows down. Most of the gains are in the first 5 seconds
It takes 5 more seconds to get from 70% to 100%. And going from 98% to 100% takes more than a full second!

Despite our initial speed, sine slows so we gently kiss the max value before turning around. This smoothness makes sine, sine.
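The percentages above can be reproduced with a few lines of Python. This is a sketch under the same assumption as the text (the 0-to-max climb takes 10 seconds); the function names are invented for the example.

```python
import math

def linear_progress(seconds, total=10.0):
    """Fraction complete when moving at a constant speed."""
    return seconds / total

def sine_progress(seconds, total=10.0):
    """Fraction of the way to max when the 0-to-max climb
    (a quarter of a sine cycle) takes `total` seconds."""
    return math.sin((seconds / total) * (math.pi / 2))

def seconds_to_reach(fraction, total=10.0):
    """How long the sine climb takes to reach a given fraction of max."""
    return total * math.asin(fraction) / (math.pi / 2)

for t in (1, 5, 10):
    print(t, round(linear_progress(t), 3), round(sine_progress(t), 3))

# Seconds spent on the final stretch from 98% to 100%.
final_stretch = 10.0 - seconds_to_reach(0.98)
print(round(final_stretch, 2))
```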

For the geeks: Press "show stats" in the simulation. You'll see the percent complete of the total cycle, mini-cycle (0 to 1.0), and the value attained so far. Stop, step through, and switch between linear and sine motion to see the values.

Quick quiz: What's further along, 10% of a linear cycle, or 10% of a sine cycle? Sine. Remember, it barrels out of the gate at max speed. A little past the halfway point of the climb from 0 to max (around 56% of the time), it's moving at the average speed of the linear cycle, and beyond that, it goes slower (until it reaches the max and turns around).

So x is the 'amount of your cycle'. What's the cycle?

It depends on the context.

Basic trig: 'x' is degrees, and a full cycle is 360 degrees
Advanced trig: 'x' is radians (they are more natural!), and a full cycle is going around the unit circle (2*pi radians)


But again, cycles depend on circles! Can we escape their tyranny?

Pi without Pictures

Imagine a sightless alien who only notices shades of light and dark. Could you describe pi to it? It's hard to flicker the idea of a circle's circumference, right?

Let's step back a bit. Sine is a repeating pattern, which means it must... repeat! It goes from 0, to 1, to 0, to -1, to 0, and so on.

Let's define pi as the time sine takes from 0 to 1 and back to 0. Whoa! Now we're using pi without a circle too! Pi is a concept that just happens to show up in circles:

Sine is a gentle back and forth rocking
Pi is the time from neutral to max and back to neutral
n * pi (0 * pi, 1 * pi, 2 * pi, and so on) are the times you are at neutral
2 * pi, 4 * pi, 6 * pi, etc. are full cycles

Aha! That is why pi appears in so many formulas! Pi doesn't "belong" to circles any more than 0 and 1 do -- pi is about sine returning to center! A circle is an example of a shape that repeats and returns to center every 2*pi units. But springs, vibrations, etc. return to center after pi too!

Question: If pi is half of a natural cycle, why isn't it a clean, simple number?

Let's answer a question with a question. Why does a 1x1 square have a diagonal of length $\sqrt{2} = 1.414...$ (an irrational number)?

It's philosophically inconvenient when nature doesn't line up with our number system. I don't have a good intuition. My hunch is simple rules (1x1 square + Pythagorean Theorem) can still lead to complex outcomes.

How fast is sine?

I've been tricky. Previously, I said "imagine it takes sine 10 seconds from 0 to max". And now it's pi seconds from 0 to max back to 0? What gives?

sin(x) is the default, off-the-shelf sine wave, that indeed takes pi units of time from 0 to max to 0 (or 2*pi for a complete cycle)
sin(2x) is a wave that moves twice as fast
sin(0.5x) is a wave that moves half as fast

So, we use sin(n*x) to get a sine wave cycling as fast as we need. Often, the phrase "sine wave" is referencing the general shape and not a specific speed.

Part 2: Understanding the definitions of sine

That's a brainful -- take a break if you need it. Hopefully, sine is emerging as its own pattern. Now let's develop our intuition by seeing how common definitions of sine connect.

Definition 1: The height of a triangle / circle!

Sine was first found in triangles. You may remember "SOH CAH TOA" as a mnemonic:

SOH: Sine is Opposite / Hypotenuse
CAH: Cosine is Adjacent / Hypotenuse
TOA: Tangent is Opposite / Adjacent

For a right triangle with angle x, sin(x) is the length of the opposite side divided by the hypotenuse. If we make the hypotenuse 1, we can simplify to:

Sine = Opposite
Cosine = Adjacent

And with more cleverness, we can draw our triangles with hypotenuse 1 in a circle with radius 1:

Voila! A circle containing all possible right triangles (since they can be scaled up using similarity). For example:

sin(45) = .707
Lay down a 10-foot pole and raise it 45 degrees. It is 10 * sin(45) = 7.07 feet off the ground
An 8-foot pole would be 8 * sin(45) = 5.66 feet
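In code, the only catch is that Python's `math.sin` expects radians, so we convert from degrees first. A minimal sketch (`pole_height` is a name made up for this example):

```python
import math

def pole_height(length_feet, angle_degrees):
    """Height of the raised end of a pole: length * sin(angle)."""
    return length_feet * math.sin(math.radians(angle_degrees))

ten_foot = pole_height(10, 45)
eight_foot = pole_height(8, 45)

print(round(ten_foot, 2), round(eight_foot, 2))
```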

These direct manipulations are great for construction (the pyramids won't calculate themselves). Unfortunately, after thousands of years we start thinking the meaning of sine is the height of a triangle. No no, it's a shape that shows up in circles (and triangles).

Realistically, for many problems we go into "geometry mode" and start thinking "sine = height" to speed through things. That's fine -- just don't get stuck there.

Definition 2: The infinite series

I've avoided the elephant in the room: how in blazes do we actually calculate sine!? Is my calculator drawing a circle and measuring it?

Glad to rile you up. Here's the circle-less secret of sine:

Sine is acceleration opposite to your current position

Using our bank account metaphor: Imagine a perverse boss who gives you a raise the exact opposite of your current bank account! If you have \$50 in the bank, then your raise next week is -\$50. Of course, your income might be \$75/week, so you'll still be earning some money (\$75 - \$50 for that week), but eventually your balance will decrease as the "raises" overpower your income.

But never fear! Once your account hits negative (say you're at -\$50), then your boss gives a legit \$50/week raise. Again, your income might be negative, but eventually the raises will overpower it.

This constant pull towards the center keeps the cycle going: when you rise up, the "pull" conspires to pull you in again. It also explains why neutral is the max speed for sine: If you are at the max, you begin falling and accumulating more and more "negative raises" as you plummet. As you pass through the neutral point you are feeling all the negative raises possible (once you cross, you'll start getting positive raises and slowing down).

By the way: since sine is acceleration opposite to your current position, and a circle is made up of a horizontal and vertical sine... you got it! Circular motion can be described as "a constant pull opposite your current position, towards your horizontal and vertical center".
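We can let a computer play the perverse boss. The sketch below steps the rule forward in tiny time slices, as a rough approximation of continuous change. The starting values (balance 0, income 1 per unit of time) and the step size are assumptions chosen so the result lines up with sin(x); `simulate` is a name made up for the example.

```python
import math

def simulate(duration, steps_per_unit=10_000, balance=0.0, income=1.0):
    """Follow the rule 'your raise is the opposite of your balance'.

    balance ~ position, income ~ speed, raise ~ acceleration.
    Returns the balance after `duration` units of time.
    """
    dt = 1.0 / steps_per_unit
    for _ in range(int(round(duration * steps_per_unit))):
        raise_now = -balance        # the boss's rule
        income += raise_now * dt    # raises change income
        balance += income * dt      # income changes the balance
    return balance

for t in (1.0, math.pi / 2, math.pi):
    print(round(t, 3), round(simulate(t), 3), round(math.sin(t), 3))
```

No circles were drawn, yet the balance rises to 1, falls back through 0, and tracks sin(t) along the way.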

Geeking Out With Calculus

Let's describe sine with calculus. Like e, we can break sine into smaller effects:

Start at 0 and grow at unit speed
At every instant, get pulled back by negative acceleration

How should we think about this? See how each effect above changes our distance from center:

Our initial kick increases distance linearly: y (distance from center) = x (time taken)
At any moment, we feel a restoring force of $-x$. We integrate twice to turn negative acceleration into distance:

$\displaystyle{ \iint -x = \frac{-x^3}{3!} }$

Seeing how acceleration impacts distance is like seeing how a raise hits your bank account. The "raise" must change your income, and your income changes your bank account (two integrals "up the chain").

So, after "x" seconds we might guess that sine is "x" (initial impulse) minus $\frac{x^3}{3!}$ (effect of the acceleration):

Something's wrong -- sine doesn't nosedive! With e, we saw that "interest earns interest" and sine is similar. The "restoring force" changes our distance by $\frac{-x^3}{3!}$, which creates another restoring force to consider. Consider a spring: the pull that yanks you down goes too far, which shoots you downward and creates another pull to bring you up (which again goes too far). Springs are crazy!

We need to consider every restoring force:

$y = x$ is our initial motion, which creates a restoring force of impact...
$y = \frac{-x^3}{3!}$ which creates a restoring force of impact...
$y = \frac{x^5}{5!}$ which creates a restoring force of impact...
$y = \frac{-x^7}{7!}$ which creates a restoring force of impact...

Just like e, sine can be described with an infinite series:

$\displaystyle{\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + ... }$

I saw this formula a lot, but it only clicked when I saw sine as a combination of an initial impulse and restoring forces. The initial push (y = x, going positive) is eventually overcome by a restoring force (which pulls us negative), which is overpowered by its own restoring force (which pulls us positive), and so on.
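Here's a small sketch that adds up the impulse and the restoring forces one at a time (`sine_series` is a name invented for the example). With only two pieces the guess nosedives; with more, it settles onto the real sine.

```python
import math

def sine_series(x, terms):
    """Sum the first `terms` pieces of x - x^3/3! + x^5/5! - ..."""
    total = 0.0
    for n in range(terms):
        power = 2 * n + 1
        total += (-1) ** n * x ** power / math.factorial(power)
    return total

x = 3.0
for terms in (1, 2, 3, 5, 10):
    print(terms, sine_series(x, terms))
print("math.sin:", math.sin(x))
```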

A few fun notes:

Consider the "restoring force" like "positive or negative interest". This makes the sine/e connection in Euler's formula easier to understand. Sine is like e, except sometimes it earns negative interest. There's more to learn here :).
For very small angles, "y = x" is a good guess for sine. We just take the initial impulse and ignore any restoring forces.

The Calculus of Cosine

Cosine is just a shifted sine, and is fun (yes!) now that we understand sine:

Sine: Start at 0, initial impulse of y = x (100%)
Cosine: Start at 1, no initial impulse

So cosine just starts off... sitting there at 1. We let the restoring force do the work:

$\displaystyle{y = 1 - \frac{x^2}{2!}}$

Again, we integrate -1 twice to get $\frac{-x^2}{2!}$. But this kicks off another restoring force, which kicks off another, and before you know it:

$\displaystyle{\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + ...}$
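Spelled out, the chain of restoring forces looks like this. It's a sketch that drops the constants of integration, on the assumption that each new correction starts from 0 with no initial speed:

```latex
\begin{aligned}
\text{start at rest:} \quad y &= 1 \\
\text{pull from } 1\text{:} \quad \iint -1 &= -\frac{x^2}{2!} \\
\text{pull from } -\tfrac{x^2}{2!}\text{:} \quad \iint \frac{x^2}{2!} &= \frac{x^4}{4!} \\
\text{pull from } \tfrac{x^4}{4!}\text{:} \quad \iint -\frac{x^4}{4!} &= -\frac{x^6}{6!}
\end{aligned}
```

Each piece is the previous one, negated and integrated twice, which is where the alternating signs and even powers come from.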

Definition 3: The differential equation

We've described sine's behavior with specific equations. A more succinct way (equation):

$\displaystyle{y'' = -y}$

This beauty says:

Our current position is y
Our acceleration (2nd derivative, or y'') is the opposite of our current position (-y)

Both sine and cosine make this true. I first hated this definition; it's so divorced from a visualization. I didn't realize it described the essence of sine, "acceleration opposite your position".

And remember how sine and e are connected? Well, $e^x$ can be described by (equation):

$\displaystyle{y'' = y}$

The same equation with a positive sign ("acceleration equal to your position")! When sine is "the height of a circle" it's really hard to make the connection to e.
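Both claims can be checked numerically by estimating the second derivative with a finite difference. This is a rough sketch: the step size `h` and the test point 0.7 are arbitrary, and `second_derivative` is a name made up for the example.

```python
import math

def second_derivative(f, x, h=1e-3):
    """Central-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

x = 0.7
print(second_derivative(math.sin, x), -math.sin(x))   # y'' = -y
print(second_derivative(math.cos, x), -math.cos(x))   # y'' = -y
print(second_derivative(math.exp, x), math.exp(x))    # y'' = +y
```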

One of my great mathematical regrets is not learning differential equations. But I want to, and I suspect having an intuition for sine and e will be crucial.

Summing it up

The goal is to move sine from some mathematical trivia ("part of a circle") to its own shape:

Sine is a smooth, swaying motion between min (-1) and max (1). Mathematically, you're accelerating opposite your position. This "negative interest" keeps sine rocking forever.
Sine happens to appear in circles and triangles (and springs, pendulums, vibrations, sound...).
Pi is the time from neutral to neutral in sin(x). Similarly, pi doesn't "belong" to circles, it just happens to show up there.

Let sine enter your mental toolbox (Hrm, I need a formula to make smooth changes...). Eventually, we'll understand the foundations intuitively (e, pi, radians, imaginaries, sine...) and they can be mixed into a scrumptious math salad. Enjoy!

Appendix

Using this approach, Alistair MacDonald made a great tutorial with code to build your own sine and cosine functions.


Intuitive Understanding Of Euler’s Formula

Euler's identity seems baffling:

$\displaystyle{e^{i\pi} = -1}$

It emerges from a more general formula:

$\displaystyle{ e^{ix} = \cos(x) + i \sin(x)}$

Yowza -- we're relating an imaginary exponent to sine and cosine! And somehow plugging in pi gives -1? Could this ever be intuitive?

Not according to 1800s mathematician Benjamin Peirce:

It is absolutely paradoxical; we cannot understand it, and we don't know what it means, but we have proved it, and therefore we know it must be the truth.

Argh, this attitude makes my blood boil! Formulas are not magical spells to be memorized: we must, must, must find an insight. Here's mine:

Euler's formula describes two equivalent ways to move in a circle.

That's it? This stunning equation is about spinning around? Yes -- and we can understand it by building on a few analogies:

Starting at the number 1, see multiplication as a transformation that changes the number: $1 \cdot e^{i \pi}$
Regular exponential growth continuously increases 1 by some rate for some time period; imaginary exponential growth continuously rotates 1 for some time period
Growing for "pi" units of time means going pi radians around a circle
Therefore, $e^{i \pi}$ means starting at 1 and rotating pi (halfway around a circle) to get to -1

That's the high-level view, let's dive into the details. By the way, if someone tries to impress you with $e^{i \pi} = -1$, ask them about i to the i-th power. If they can't think it through, Euler's formula is still a magic spell to them.

Update: While writing, I thought a video might help explain the ideas more clearly:

Understanding cos(x) + i * sin(x)

The equals sign is overloaded. Sometimes we mean "set one thing to another" (like x = 3) and others we mean "these two things describe the same concept" (like $\sqrt{-1} = i$).

Euler's formula is the latter: it gives two formulas which explain how to move in a circle. If we examine circular motion using trig, and travel x radians:

cos(x) is the x-coordinate (horizontal distance)
sin(x) is the y-coordinate (vertical distance)

The statement

$\displaystyle{\cos(x) + i \sin(x)}$

is a clever way to smush the x and y coordinates into a single number. The analogy "complex numbers are 2-dimensional" helps us interpret a single complex number as a position on a circle.

When we set x to $\pi$, we're traveling $\pi$ units along the outside of the unit circle. Because the total circumference is $2\pi$, plain old $\pi$ is halfway around, putting us at -1.

Neato: The right side of Euler's formula ($\cos(x) + i \sin(x)$) describes circular motion with imaginary numbers. Now let's figure out how the e side of the equation accomplishes it.
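As a quick illustration of the trig side, here's a sketch that packs the two coordinates into one Python complex number (`circle_point` is a name invented for the example). Expect tiny floating-point leftovers instead of exact zeros.

```python
import math

def circle_point(x):
    """Position after traveling x radians around the unit circle,
    smushed into a single complex number: cos(x) + i*sin(x)."""
    return complex(math.cos(x), math.sin(x))

quarter = circle_point(math.pi / 2)   # a quarter turn: close to i
half = circle_point(math.pi)          # halfway around: close to -1
full = circle_point(2 * math.pi)      # all the way around: close to 1

print(quarter, half, full)
```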

What is Imaginary Growth?

Combining x- and y- coordinates into a complex number is tricky, but manageable. But what does an imaginary exponent mean?

Let's step back a bit. When I see $3^4$, I think of it like this:

3 is the end result of growing instantly (using e) at a rate of ln(3). In other words: $3 = e^{\ln(3)}$
$3^4$ is the same as growing to 3, but then growing for 4x as long. So $3^4 = e^{\ln(3) \cdot 4} = 81$

Instead of seeing numbers on their own, you can think of them as something e had to "grow to". Real numbers, like 3, give an interest rate of ln(3) = 1.1, and that's what e "collects" as it's going along, growing continuously.
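The same bookkeeping in Python, as a minimal sketch (the variable names are made up for the example):

```python
import math

rate = math.log(3)                    # ln(3), the rate e needs to reach 3

grow_to_3 = math.exp(rate * 1)        # grow at ln(3) for 1 unit of time
grow_4x_longer = math.exp(rate * 4)   # same rate, 4x as long: 3^4

print(rate, grow_to_3, grow_4x_longer)
```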

Regular growth is simple: it keeps "pushing" a number in the same, real direction it was going. 3 × 3 pushes in the original direction, making it 3 times larger (9).

Imaginary growth is different: the "interest" we earn is in a different direction! It's like a jet engine that was strapped on sideways -- instead of going forward, we start pushing at 90 degrees.

The neat thing about a constant orthogonal (perpendicular) push is that it doesn't speed you up or slow you down -- it rotates you! Taking any number and multiplying by i will not change its magnitude, just the direction it points.

Intuitively, here's how I see continuous imaginary growth rate: "When I grow, don't push me forward or back in the direction I'm already going. Rotate me instead."

But Shouldn't We Spin Faster and Faster?

I wondered that too. Regular growth compounds in our original direction, so we go 1, 2, 4, 8, 16, multiplying 2x each time and staying in the real numbers. We can consider this $e^{\ln(2)x}$, which means grow instantly at a rate of ln(2) for "x" seconds.

And hey -- if our growth rate was twice as fast, 2ln(2) vs ln(2), it would look the same as growing for twice as long (2x vs x). The magic of e lets us swap rate and time; 2 seconds at ln(2) is the same growth as 1 second at 2ln(2).

Now, imagine we have some purely imaginary growth rate (Ri) that rotates us until we reach i, or 90 degrees upward. What happens if we double that rate to 2Ri, will we spin off the circle?

Nope! Having a rate of 2Ri means we just spin twice as fast, or alternatively, spin at a rate of Ri for twice as long, but we're staying on the circle. Rotating twice as long means we're now facing 180 degrees.

Once we realize that some exponential growth rate can take us from 1 to i, increasing that rate just spins us more. We'll never escape the circle.

However, if our growth rate is complex (a+bi vs Ri) then the real part (a) will grow us like normal, while the imaginary part (bi) rotates us. But let's not get fancy: Euler's formula, $e^{ix}$, is about the purely imaginary growth that keeps us on the circle (more later).
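Here's a sketch of the three cases using Python's `cmath`. It assumes R = pi/2, the rate that turns 1 into i in one unit of time, and uses ln(2) as an arbitrary real part for the complex rate.

```python
import cmath
import math

R = math.pi / 2                       # assumed rate: reaches i in 1 unit of time

quarter_turn = cmath.exp(1j * R)      # rate Ri: lands near i
half_turn = cmath.exp(2j * R)         # rate 2Ri: near -1, still on the circle

# Complex rate a + bi: the real part grows us, the imaginary part rotates us.
spiral = cmath.exp(complex(math.log(2), R))   # size 2, angle 90 degrees

print(quarter_turn, abs(quarter_turn))
print(half_turn, abs(half_turn))
print(spiral, abs(spiral))
```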

A Quick Sanity Check

While writing, I had to clarify a few questions for myself:

Why use $e^x$, aren't we rotating the number 1?

e represents the process of starting at 1 and growing continuously at 100% interest for 1 unit of time.

When we write e we're capturing that entire process in a single number -- e represents the whole rigmarole of continuous growth. So really, $e^x$ is saying "start at 1 and grow continuously at 100% for x seconds", and starts from 1 like we want.

But what does i as an exponent do?

For a regular exponent like $3^4$ we ask:

What is the implicit growth rate? We're growing from 1 to 3 (the base of the exponent).
How do we change that growth rate? We scale it by 4x (the power of the exponent).

We can convert our growth into "e" format: our instantaneous rate is ln(3), and we increase it to ln(3) * 4. Again, the power of the exponent (4) just scaled our growth rate.

$\displaystyle{3^4 = e^{\ln(3) \cdot 4} = (e^{\ln(3)})^4}$

When the top exponent is i (as in $3^i$), we just multiply our implicit growth rate by i. So instead of growing at plain old ln(3), we're growing at ln(3) * i.

$\displaystyle{3^i = e^{\ln(3) \cdot i} = (e^{\ln(3)})^i}$

The top part of the exponent modifies the implicit growth rate of the bottom part.

The Nitty Gritty Details

Let's take a closer look. Remember this definition of e:

$\displaystyle{e = e^{100\%} = \lim_{n\to\infty} \left( 1 + \frac{100\%}{n} \right)^n}$

That $\frac{100\%}{n}$ represents the portion of interest we earned in each microscopic period. We assumed the interest rate was 100% in the real dimension -- but what if it were 100% in the imaginary direction?

$\displaystyle{e^{100\% \cdot i} = \lim_{n\to\infty} \left( 1 + \frac{100\%\cdot i}{n} \right)^n}$

Now, our newly formed interest adds to us in the 90-degree direction. Surprisingly, this does not change our length -- this is a tricky concept, because it appears to make a triangle where the hypotenuse must be larger. We're dealing with a limit, and the extra distance is within the error margin we specify. This is something I want to tackle another day, but take my word: continuous perpendicular growth will rotate you. This is the heart of sine and cosine, where your change is perpendicular to your current position, and you move in a circle.

We apply i units of growth in infinitely small increments, each pushing us at a 90-degree angle. There is no "faster and faster" rotation - instead, we crawl along the perimeter a distance of |i| = 1 (magnitude of i).
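You don't have to take my word for it: we can watch the limit settle down numerically. This sketch applies the interest in n equal slices (`compound` is a name made up for the example). With few slices the length overshoots, and as n grows the length approaches 1.

```python
import cmath
import math

def compound(rate, n):
    """Apply `rate` in n equal slices: (1 + rate/n)^n."""
    return (1 + rate / n) ** n

for n in (1, 10, 1000, 1_000_000):
    z = compound(1j, n)               # 100% interest, imaginary direction
    print(n, z, abs(z))

print("e^i:", cmath.exp(1j))          # where the limit is heading
print("real case:", compound(1, 1_000_000), math.e)
```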

And hey -- the distance crawled around a circle is an angle in radians! We've found another way to describe circular motion!

To get circular motion: Change continuously by rotating at 90-degree angle (aka imaginary growth rate).

So, Euler's formula is saying "exponential, imaginary growth traces out a circle". And this path is the same as moving in a circle using sine and cosine in the imaginary plane.

In this case, the word "exponential" is confusing because we travel around the circle at a constant rate. In most discussions, exponential growth is assumed to have a cumulative, compounding effect.

Some Examples

You don't really believe me, do you? Here's a few examples, and how to think about them intuitively.

Example: $e^i$

Where's the x? Ah, it's just 1. Intuitively, without breaking out a calculator, we know that this means "travel 1 radian along the unit circle". In my head, I see "e" trying to grow 1 at 100% all in the same direction, but i keeps moving the ball and forces "1" to grow along the edge of a circle:

$\displaystyle{e^i = \cos(1) + i \cdot \sin(1) = .5403 + .8415i}$

Not the prettiest number, but there it is. Remember to put your calculator in radian mode when punching this in.
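If a calculator isn't handy, Python's `cmath` works in radians by default. A minimal check:

```python
import cmath
import math

value = cmath.exp(1j)                              # e^i
by_trig = complex(math.cos(1), math.sin(1))        # cos(1) + i*sin(1)

print(value, by_trig, abs(value))
```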

Example: $3^i$

This is tricky -- it's not in our standard format. But remember, $\displaystyle{3^i = 1 \cdot 3^i}$

We want an initial growth of 3x at the end of the period, or an instantaneous rate of ln(3). But, the i comes along and changes that rate of ln(3) to "i * ln(3)":

$\displaystyle{3^i = (e^{\ln(3)})^i = e^{\ln(3)\cdot i}}$

We thought we were going to transform at a regular rate of ln(3), a little faster than 100% continuous growth since e is about 2.718. But oh no, i spun us around: now we're transforming at an imaginary rate which means we're just rotating about. If i was a regular number like 4, it would have made us grow 4x faster. Now we're growing at a speed of ln(3), but sideways.

We should expect a complex number on the unit circle -- there's nothing in the growth rate to increase our size. Solving the equation:

$\displaystyle{3^i = e^{\ln(3) \cdot i} = \cos(\ln(3)) + i \cdot \sin(\ln(3)) = .4548 + .8906i}$

So, rather than ending up "1" unit around the circle (like $e^i$) we end up ln(3) units around.
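
To double-check both examples without a calculator, here is a minimal sketch using Python's standard `cmath` module (the variable names are mine). It rewrites $3^i$ as $e^{\ln(3) \cdot i}$ exactly as above and compares against Python's built-in power.

```python
import cmath
import math

e_i = cmath.exp(1j)                     # travel 1 radian around the unit circle
three_i = cmath.exp(math.log(3) * 1j)   # travel ln(3) radians instead

print(e_i)            # roughly 0.5403 + 0.8415i
print(three_i)        # roughly 0.4548 + 0.8906i
print(3 ** 1j)        # the built-in power agrees with the rewrite
print(abs(three_i))   # length stays at 1: nothing in the rate makes us grow
```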

Example: $i^i$

A few months ago, this would have had me in tears. Not today! Let's break down the transformations:

$\displaystyle{i^i = 1 \cdot i^i}$

We start with 1 and want to change it. Like solving $3^i$, what's the instantaneous growth rate represented by i as a base?

Hrm. Normally we'd do ln(x) to get the growth rate needed to reach x at the end of 1 unit of time. But for an imaginary rate? We need to noodle this over.

In order to start with 1 and grow to i we need to start rotating at the outset. How fast? Well, we need to get 90 degrees (pi/2 radians) in 1 unit of time. So our rate is $i \frac{\pi}{2}$. Remember our rate must be imaginary since we're rotating, not growing! Plain old $\frac{\pi}{2}$ is about 1.57 and results in regular growth.

This should make sense: to turn 1.0 to i at the end of 1 unit, we should rotate $\frac{\pi}{2}$ radians (90 degrees) in that amount of time. So, to get "i" we can use $e^{i \frac{\pi}{2}}$.

$\displaystyle{i = e^{i \frac{\pi}{2}}}$

Phew. That describes i as the base. How about the exponent?

Well, the other i tells us to change our rate -- yes, that rate we spent so long figuring out! So rather than rotating at a speed of $i \frac{\pi}{2}$, which is what a base of i means, we transform the rate to:

$\displaystyle{\frac{\pi}{2}i \cdot i = \frac{\pi}{2} \cdot -1 = -\frac{\pi}{2}}$

The i's cancel and make the growth rate real again! We rotated our rate and pushed ourselves into the negative numbers. And a negative growth rate means we're shrinking -- we should expect $i^i$ to make things smaller. And it does:

$\displaystyle{i^i = e^{- \frac{\pi}{2}} \approx .2}$

Tada! (Search "i^i" on Google to use its calculator)

Take a breather: You can intuitively figure out how imaginary bases and imaginary exponents should behave. Whoa.

And as a bonus, you figured out ln(i) -- to make $e^x$ become i, make e rotate $\frac{\pi}{2}$ radians.

$\displaystyle{\ln(i) = i \cdot \frac{\pi}{2}}$
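
The same steps can be sketched in Python with `cmath`. One assumption to flag: `cmath.log` returns the principal value, which matches the "rotate 90 degrees in one unit of time" rate used here (rotating an extra full lap would also land on i, but we ignore those alternatives). The variable names are mine.

```python
import cmath
import math

rate_of_i = cmath.log(1j)       # the rate that turns 1 into i: (pi/2) * i
new_rate = rate_of_i * 1j       # the exponent i rotates that rate to -pi/2
i_to_the_i = cmath.exp(new_rate)

print(rate_of_i)     # about 1.5708i
print(new_rate)      # about -1.5708, a real (shrinking) rate
print(i_to_the_i)    # about 0.2079
print(1j ** 1j)      # Python's built-in power gives the same value
```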

Example: $(i^i)^i$

A double imaginary exponent? If you insist. First off, we know what our growth rate will be inside the parentheses:

$\displaystyle{i^i = (e^{\frac{\pi}{2}i})^i = e^{-\frac{\pi}{2}}}$

We get a negative (shrinking) growth rate of -pi/2. And now we modify that rate again by i:

$\displaystyle{(i^i)^i = (e^{-\frac{\pi}{2}})^i = e^{-\frac{\pi}{2}i}}$

And now we have a negative rotation! We're going around the circle at a rate of $-\frac{\pi}{2}$ per unit time. How long do we go for? Well, there's an implicit "1" unit of time at the very top of this exponent chain; the implied default is to go for 1 time unit (just like $e = e^1$). 1 time unit gives us a rotation of $-\frac{\pi}{2}$ radians (-90 degrees) or -i!

$\displaystyle{(i^i)^i = -i}$

And, just for kicks, if we squared that crazy result:

$\displaystyle{((i^i)^i)^2 = -1}$

It's "just" twice the rotation: 2 is a regular number so doubles our rotation rate to a full -180 degrees in a unit of time. Or, you can look at it as applying -90 degree rotation twice in a row.
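
A quick sanity check of the whole chain, as a sketch in Python (variable names are mine, and the results carry the usual floating-point fuzz, so expect something like 6e-17 where a true zero belongs):

```python
inner = 1j ** 1j        # about 0.2079: a real shrink factor
outer = inner ** 1j     # the rate -pi/2 becomes a rotation of -pi/2
squared = outer ** 2    # the same -90 degree turn applied twice

print(inner)     # about 0.2079
print(outer)     # about -i
print(squared)   # about -1
```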

At first blush, these are really strange exponents. But with our analogies we can take them in stride.

Complex Growth

We can have real and imaginary growth at the same time: the real portion scales us up, and the imaginary part rotates us around:

A complex growth rate like (a + bi) is a mix of real and imaginary growth. The real part a, means "grow at 100% for a seconds" and the imaginary part b means "rotate for b seconds". Remember, rotations don't get the benefit of compounding since you keep 'pushing' in a different direction -- rotation adds up linearly.

With this in mind, we can represent any point on any sized circle using (a+bi)! The radius is $e^a$ and the angle is determined by $e^{bi}$. It's like putting the number in the expand-o-tron for two cycles: once to grow it to the right size (a seconds), another time to rotate it to the right angle (b seconds). Or, you could rotate it first and then grow!

Let's say we want to know the growth amount to get to 6 + 8i. This is really asking for the natural log of a complex number: how do we grow e to get (6 + 8i)?

Radius: How big of a circle do we need? Well, the magnitude is $\sqrt{6^2 + 8^2} = \sqrt{100} = 10$. Which means we need to grow for ln(10) = 2.3 seconds to reach that amount.
Amount to rotate: What's the angle of that point? We can use arctan to figure it out: atan(8/6) = 53 degrees = .93 radian.
Combine the result: ln(6+8i) = 2.3 + .93i

That is, we can reach the random point (6 + 8i) if we use $e^{2.3 + .93i}$.
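
Here is the same radius-then-angle recipe as a short Python sketch. The names `growth`, `rotation` and `rate` are mine; `cmath.log` is the standard library's complex natural log, used here only to confirm the hand-built answer.

```python
import cmath
import math

target = 6 + 8j
growth = math.log(abs(target))                     # ln(10), about 2.3
rotation = math.atan2(target.imag, target.real)    # about 0.93 radians
rate = complex(growth, rotation)

print(rate)                # about 2.3 + 0.93i
print(cmath.log(target))   # the library's answer, same thing
print(cmath.exp(rate))     # lands back near 6 + 8i
```

Using `atan2` rather than plain arctan is a small design choice: it keeps track of the quadrant, which matters once the real part goes negative.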

Why Is This Useful?

Euler's formula gives us another way to describe motion in a circle. But we could already do that with sine and cosine -- what's so special?

It's all about perspective. Sine and cosine describe motion in terms of a grid, plotting out horizontal and vertical coordinates.

Euler's formula uses polar coordinates -- what's your angle and distance? Again, it's two ways to describe motion:

Grid system: Go 3 units east and 4 units north
Polar coordinates: Go 5 units at an angle of 53.13 degrees

Depending on the problem, polar or rectangular coordinates are more useful. Euler's formula lets us convert between the two to use the best tool for the job. Also, because $e^{ix}$ can be converted to sine and cosine, we can rewrite formulas in trig as variations on e, which comes in very handy (no need to memorize sin(a+b), you can derive it -- more another day). And it's beautiful that every number, real or complex, is a variation of e.
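
As a sketch of that conversion, Python's `cmath` module has `polar` and `rect` helpers that move between the two descriptions (the variable names below are mine):

```python
import cmath
import math

radius, angle = cmath.polar(3 + 4j)    # grid -> polar
print(radius, math.degrees(angle))     # 5.0 and about 53.13 degrees

back = cmath.rect(radius, angle)       # polar -> grid
print(back)                            # about 3 + 4i again
```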

But utility, schmutility: the most important result is the realization that baffling equations can become intuitive with the right analogies. Don't let beautiful equations like Euler's formula remain a magic spell -- build on the analogies you know to see the insights inside the equation.

Happy math.

Appendix

The screencast was fun, and feedback is definitely welcome. I think it helps the ideas pop, and walking through the article helped me find gaps in my intuition.

References:

Brian Slesinsky has a neat presentation on Euler's formula
Visual Complex Analysis has a great discussion on Euler's formula -- see p. 10 in the Google Book Preview
I did a talk on Math and Analogies which explains Euler's Identity more visually.


Intuitive Guide to Angles, Degrees and Radians

It’s an obvious fact that circles should have 360 degrees. Right?

Wrong. Most of us have no idea why there’s 360 degrees in a circle. We memorize a magic number as the “size of a circle” and set ourselves up for confusion when studying advanced math or physics, with their so called “radians”.

“Radians make math easier!” the experts say, without a simple reason why (discussions involving Taylor series are not simple). Today we’ll uncover what radians really are, and the intuitive reason they make math easier.

Where Do Degrees Come From?

Before numbers and language we had the stars. Ancient civilizations used astronomy to mark the seasons, predict the future, and appease the gods (when making human sacrifices, they’d better be on time).

How is this relevant to angles? Well, bub, riddle me this: isn’t it strange that a circle has 360 degrees and a year has 365 days? And isn’t it weird that constellations just happen to circle the sky during the course of a year?

Unlike a pirate, I bet you landlubbers can’t determine the seasons by the night sky. Here’s the Big Dipper (Great Bear) as seen from New York City in 2008 (try any city):

Constellations make a circle every day (video). If you look at the same time every day (midnight), they will also make a circle throughout the year. Here’s a theory about how degrees came to pass:

Humans noticed that constellations moved in a full circle every year
Every day, they were off by a tiny bit (“a degree”)
Since a year has about 360 days, a circle had 360 degrees

But, but… why not 365 degrees in a circle?

Cut ‘em some slack: they had sundials and didn’t know a year should have a convenient 365.242199 degrees like you do.

360 is close enough for government work. It fits nicely into the Babylonian base-60 number system, and divides well (by 2, 3, 4, 6, 10, 12, 15, 30, 45, 90… you get the idea).

Basing Mathematics on the Sun Seems Perfectly Reasonable

Earth lucked out: ~360 is a great number of days to have in a year. But it does seem arbitrary: on Mars we’d have roughly 680 degrees in a circle, for the longer Martian year. And in parts of Europe they’ve used gradians, where you divide a circle into 400 pieces.

Many explanations stop here saying, “Well, the degree is arbitrary but we need to pick some number.” Not here: we’ll see that the entire premise of the degree is backwards.

Radians Rule, Degrees Drool

A degree is the amount I, an observer, need to tilt my head to see you, the mover. It’s a tad self-centered, don’t you think?

Suppose you saw a friend go running on a large track:

“Hey Bill, how far did you go?”

“Well, I had a really good pace, I think I went 6 or 7 mile–”

“Shuddup. How far did I turn my head to see you move?”

“What?”

“I’ll use small words for you. Me in middle of track. You ran around. How…much…did…I…turn…my…head?”

“Jerk.”

Selfish, right? That’s how we do math! We write equations in terms of “Hey, how far did I turn my head to see that planet/pendulum/wheel move?”. I bet you’ve never bothered to think about the pendulum’s feelings, hopes and dreams.

Do you think the equations of physics should be made simple for the mover or observer?

Radians: The Unselfish Choice

Much of physics (and life!) involves leaving your reference frame and seeing things from another’s viewpoint. Instead of wondering how far we tilted our heads, consider how far the other person moved.

Degrees measure angles by how far we tilted our heads. Radians measure angles by distance traveled.

But absolute distance isn’t that useful, since going 10 miles is a different number of laps depending on the track. So we divide by radius to get a normalized angle:

$\displaystyle{\text{Radian} = \frac{\text{distance traveled}}{\text{radius}}}$

You’ll often see this as

$\displaystyle{\theta = \frac{s}{r}}$

or angle in radians (theta) is arc length (s) divided by radius (r).

A circle has 360 degrees or 2pi radians — going all the way around is 2 * pi * r / r. So a radian is about 360 /(2 * pi) or 57.3 degrees.
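
A tiny Python sketch of the definition, in case it helps to see the numbers (the function name `arc_angle` is mine, purely for illustration):

```python
import math

def arc_angle(distance, radius):
    """Angle in radians: distance traveled divided by radius."""
    return distance / radius

full_circle = arc_angle(2 * math.pi * 3, 3)    # one lap on a radius-3 track
degrees_per_radian = 360 / (2 * math.pi)

print(full_circle)           # 2*pi, about 6.28, whatever the radius
print(degrees_per_radian)    # about 57.3
print(math.degrees(1))       # the standard library agrees
```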

Now don’t be like me, memorizing this thinking “Great, another unit. 57.3 degrees is so weird.” Because it is weird when you’re still thinking about you!

Moving 1 radian (unit) is a perfectly normal distance to travel. Put another way, our idea of a “clean, 90 degree angle” means the mover goes a very unclean pi/2 units. Think about it — “Hey Bill, can you run 90 degrees for me? What’s that? Oh, yeah, that’d be pi/2 miles from your point of view.” The strangeness goes both ways.

Radians are the empathetic way to do math: a shift away from head tilting and towards the mover’s perspective.

What’s in a name?

Radians are a count of distance in terms of “radius units”, and I think of “radian” as shorthand for that concept.

Strictly speaking, radians are just a number like 1.5 or 73, and don’t have any units (in the calculation “radians = distance traveled / radius”, we see length is divided by length, so any units would cancel).

But practically speaking, we’re not math robots, and it helps to think of radians as “distance” traveled on a unit circle.

Using Radians

I’m still getting used to thinking in radians. But we encounter the concept of “mover’s distance” quite a bit:

We use “rotations per minute” not “degrees per second” when measuring certain rotational speeds. This is a shift towards the mover’s reference point (“How many laps has it gone?”) and away from an arbitrary degree measure.
When a satellite orbits the Earth, we understand its speed in “miles per hour”, not “degrees per hour”. Now divide by the radius of the orbit (the satellite’s distance from the center of the Earth) and you get the orbital speed in radians per hour.
Sine, that wonderful function, is defined in terms of radians as

$\displaystyle{\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} ...}$

This formula only works when x is in radians! Why? Well, sine is fundamentally related to distance moved, not head-tilting. But we’ll save that discussion for another day.
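
To see the "radians only" claim in action, here is a rough Python sketch of the series. The function name `sin_series` and the cutoff of 10 terms are my own choices; 10 terms is plenty for small inputs but not for large ones.

```python
import math

def sin_series(x, terms=10):
    """Sum the first `terms` terms of x - x^3/3! + x^5/5! - ..."""
    total = 0.0
    for k in range(terms):
        n = 2 * k + 1
        total += (-1) ** k * x ** n / math.factorial(n)
    return total

print(sin_series(math.radians(30)))   # about 0.5: 30 degrees, converted first
print(sin_series(2.0))                # about 0.909: the series reads "2" as 2 radians
print(math.sin(math.radians(2)))      # about 0.035: what sin(2 degrees) actually is
```

Feed the series a raw number of degrees and it quietly treats it as a distance along the unit circle.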

Radian Example 1: Wheels of the Bus

Let’s try a real example: you have a bus with wheels of radius 2 meters (it’s a monster truck bus). I’ll say how fast the wheels are turning and you say how fast the bus is moving. Ready?

“The wheels are turning 2000 degrees per second”. You’d think:

Ok, the wheels are going 2000 degrees per second. That means it’s turning 2000/360 or 5 and 5/9ths rotations per second. Circumference = 2 * pi * r, so it’s moving, um, 2 * 3.14 * 2 * 5 and 5/9ths… where’s my calculator…

Ok. Now imagine a car with wheels of radius 2 meters (also a monster). “The car wheels are turning 6 radians per second”. You’d think:

Radians are distance along a unit circle — we just scale by the real radius to see how far we’ve gone. 6 * 2 = 12 meters per second. Next question.

Wow -- the car was easier to figure out than the bus! No crazy formulas, no pi floating around — just multiply to convert rotational speed to linear speed. All because radians speak in terms of the mover.

The reverse is easy too. Suppose you’re cruising 90 feet per second on the highway (about 60 miles per hour) on your 24″ rims (radius 1 foot). How fast are the wheels turning?

Well, 90 feet per second / 1 foot radius = 90 radians per second.
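
Both directions fit in a couple of one-line functions. This is only a sketch with names I made up (`linear_speed`, `angular_speed`); the bus is included to show the extra conversion step degrees force on you.

```python
import math

def linear_speed(radians_per_second, radius):
    """Rotational speed -> speed along the ground."""
    return radians_per_second * radius

def angular_speed(speed, radius):
    """Speed along the ground -> rotational speed in radians per second."""
    return speed / radius

car = linear_speed(6, 2)                     # 12 meters per second
bus = linear_speed(math.radians(2000), 2)    # degrees must be converted first
wheels = angular_speed(90, 1)                # 90 radians per second

print(car, bus, wheels)   # bus comes out near 69.8 meters per second
```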

That was easy. I suspect rappers sing about 24″ rims for this very reason.

Radian Example 2: sin(x)

Time for a beefier example. Calculus is about many things, and one is what happens when numbers get really big or really small.

Choose a number of degrees (x), and put sin(x) into your calculator:

When you make x small, like .01, sin(x) gets small as well. And the ratio of sin(x)/x seems to be about .017 — what does that mean? Even stranger, what does it mean to multiply or divide by a degree? Can you have square or cubic degrees?

Radians to the rescue! Knowing they refer to distance traveled (they’re not just a ratio!), we can interpret the equation this way:

x is how far you traveled along a circle
sin(x) is how high on the circle you are

So sin(x)/x is the ratio of how high you are to how far you’ve gone: the amount of energy that went in an “upward” direction. If you move vertically, that ratio is 100%. If you move horizontally, that ratio is 0%.

When something moves a tiny amount, such as 0 to 1 degree from our perspective, it’s basically going straight up. If you go an even smaller amount, from 0 to .00001 degrees, it’s really going straight up. The distance traveled (x) is very close to the height (sin(x)).

As x shrinks, the ratio gets closer to 100% — more motion is straight up. Radians help us see, intuitively, why sin(x)/x approaches 1 as x gets tiny. We’re just nudging along a tiny amount in a vertical direction. By the way, this also explains why sin(x) ~ x for small numbers.

Sure, you can rigorously prove this using calculus, but the radian intuition helps you understand it.

Remember, these relationships only work when measuring angles with radians. With degrees, you’re comparing your height on a circle (sin(x)) with how far some observer tilted their head (x degrees), and it gets ugly fast.
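
You can watch both ratios on a computer instead of a calculator. A minimal Python sketch (the sample values of x are arbitrary picks of mine):

```python
import math

for x in (1.0, 0.1, 0.01, 0.0001):
    ratio_radians = math.sin(x) / x                 # x is distance along the unit circle
    ratio_degrees = math.sin(math.radians(x)) / x   # x is head-tilt in degrees
    print(x, ratio_radians, ratio_degrees)

# After the loop, both ratios hold the values for the smallest x.
print(ratio_radians)   # very close to 1
print(ratio_degrees)   # very close to pi/180, about .017
```

The mysterious .017 is just the conversion factor from degrees to radians leaking into the answer.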

So What’s the Point?

Degrees have their place: in our own lives, we’re the focal point and want to see how things affect us. How much do I tilt my telescope, spin my snowboard, or turn my steering wheel?

With natural laws, we’re an observer describing the motion of others. Radians are about them, not us. It took me many years to realize that:

Degrees are arbitrary because they’re based on the sun (365 days ~ 360 degrees), but they are backwards because they are from the observer’s perspective.
Because radians are in terms of the mover, equations “click into place”. Converting rotational to linear speed is easy, and ideas like sin(x)/x make sense.

Even angles can be seen from more than one viewpoint, and understanding radians makes math and physics equations more intuitive. Happy math.


Intuitive Arithmetic With Complex Numbers

Imaginary numbers have an intuitive explanation: they “rotate” numbers, just like negatives make a “mirror image” of a number. This insight makes arithmetic with complex numbers easier to understand, and is a great way to double-check your results. Here’s our cheatsheet:

This post will walk through the intuitive meanings.

Complex Variables

In regular algebra, we often say “x = 3” and all is dandy: there’s some number “x”, whose value is 3. With complex numbers, there’s a gotcha: there’s two dimensions to talk about. When writing

$\displaystyle{z = 3 + 4i}$

we’re saying there’s a number “z” with two parts: 3 (the real part) and 4i (imaginary part). It is a bit strange how “one” number can have two parts, but we’ve been doing this for a while. We often write:

$\displaystyle{y = 3\frac{4}{10} = 3 + .4}$

and it doesn’t bother us that a single number “y” has both an integer part (3) and a fractional part (.4 or 4/10). Y is a combination of the two. Complex numbers are similar: they have their real and imaginary parts “contained” in a single variable (shorthand is often Re and Im).

Unfortunately, we don’t have nice notation like (3.4) to “merge” the parts into a single number. I had an idea to write the imaginary part vertically, in fading ink, but it wasn’t very popular. So we’ll stick to the “a + bi” format.

Measuring Size

Because complex numbers use two independent axes, we find size (magnitude) using the Pythagorean Theorem:

So, a number z = 3 + 4i would have a magnitude of 5. The shorthand for “magnitude of z” is this: |z|

See how it looks like the absolute value sign? Well, in a way, it is. Magnitude measures a complex number’s “distance from zero”, just like absolute value measures a negative number’s “distance from zero”.
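
In code, the Pythagorean calculation and the |z| shorthand look like this. It's a small Python sketch; Python happens to have complex numbers built in, with `abs` playing the role of the magnitude bars.

```python
import math

z = 3 + 4j
by_hand = math.sqrt(z.real ** 2 + z.imag ** 2)   # Pythagorean Theorem on the two parts
built_in = abs(z)                                # Python's version of |z|

print(by_hand, built_in)   # 5.0 5.0
```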

Complex Addition and Subtraction

We’ve seen that regular addition can be thought of as “sliding” by a number. Addition with complex numbers is similar, but we can slide in two dimensions (real or imaginary). For example:

Adding (3 + 4i) to (-1 + i) gives 2 + 5i.

Again, this is a visual interpretation of how “independent components” are combined: we track the real and imaginary parts separately.

Subtraction is the reverse of addition — it’s sliding in the opposite direction. Subtracting (1 + i) is the same as adding -1 * (1 + i), or adding (-1 – i).
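
A quick Python sketch of the sliding, using the numbers above (variable names are mine):

```python
a = 3 + 4j
b = -1 + 1j

total = a + b                   # real parts and imaginary parts add separately
print(total)                    # (2+5j)

slide_back = a - (1 + 1j)       # subtracting (1 + i)...
slide_opposite = a + (-1 - 1j)  # ...is adding (-1 - i)
print(slide_back, slide_opposite)
```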

Complex Multiplication

Here’s where the math gets interesting. When we multiply two complex numbers (x and y) to get z:

Add the angles: angle(z) = angle(x) + angle(y)
Multiply the magnitudes: |z| = |x| * |y|

That is, the angle of z is the sum of the angles of x and y, and the magnitude of z is the product of the magnitudes. Believe it or not, the magic of complex numbers makes the math work out!

Multiplying by the magnitude (size) makes sense: we’re used to that happening in regular multiplication (3 × 4 means you multiply 3 by 4’s size). The reason the angle addition works is more detailed, and we’ll save it for another time. (Curious? Find the sine and cosine addition formulas and compare them to how (a + bi) * (c + di) get multiplied out).

Time for an example: let’s multiply z = 3 + 4i by itself. Before doing all the math, we know a few things:

The resulting magnitude will be 25. z has a magnitude of 5, so |z| * |z| = 25.
The resulting angle will be above 90. 3 + 4i is above 45 degrees (since 3 + 3i would be 45 degrees), so twice that angle will be more than 90.

With our predictions on paper, we can do the math:

$\displaystyle{(3 + 4i) * (3 + 4i) = 9 + 16i^2 + 24i = -7 + 24i}$

Time to check our results:

Magnitude: $\sqrt{(-7 * -7) + (24 * 24)} = \sqrt{625} = 25$, which matches our guess.
Angle: Since -7 is negative and 24i is positive, we know we are going “backwards and up”, which means we’ve crossed 90 degrees (“straight up”). Getting geeky, we compute atan(24/-7) = 106.2 degrees (keeping in mind we’re in quadrant 2). This guess checks out too.

Nice. While we can always do the math out, the intuition about rotations and scaling helps us check the result. If the resulting angle was less than 90 (“forward and up”, for example), or the resulting magnitude not 25, we’d know there was a mistake in our math.
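
The same prediction-then-check routine can be sketched in Python, where `cmath.phase` gives a complex number's angle in radians (variable names are mine):

```python
import cmath
import math

x = 3 + 4j
y = 3 + 4j
z = x * y

angle_sum = math.degrees(cmath.phase(x) + cmath.phase(y))   # prediction: add the angles
magnitude_product = abs(x) * abs(y)                          # prediction: multiply the sizes

print(z)                                          # (-7+24j)
print(abs(z), magnitude_product)                  # both 25.0
print(math.degrees(cmath.phase(z)), angle_sum)    # both about 106.26 degrees
```

One caveat if you try other numbers: `cmath.phase` reports angles between -180 and 180 degrees, so a sum that passes 180 will show up wrapped around.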

Complex Division

Division is the opposite of multiplication, just like subtraction is the opposite of addition. When dividing complex numbers (x divided by y), we:

Subtract angles: angle(z) = angle(x) – angle(y)
Divide by magnitude: |z| = |x| / |y|

Sounds good. Now let’s try to do it:

$\displaystyle{\frac{3 + 4i}{1 + i}}$

Hrm. Where to start? How do we actually do the division? Dividing regular algebraic numbers gives me the creeps, let alone the weirdness of i (Mister mister! Didya know that 1/i = -i? Just multiply both sides by i and see for yourself! Eek.). Luckily there’s a shortcut.

Introducing Complex Conjugates

Our first goal of division is to subtract angles. How do we do this? Multiply by the opposite angle! This will “add” a negative angle, doing an angle subtraction.

Instead of z = a + bi, think about a number z* = a – bi, called the “complex conjugate”. It has the same real part, but is the “mirror image” in the imaginary dimension. The conjugate or “imaginary reflection” has the same magnitude, but the opposite angle!

So, multiplying by a – bi is the same as subtracting an angle. Neato.

Complex conjugates are indicated by a star (z*) or bar above the number — mathematicians love to argue about these notational conventions. Either way, the conjugate is the complex number with the imaginary part flipped:

$\displaystyle{z = a + bi \quad \Rightarrow \quad z^* = a - bi}$

Note that b doesn’t have to be “negative”. If z = 3 – 4i, then z* = 3 + 4i.

Multiplying By the Conjugate

What happens if you multiply by the conjugate? What is z times z*? Without thinking, think about this:

$\displaystyle{z \cdot z^* = 1 \cdot z \cdot z^*}$

So we take 1 (a real number), add angle(z), and add angle (z*). But this last angle is negative — it’s a subtraction! So our final result should be a real number, since we’ve canceled the angles. The number should be |z|^2 since we scaled by the size twice.

Now let’s do an example: $\displaystyle{(3 + 4i) * (3 - 4i) = 9 - 16i^2 = 25}$

We got a real number, like we expected! The math fans can try the algebra also:

$\displaystyle{(a + bi) * (a - bi) = a^2 + abi - abi -b^2i^2 = a^2 + b^2 }$

Tada! The result has no imaginary parts, and is the magnitude squared. Understanding complex conjugates as a “negative rotation” lets us predict these results in a different way.
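
Here is the same check as a Python sketch; `conjugate()` is a built-in method on Python's complex numbers.

```python
z = 3 + 4j
product = z * z.conjugate()    # angles cancel, sizes multiply

print(z.conjugate())     # (3-4j): same real part, imaginary part flipped
print(product)           # (25+0j): a real number
print(abs(z) ** 2)       # 25.0: the magnitude squared
```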

Scaling Your Numbers

When multiplying by a conjugate z*, we scale by the magnitude |z*|, which is the same as |z|. To reverse this effect we can divide by |z|, and to actually shrink by |z| we have to divide again. All in all, after multiplying by the conjugate we have to divide by |z| * |z| to get the result of a true division.

Show Me The Division!

I’ve been sidestepping the division, and here’s the magic. If we want to do

$\displaystyle{\frac{3 + 4i}{1 + i}}$

We can approach it intuitively:

Rotate by opposite angle: multiply by (1 – i) instead of (1 + i)
Divide by magnitude squared: divide by $|1 + i|^2 = (\sqrt{2})^2 = 2$

The answer, using this approach, is:

$\displaystyle{\frac{3 + 4i}{1 + i} = (3 + 4i) \cdot (1 - i) \cdot \frac{1}{2} = (3 - 4i^2 + 4i - 3i) \cdot \frac{1}{2} = \frac{7}{2} + \frac{1}{2}i}$

The more traditional “plug and chug” method is to multiply top and bottom by the complex conjugate:

$\displaystyle{\frac{3 + 4i}{1 + i} = \frac{3 + 4i}{1 + i} \cdot \frac{1 - i}{1 - i} = \frac{3 - 4i^2 + 4i - 3i}{1 - i^2} = \frac{7 + i}{2}}$

We’re traditionally taught to “just multiply top and bottom by the complex conjugate” without questioning what complex division really means. But not today.

We know what’s happening: division is subtracting an angle and shrinking the magnitude. By multiplying top and bottom by the conjugate (1 – i), we subtract the angle of (1 + i), which makes the denominator a real number (it’s no coincidence, since it’s the exact opposite angle). We scaled both the top and bottom by the same amount, so the effects cancel. The result is to turn division into a multiplication in the numerator.

Both approaches work (you’re usually taught the second), but it’s nice to have one to double-check the other.
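
And a third way to double-check: let a computer do it. This Python sketch runs the "rotate back, then shrink twice" recipe next to the language's built-in complex division (the variable names are mine):

```python
x = 3 + 4j
y = 1 + 1j

intuitive = x * y.conjugate() / abs(y) ** 2   # opposite angle, then divide by |y| twice
direct = x / y                                # built-in complex division

print(intuitive)   # close to 3.5 + 0.5i (floating point may be off in the last digit)
print(direct)      # close to 3.5 + 0.5i
```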

More Math Tricks

Now that we understand the conjugate, there’s a few properties to consider:

$\displaystyle{(x + y)^* = x^* + y^*}$

$\displaystyle{(x \cdot y)^* = x^* \cdot y^*}$

The first should make sense. Adding two numbers and “reflecting” (conjugating) the result, is the same as adding the reflections. Another way to think about it: sliding two numbers then taking the opposite, is the same as sliding both times in the opposite direction.

The second property is trickier. Sure, the algebra may work, but what’s the intuitive explanation?

The result (xy)* means:

Multiply the magnitudes: |x| * |y|
Add the angles and take the conjugate (opposite): angle(x) + angle(y) becomes “-angle(x) + -angle(y)”

And x* times y* means:

Multiply the magnitudes: |x| * |y| (this is the same as above)
Add the conjugate angles: angle(x*) + angle(y*) = -angle(x) + -angle(y)

Aha! We get the same angle and magnitude in each case, and we didn’t have to jump into the traditional algebra explanation. Algebra is fine, but it isn’t always the most satisfying explanation.
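
If you want a spot check on top of the intuition, here is a short Python sketch with two sample numbers of my choosing. The exact `==` comparison works here because the parts are small whole numbers; with messier values you would compare within a tolerance.

```python
x = 3 + 4j
y = 1 + 1j

sum_ok = (x + y).conjugate() == x.conjugate() + y.conjugate()
product_ok = (x * y).conjugate() == x.conjugate() * y.conjugate()

print((x * y).conjugate())   # (-1-7j)
print(sum_ok, product_ok)    # True True
```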

A Quick Example

The conjugate is a way to “undo” a rotation. Think about it this way:

I deposited \$3, \$10, \$15.75 and \$23.50 into my account. What transaction will cancel these out? To find the opposite: add them up, and multiply by -1.
I rotated a line by doing several multiplications: (3 + 4i), (1 + i), and (2 + 10i). What rotation will cancel these out? To find the opposite: multiply the complex numbers together, and take the conjugate of the result.

See the conjugate z* as a way to “cancel” the rotation effects of z, just like a negative number “cancels” the effects of addition. One caveat: with conjugates, you need to divide by |z| * |z| to remove the scaling effects as well.
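
The rotation example can be sketched in Python like so. The names `combined` and `undo` are mine, and the division by the magnitude squared is the caveat above in action.

```python
steps = [3 + 4j, 1 + 1j, 2 + 10j]

combined = 1
for step in steps:
    combined *= step    # stack up every rotation and scaling

undo = combined.conjugate() / abs(combined) ** 2   # opposite angle, scaling removed

print(combined)          # (-72+4j)
print(combined * undo)   # about 1: we are back where we started
```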

Closing Thoughts

The math here isn’t new, but I never realized why complex conjugates worked as they did. Why a – bi and not -a + bi? Well, complex conjugates are not a random choice, but a mirror image from the imaginary perspective, with the exact opposite angle.

Seeing imaginary numbers as rotations gives us a new mindset to approach problems; the “plug and chug” formulas can make intuitive sense, even for a strange topic like complex numbers. Happy math.


A Visual, Intuitive Guide to Imaginary Numbers

Imaginary numbers always confused me. Like understanding e, most explanations fell into one of two categories:

It’s a mathematical abstraction, and the equations work out. Deal with it.
It’s used in advanced physics, trust us. Just wait until college.

Gee, what a great way to encourage math in kids! Today we’ll assault this topic with our favorite tools:

Focusing on relationships, not mechanical formulas.
Seeing complex numbers as an upgrade to our number system, just like zero, decimals and negatives were.
Using visual diagrams, not just text, to understand the idea.

And our secret weapon: learning by analogy. We’ll approach imaginary numbers by observing its ancestor, the negatives. Here’s your guidebook:

It doesn’t make sense yet, but hang in there. By the end we’ll hunt down i and put it in a headlock, instead of the reverse.


Really Understanding Negative Numbers

Negative numbers aren’t easy. Imagine you’re a European mathematician in the 1700s. You have 3 and 4, and know you can write 4 – 3 = 1. Simple.

But what about 3-4? What, exactly, does that mean? How can you take 4 cows from 3? How could you have less than nothing?

Negatives were considered absurd, something that “darkened the very whole doctrines of the equations” (Francis Maseres, 1759). Yet today, it’d be absurd to think negatives aren’t logical or useful. Try asking your teacher whether negatives corrupt the very foundations of math.

What happened? We invented a theoretical number that had useful properties. Negatives aren’t something we can touch or hold, but they describe certain relationships well (like debt). It was a useful fiction.

Rather than saying “I owe you 30” and reading words to see if I’m up or down, I can write “-30” and know it means I’m in the hole. If I earn money and pay my debts (-30 + 100 = 70), I can record the transaction easily. I have +70 afterwards, which means I’m in the clear.

The positive and negative signs automatically keep track of the direction — you don’t need a sentence to describe the impact of each transaction. Math became easier, more elegant. It didn’t matter if negatives were “tangible” — they had useful properties, and we used them until they became everyday items. Today you’d call someone obscene names if they didn’t “get” negatives.

But let’s not be smug about the struggle: negative numbers were a huge mental shift. Even Euler, the genius who discovered e and much more, didn’t understand negatives as we do today. They were considered “meaningless” results (he later made up for this in style).

It’s a testament to our mental potential that today’s children are expected to understand ideas that once confounded ancient mathematicians.

Enter Imaginary Numbers

Imaginary numbers have a similar story. We can solve equations like this all day long:

$\displaystyle{x^2 = 9}$

The answers are 3 and -3. But suppose some wiseguy puts in a teensy, tiny minus sign:

$\displaystyle{x^2 = -9}$

Uh oh. This question makes most people cringe the first time they see it. You want the square root of a number less than zero? That’s absurd! (Historically, there were real questions to answer, but I like to imagine a wiseguy.)

It seems crazy, just like negatives, zero, and irrationals (non-repeating numbers) must have seemed crazy at first. There’s no “real” meaning to this question, right?

Wrong. So-called “imaginary numbers” are as normal as every other number (or just as fake): they’re a tool to describe the world. In the same spirit of assuming -1, .3, and 0 “exist”, let’s assume some number i exists where:

$\displaystyle{i^2 = -1}$

That is, you multiply i by itself to get -1. What happens now?

Well, first we get a headache. But playing the “Let’s pretend i exists” game actually makes math easier and more elegant. New relationships emerge that we can describe with ease.

You may not believe in i, just like those fuddy old mathematicians didn’t believe in -1. New, brain-twisting concepts are hard and they don’t make sense immediately, even for Euler. But as the negatives showed us, strange concepts can still be useful.

I dislike the term “imaginary number”: it was considered an insult, a slur, designed to hurt i’s feelings. The number i is just as normal as other numbers, but the name “imaginary” stuck so we’ll use it.

Visual Understanding of Negative and Complex Numbers

As we saw last time, the equation $x^2 = 9$ really means:

$\displaystyle{1 \cdot x^2 = 9}$

$\displaystyle{1 \cdot x \cdot x = 9}$

What transformation x, when applied twice, turns 1 to 9?

The two answers are “x = 3” and “x = -3”: That is, you can “scale by” 3 or “scale by 3 and flip” (flipping or taking the opposite is one interpretation of multiplying by a negative).

Now let’s think about $x^2 = -1$, which is really

$\displaystyle{1 \cdot x \cdot x = -1}$

What transformation x, when applied twice, turns 1 into -1? Hrm.

We can’t multiply by a positive twice, because the result stays positive
We can’t multiply by a negative twice, because the result will flip back to positive on the second multiplication

But what about… a rotation! It sounds crazy, but if we imagine x being a “rotation of 90 degrees”, then applying x twice will be a 180 degree rotation, or a flip from 1 to -1!

Yowza! And if we think about it more, we could rotate twice in the other direction (clockwise) to turn 1 into -1. This is “negative” rotation or a multiplication by -i:

If we multiply by -i twice, the first multiplication would turn 1 into -i, and the second turns -i into -1. So there are really two square roots of -1: i and -i.

This is pretty cool. We have some sort of answer, but what does it mean?

i is a “new imaginary dimension” to measure a number
i (or -i) is what numbers “become” when rotated
Multiplying by i is a rotation by 90 degrees counter-clockwise
Multiplying by -i is a rotation of 90 degrees clockwise
Two rotations in either direction give -1: they bring us back into the “regular” dimensions of positive and negative numbers.
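
If you'd like to poke at the rotation idea yourself, here is a minimal sketch in Python, which spells i as `1j`. The helper names `quarter_turn_ccw` and `quarter_turn_cw` are made up for this illustration; they just multiply by i or -i.

```python
i = 1j  # Python's spelling of the imaginary unit

def quarter_turn_ccw(z):
    """Rotate z by 90 degrees counter-clockwise (multiply by i)."""
    return z * i

def quarter_turn_cw(z):
    """Rotate z by 90 degrees clockwise (multiply by -i)."""
    return z * -i

# Two quarter turns in either direction turn 1 into -1.
print(quarter_turn_ccw(quarter_turn_ccw(1)) == -1)  # True
print(quarter_turn_cw(quarter_turn_cw(1)) == -1)    # True

# A point 3 East, 4 North ends up 4 West, 3 North after one ccw turn.
print(quarter_turn_ccw(3 + 4j) == -4 + 3j)          # True
```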

Numbers are 2-dimensional. Yes, it’s mind-bending, just like decimals or long division would be mind-bending to an ancient Roman. (What do you mean there’s a number between 1 and 2?)

We asked “How do we turn 1 into -1 in two steps?” and found an answer: rotate it 90 degrees at each step. It’s a strange, new way to think about math. But it’s useful. (By the way, this geometric interpretation of complex numbers didn’t arrive until decades after i was discovered.)

Also, keep in mind that having counter-clockwise be positive is a human convention — it easily could have been the other way.

Finding Patterns

Let’s dive into the details a bit. When you multiply by a negative number (like -1) over and over, you get a pattern:

1, -1, 1, -1, 1, -1, 1, -1

Since -1 doesn’t change the size of a number, just the sign, you flip back and forth. For some number “x”, you’d get:

x, -x, x, -x, x, -x…

This idea is useful. The number “x” can represent a good or bad hair week. Suppose weeks alternate between good and bad; this is a good week; what will it be like in 47 weeks?

$\displaystyle{x \cdot (-1)^{47} = x \cdot -1 = -x}$

So -x means a bad hair week. Notice how negative numbers “keep track of the sign”: we can throw $(-1)^{47}$ into a calculator without having to count (“Week 1 is good, week 2 is bad… week 3 is good…”). Things that flip back and forth can be modeled well with negative numbers.
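
Here's the same bookkeeping as a tiny Python sketch. The function name `hair_week` and the choice of +1 for "good" and -1 for "bad" are assumptions for the example, not anything standard.

```python
def hair_week(weeks_from_now, this_week=1):
    """+1 is a good hair week, -1 a bad one; every week flips the sign."""
    return this_week * (-1) ** weeks_from_now

print(hair_week(47))  # -1: a bad hair week
print(hair_week(48))  # 1: back to good
```

No counting week by week: the exponent does the flipping for us.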

Ok. Now what happens if we keep multiplying by $i$?

$\displaystyle{1, i, i^2, i^3, i^4, i^5}$

Very funny. Let’s reduce this a bit:

$1 = 1$ (No questions here)
$i = i$ (Can’t do much)
$i^2 = -1$ (That’s what i is all about)
$i^3 = (i \cdot i) \cdot i = -1 \cdot i = -i$ (Ah, 3 rotations counter-clockwise = 1 rotation clockwise. Neat.)
$i^4 = (i \cdot i) \cdot (i \cdot i) = -1 \cdot -1 = 1$ (4 rotations bring us “full circle”)
$i^5 = i^4 \cdot i = 1 \cdot i = i$ (Here we go again…)
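
If you want to see the cycle without doing the algebra, here is a rough Python sketch. The names `CYCLE` and `power_of_i` are invented for this example; the loop compares the lookup against plain repeated multiplication by `1j` (Python's i).

```python
CYCLE = [1, 1j, -1, -1j]  # i^0, i^1, i^2, i^3, then it repeats

def power_of_i(n):
    """Return i^n for a non-negative whole number n using the 4-step cycle."""
    return CYCLE[n % 4]

z = 1  # start at 1 and keep rotating
for n in range(12):
    assert z == power_of_i(n)  # the shortcut agrees with brute force
    z = z * 1j

print(power_of_i(5) == 1j)    # True: i^5 is just i again
print(power_of_i(102) == -1)  # True: 102 leaves remainder 2, same as i^2
```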

Represented visually:

We cycle every 4th rotation. This makes sense, right? Any kid can tell you that 4 left turns is the same as no turns at all. Now rather than focusing on imaginary numbers ($i$, $i^2$), look at the general pattern:

X, Y, -X, -Y, X, Y, -X, -Y…

Like negative numbers modeling flipping, imaginary numbers can model anything that rotates between two dimensions “X” and “Y”. Or anything with a cyclic, circular relationship — have anything in mind?

’Cos it’d be a sin if you didn’t. There’ll de Moivre be more in future articles. [Editor’s note: Kalid is in electroshock therapy to treat his pun addiction.]

Understanding Complex Numbers

There’s another detail to cover: can a number be both “real” and “imaginary”?

You bet. Who says we have to rotate the entire 90 degrees? If we keep 1 foot in the “real” dimension and another in the imaginary one, it looks like this:

We’re at a 45 degree angle, with equal parts in the real and imaginary (1 + i). It’s like a hotdog with both mustard and ketchup — who says you need to choose?

In fact, we can pick any combination of real and imaginary numbers and make a triangle. The angle becomes the “angle of rotation”. A complex number is the fancy name for numbers with both real and imaginary parts. They’re written a + bi, where

a is the real part
b is the imaginary part
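
As a quick illustration (a sketch, not the only way to do it), Python's built-in complex type stores both parts, and the standard `cmath.phase` function reports the angle in radians. The helper name `angle_in_degrees` is made up for this example.

```python
import cmath
import math

def angle_in_degrees(z):
    """Angle of rotation of the complex number z, measured from due East."""
    return math.degrees(cmath.phase(z))

z = 1 + 1j                  # one foot in each dimension
print(z.real, z.imag)       # 1.0 1.0
print(angle_in_degrees(z))  # about 45
print(angle_in_degrees(1j)) # about 90: pure imaginary
```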

Not too bad. But there’s one last question: how “big” is a complex number? We can’t measure the real part or imaginary parts in isolation, because that would miss the big picture.

Let’s step back. The size of a negative number is not whether you can count it — it’s the distance from zero. In the case of negatives this is:

$\displaystyle{\text{Size of } \ -x = \sqrt{(-x)^2} = |x|}$

Which is another way to find the absolute value. But for complex numbers, how do we measure two components at 90 degree angles?

It’s a bird… it’s a plane… it’s Pythagoras!

Geez, his theorem shows up everywhere, even in numbers invented 2000 years after his time. Yes, we are making a triangle of sorts, and the hypotenuse is the distance from zero:

$\displaystyle{\text{Size of } \ a + bi = \sqrt{a^2 + b^2}}$
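
A small sketch to try the formula on: the function `size` below is a made-up name that applies Pythagoras by hand, and Python's built-in `abs` gives the same distance for a complex number.

```python
import math

def size(a, b):
    """Distance of a + bi from zero, via Pythagoras."""
    return math.sqrt(a**2 + b**2)

print(size(3, 4))    # 5.0: the classic 3-4-5 triangle
print(abs(3 + 4j))   # 5.0: the built-in agrees
print(size(-3, 0))   # 3.0: for a plain negative, it's just the absolute value
```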

Neat. While measuring the size isn’t as easy as “dropping the negative sign”, complex numbers do have their uses. Let’s take a look.

A Real Example: Rotations

We’re not going to wait until college physics to use imaginary numbers. Let’s try them out today. There’s much more to say about complex multiplication, but keep this in mind:

Multiplying by a complex number rotates by its angle

Let’s take a look. Suppose I’m on a boat, with a heading of 3 units East for every 4 units North. I want to change my heading 45 degrees counter-clockwise. What’s the new heading?

Some hotshot will say “That’s simple! Just take the sine, cosine, gobbledegook by the tangent… fluxsom the foobar… and…”. Crack. Sorry, did I break your calculator? Care to answer that question again?

Let’s try a simpler approach: we’re on a heading of 3 + 4i (whatever that angle is; we don’t really care), and want to rotate by 45 degrees. Well, 45 degrees is 1 + i (perfect diagonal), so we can multiply by that amount!

Here’s the idea:

Original heading: 3 units East, 4 units North = 3 + 4i
Rotate counter-clockwise by 45 degrees = multiply by 1 + i. (Here's why multiplication, not addition, performs the rotation.)

If we multiply them together we get:

$\begin{aligned} (3 + 4i) \cdot (1 + i) &= 3 + 3i + 4i + 4i^2 \\ &= 3 + 7i \hspace{8mm} + 4(-1) \\ &= -1 + 7i \end{aligned}$

So our new orientation is 1 unit West (-1 East), and 7 units North, which you could draw out and follow.
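
Here is the boat example as a Python sketch, with `rotate_45_ccw` as a made-up helper name. It also checks two things the arithmetic hides: the direction really did turn by 45 degrees, and the result got longer by a factor of $\sqrt{2}$ (the size of 1 + i). For a heading only the direction matters, so the extra length is harmless.

```python
import cmath
import math

def rotate_45_ccw(heading):
    """Turn a heading 45 degrees counter-clockwise by multiplying by 1 + i."""
    return heading * (1 + 1j)

old_heading = 3 + 4j
new_heading = rotate_45_ccw(old_heading)
print(new_heading)  # (-1+7j): 1 West, 7 North

# How far did the direction turn, and how much did the length change?
turn = math.degrees(cmath.phase(new_heading) - cmath.phase(old_heading))
stretch = abs(new_heading) / abs(old_heading)
print(round(turn, 2), round(stretch, 3))  # 45.0 1.414
```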

But yowza! We found that out in 10 seconds, without touching sine or cosine. There were no vectors, matrices, or keeping track of what quadrant we are in. It was just arithmetic with a touch of algebra to cross-multiply. Imaginary numbers have the rotation rules baked in: it just works.

Even better, the result is useful. We have a heading (-1, 7) instead of an angle (atan(7/-1) = 98.13 degrees, keeping in mind we’re in quadrant 2). How, exactly, were you planning on drawing and following that angle? With the protractor you keep around?

No, you’d convert it into cosine and sine (-.14 and .99), find a reasonable ratio between them (about 1 to 7), and sketch out the triangle. Complex numbers beat you to it, instantly, accurately, and without a calculator.
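
For the curious, this is roughly what the "angle first" route looks like in Python. It's a sketch with made-up variable names; `math.atan2` is the standard function that handles the quadrant for you.

```python
import math

east, north = -1, 7  # the heading we got from the multiplication

# Heading -> angle (atan2 knows we're in quadrant 2).
angle = math.degrees(math.atan2(north, east))
print(round(angle, 2))  # 98.13

# Angle -> cosine and sine, i.e. back to an East/North ratio.
cosine = math.cos(math.radians(angle))
sine = math.sin(math.radians(angle))
print(round(cosine, 2), round(sine, 2))  # -0.14 0.99
print(round(sine / -cosine))             # 7: about 1 West for every 7 North
```

We went around in a circle to land back on the ratio we already had.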

If you’re like me, you’ll find this use mind-blowing. And if you don’t, well, I’m afraid math doesn’t toot your horn. Sorry.

Trigonometry is great, but complex numbers can make ugly calculations simple (like calculating cosine(a+b) ). This is just a preview; later articles will give you the full meal.
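
As a taste of that preview, here is a hedged sketch (the names `unit` and `cos_of_sum` are invented for the example). It assumes we describe each angle by the complex number cos + i·sin on the unit circle; multiplying two of them adds their angles, so the real part of the product is cosine(a+b).

```python
import math

def unit(angle):
    """Complex number of size 1 pointing at the given angle (in radians)."""
    return complex(math.cos(angle), math.sin(angle))

def cos_of_sum(a, b):
    """cosine(a+b), found by multiplying the two rotations together."""
    product = unit(a) * unit(b)
    return product.real  # works out to cos(a)cos(b) - sin(a)sin(b)

a, b = 0.3, 1.1
print(math.isclose(cos_of_sum(a, b), math.cos(a + b)))  # True
```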

Aside: Some people think “Hey, it’s not useful to have North/East headings instead of a degree angle to follow!”

Really? Ok, look at your right hand. What’s the angle from the bottom of your pinky to the top of your index finger? Good luck figuring that out on your own.

With a heading, you can at least say “Oh, it’s X inches across and Y inches up” and have some chance of working with that bearing.

Complex Numbers Aren’t

That was a whirlwind tour of my basic insights. Take a look at the first chart — it should make sense now.

There’s so much more to these beautiful, zany numbers, but my brain is tired. My goals were simple:

Convince you that complex numbers were considered “crazy” but can be useful (just like negative numbers were)
Show how complex numbers can make certain problems easier, like rotations

If I seem hot and bothered about this topic, there’s a reason. Imaginary numbers have been a bee in my bonnet for years — the lack of an intuitive insight frustrated me.

Now that I’ve finally had insights, I’m bursting to share them. But it frustrates me that you’re reading this on the blog of a wild-eyed lunatic, and not in a classroom. We suffocate our questions and “chug through” — because we don’t search for and share clean, intuitive insights. Egad.

But better to light a candle than curse the darkness: here are my thoughts, and one of you will shine a spotlight. Thinking we’ve “figured out” a topic like numbers is what keeps us in Roman Numeral land.

There’s much more to complex numbers: check out the details of complex arithmetic. Happy math.

Epilogue: But they’re still strange!

I know, they’re still strange to me too. I try to put myself in the mind of the first person to discover zero.

Zero is such a weird idea, having “something” represent “nothing”, and it eluded the Romans. Complex numbers are similar — it’s a new way of thinking. But both zero and complex numbers make math much easier. If we never adopted strange, new number systems, we’d still be counting on our fingers.

I repeat this analogy because it’s so easy to start thinking that complex numbers aren’t “normal”. Let’s keep our minds open: in the future they’ll chuckle that complex numbers were once distrusted, even into the 2000s.

Carl Gauss, the famous mathematician, wrote:

"Hätte man +1, -1, √-1 nicht positive, negative, imaginäre (oder gar unmögliche) Einheit, sondern etwa directe, inverse, laterale Einheit genannt, so hätte von einer solchen Dunkelheit kaum die Rede sein können."

"If +1, -1, √-1 had not been called a positive, negative, imaginary (or even impossible) unit, but rather a direct, inverse, lateral unit, then there could hardly have been any talk of such obscurity."

If you want more nitty-gritty, check out wikipedia, the Dr. Math discussion, or another argument on why imaginary numbers exist.
