Vector Calculus: Understanding the Cross Product

Taking two vectors, we can write every combination of components in a grid:

This completed grid is the outer product, which can be separated into the:

Dot product, the interactions between similar dimensions (x*x, y*y, z*z)
Cross product, the interactions between different dimensions (x*y, y*z, z*x, etc.)

The dot product ($\vec{a} \cdot \vec{b}$) measures similarity because it only accumulates interactions in matching dimensions. It’s a simple calculation with 3 components.

The cross product (written $\vec{a} \times \vec{b}$) has to measure a half-dozen “cross interactions”. The calculation looks complex but the concept is simple: accumulate 6 individual differences for the total difference.
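To make the grid concrete, here is a minimal Python sketch. The function names `outer`, `dot_from_grid`, and `cross_from_grid` are made up for illustration, and the signs follow the xyzxyz convention worked out later in the article. It builds every pairwise product, then reads the dot product off the diagonal and the cross product off the mirrored off-diagonal cells.

```python
def outer(a, b):
    """Grid of every pairwise product a[i] * b[j]."""
    return [[ai * bj for bj in b] for ai in a]

def dot_from_grid(g):
    # Diagonal cells: matching dimensions (x*x, y*y, z*z).
    return sum(g[i][i] for i in range(3))

def cross_from_grid(g):
    # Off-diagonal cells: each component is a difference of two mirrored cells.
    return (g[1][2] - g[2][1],   # yz - zy -> x
            g[2][0] - g[0][2],   # zx - xz -> y
            g[0][1] - g[1][0])   # xy - yx -> z

a, b = (1, 2, 3), (4, 5, 6)
grid = outer(a, b)
print(dot_from_grid(grid))    # 32
print(cross_from_grid(grid))  # (-3, 6, -3)
```

Nine cells in total: three feed the dot product and the other six feed the cross product.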

Instead of thinking “When do I need the cross product?” think “When do I need interactions between different dimensions?”.

Area, for example, is formed by vectors pointing in different directions (the more orthogonal, the better). Indeed, the magnitude of the cross product is the area of the parallelogram spanned by two 3d vectors.

(The “cross product” assumes 3d vectors, but the concept extends to higher dimensions.)

Did the key intuition click? Let’s hop into the details.

Defining the Cross Product

The dot product represents the similarity between vectors as a single number:

$\displaystyle{\text{dot product} = (a_x, a_y, a_z) \cdot (b_x, b_y, b_z) = a_x b_x + a_y b_y + a_z b_z = \|\vec{a}\| \|\vec{b}\| \cos(\theta)}$

For example, we can say that North and East are 0% similar since $(0, 1) \cdot (1, 0) = 0$. Or that North and Northeast are about 70% similar ($\cos(45^\circ) \approx .707$; remember that trig functions are percentages). The similarity shows the amount of one vector that “shows up” in the other.

Should the cross product, the difference between vectors, be a single number too?

Let’s try. Sine is the percentage difference, so we could write:

$\displaystyle{\text{cross product candidate} = \text{amount of difference} = \|\vec{a}\| \|\vec{b}\| \sin(\theta)}$

Unfortunately, we’re missing some details. Let’s say we’re looking down the x-axis: both y and z point 100% away from us. A number like “100%” tells us there’s a big difference, but we don’t know what it is! We need extra information to tell us “the difference between $\vec{x}$ and $\vec{y}$ is this” and “the difference between $\vec{x}$ and $\vec{z}$ is that”.

So, let’s express the cross product as a vector:

The size of the cross product is the numeric “amount of difference” (with $\sin(\theta)$ as the percentage). By itself, this doesn’t distinguish $\vec{x} \times \vec{y}$ from $\vec{x} \times \vec{z}$.
The direction of the cross product is based on both inputs: it’s the direction orthogonal to both (i.e., favoring neither).

Now $\vec{x} \times \vec{y}$ and $\vec{x} \times \vec{z}$ have different results, each with a magnitude indicating they are “100%” different from $\vec{x}$.

(Should the dot product be a vector result too? Well, we’re tracking the similarity between $\vec{a}$ and $\vec{b}$. The similarity measures the overlap between the original vector directions, which we already have.)

Geometric Interpretation

Two vectors determine a plane, and the cross product points in a direction different from both.

Here’s the problem: there are two perpendicular directions. By convention, we assume a “right-handed system”.

If you hold your first two fingers like the diagram shows, your thumb will point in the direction of the cross product. I make sure the orientation is correct by sweeping my first finger from $\vec{a}$ to $\vec{b}$. With the direction figured out, the magnitude of the cross product is $|a| |b| \sin(\theta)$, which is proportional to the magnitude of each vector and the “difference percentage” (sine).

The Cross Product For Orthogonal Vectors

To remember the right hand rule, write the xyz order twice: xyzxyz. Next, find the pattern you’re looking for:

xy => z (x cross y is z)
yz => x (y cross z is x; we looped around: y to z to x)
zx => y

Now, xy and yx have opposite signs because they are forward and backward in our xyzxyz setup.

So, without a formula, you should be able to calculate:

$\displaystyle{\vec{x} \times \vec{y} = (1, 0, 0) \times (0, 1, 0) = (0, 0, 1) = \vec{z}}$

Again, this is because x cross y is positive z in a right-handed coordinate system. I used unit vectors, but we could scale the terms:

$\displaystyle{(3, 0, 0) \times (0, 4, 0) = (0, 0, 12)}$
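The xyzxyz trick can be sketched in a few lines of Python. The helper `cross_axes` is hypothetical and only handles the three axis names, returning a sign and the resulting axis:

```python
def cross_axes(first, second):
    """Cross two axis names ('x', 'y', 'z') using the xyzxyz pattern."""
    if first == second:
        return (0, None)             # parallel axes: the result is the zero vector
    cycle = "xyzxyz"
    # Forward pairs ("xy", "yz", "zx") appear in the cycle; backward pairs do not.
    sign = 1 if first + second in cycle else -1
    third = ({"x", "y", "z"} - {first, second}).pop()
    return (sign, third)

print(cross_axes("x", "y"))  # (1, 'z')
print(cross_axes("z", "x"))  # (1, 'y')
print(cross_axes("y", "x"))  # (-1, 'z')
```

Scaling works the same way as in the example above: 3 along x crossed with 4 along y gives sign +1, axis z, and size 3 * 4 = 12.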

Calculating The Cross Product

A single vector can be decomposed into its 3 orthogonal parts:

$\displaystyle{ \vec{a} = (a_x, a_y, a_z) = (a_x, 0, 0) + (0, a_y, 0) + (0, 0, a_z)}$

$\displaystyle{ \vec{b} = (b_x, b_y, b_z) = (b_x, 0, 0) + (0, b_y, 0) + (0, 0, b_z)}$

When the vectors are crossed, each pair of orthogonal components (like $a_x \times b_y$) casts a vote for where the orthogonal vector should point. 6 components, 6 votes, and their total is the cross product. (Similar to the gradient, where each axis casts a vote for the direction of greatest increase.)

xy => z and yx => -z (assume $\vec{a}$ is first, so xy means $a_x b_y$)
yz => x and zy => -x
zx => y and xz => -y

xy and yx fight it out in the z direction. If those terms are equal, such as in $(2, 1, 0) \times (2, 1, 1)$, there is no cross product component in the z direction (2 – 2 = 0).

The final combination is:

$\displaystyle{(a_x, a_y, a_z) \times (b_x, b_y, b_z) = (a_y b_z - a_z b_y, a_z b_x - a_x b_z, a_x b_y - a_y b_x) = \|a\| \|b\| \sin(\theta) \vec{n}}$

where $\vec{n}$ is the unit vector normal to $\vec{a}$ and $\vec{b}$.

Don’t let this scare you:

There are 6 terms, 3 positive and 3 negative
Two dimensions vote on the third (so the z term must only have y and x components)
The positive/negative order is based on the xyzxyz pattern

If you like, there is an algebraic proof that the result is orthogonal to both inputs and of size $|a| |b| \sin(\theta)$, but I like the “proportional voting” intuition.
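Short of a proof, a numeric spot check is easy. This is a sketch in plain Python; the helper names `dot`, `cross`, and `norm` are my own, and one example is evidence, not a proof:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by,
            az * bx - ax * bz,
            ax * by - ay * bx)

def norm(a):
    return math.sqrt(dot(a, a))

a, b = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)
c = cross(a, b)

# Orthogonal to both inputs: both dot products vanish.
print(dot(c, a), dot(c, b))   # 0.0 0.0

# Size matches |a||b|sin(theta), with theta recovered from the dot product.
theta = math.acos(dot(a, b) / (norm(a) * norm(b)))
print(math.isclose(norm(c), norm(a) * norm(b) * math.sin(theta)))
```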

Example Time

Again, we should do simple cross products in our head:

$\displaystyle{(1, 0, 0) \times (0, 1, 0) = (0, 0, 1)}$

Why? We crossed the x and y axes, giving us z (or $\vec{i} \times \vec{j} = \vec{k}$, using those unit vectors). Crossing the other way gives $-\vec{k}$.

Here’s how I walk through more complex examples:

$\displaystyle{(1, 2, 3) \times (4, 5, 6) = ?}$

Let’s do the last term, the z-component. That’s (1)(5) minus (4)(2), or 5 – 8 = -3. I did z first because it uses x and y, the first two terms. Try seeing (1)(5) as “forward” as you scan from the first vector to the second, and (4)(2) as backwards as you move from the second vector to the first.
Now the y component: (3)(4) – (6)(1) = 12 – 6 = 6
Now the x component: (2)(6) – (5)(3) = 12 – 15 = -3

So, the total is $(-3, 6, -3)$ which we can verify with Wolfram Alpha.

In short:

The cross product tracks all the “cross interactions” between dimensions
There are 6 interactions (2 in each dimension), with signs based on the xyzxyz order

Appendix

Connection with the Determinant

You can calculate the cross product using the determinant of this matrix:

$\mathbf{u\times v}=\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k}\\ u_1 & u_2 & u_3\\ v_1 & v_2 & v_3\\ \end{vmatrix}$

$\mathbf{u\times v}= \begin{vmatrix} u_2 & u_3\\ v_2 & v_3 \end{vmatrix}\mathbf{i} -\begin{vmatrix} u_1 & u_3\\ v_1 & v_3 \end{vmatrix}\mathbf{j} +\begin{vmatrix} u_1 & u_2\\ v_1 & v_2 \end{vmatrix}\mathbf{k}$

There’s a neat connection here, as the determinant (“signed area/volume”) tracks the contributions from orthogonal components.
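The cofactor expansion translates directly into code. A small sketch, where `det2` and `cross_by_cofactors` are illustrative names:

```python
def det2(p, q, r, s):
    """Determinant of the 2x2 matrix [[p, q], [r, s]]."""
    return p * s - q * r

def cross_by_cofactors(u, v):
    u1, u2, u3 = u
    v1, v2, v3 = v
    i = det2(u2, u3, v2, v3)
    j = -det2(u1, u3, v1, v3)   # the middle term carries the minus sign
    k = det2(u1, u2, v1, v2)
    return (i, j, k)

print(cross_by_cofactors((1, 2, 3), (4, 5, 6)))  # (-3, 6, -3)
```

This matches the (1, 2, 3) x (4, 5, 6) example worked by hand earlier.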

There are theoretical reasons why the cross product (as an orthogonal vector) is only available in 0, 1, 3 or 7 dimensions. However, the cross product as a single number is essentially the determinant (a signed area, volume, or hypervolume as a scalar).

Connection with Curl

Curl measures the twisting force a vector field applies to a point, and is measured with a vector perpendicular to the surface. Whenever you hear “perpendicular vector” start thinking “cross product”.

We take the “determinant” of this matrix:

$\begin{vmatrix} \vec{i} & \vec{j} & \vec{k} \\ \\ {\frac{\partial}{\partial x}} & {\frac{\partial}{\partial y}} & {\frac{\partial}{\partial z}} \\ \\ F_x & F_y & F_z \end{vmatrix}$

$\displaystyle{\nabla \times \vec{F} = \left(\frac{\partial F_z}{\partial y} - \frac{\partial F_y}{\partial z}\right) \vec{i} + \left(\frac{\partial F_x}{\partial z} - \frac{\partial F_z}{\partial x}\right) \vec{j} + \left(\frac{\partial F_y}{\partial x} - \frac{\partial F_x}{\partial y}\right) \vec{k}}$

Instead of multiplication, the interaction is taking a partial derivative. As before, the $\vec{i}$ component of curl is based on the vectors and derivatives in the $\vec{j}$ and $\vec{k}$ directions.
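Here is a rough numerical version of the curl formula, assuming central differences are accurate enough. The function `curl`, the step size `h`, and the sample field `swirl` are all choices made for this sketch:

```python
def curl(F, p, h=1e-5):
    """Central-difference estimate of curl F at the point p = (x, y, z)."""
    def d(component, axis):
        # Partial derivative of F[component] along the given axis.
        hi = list(p)
        lo = list(p)
        hi[axis] += h
        lo[axis] -= h
        return (F(*hi)[component] - F(*lo)[component]) / (2 * h)

    X, Y, Z = 0, 1, 2
    return (d(Z, Y) - d(Y, Z),   # i component: uses only y and z
            d(X, Z) - d(Z, X),   # j component
            d(Y, X) - d(X, Y))   # k component

def swirl(x, y, z):
    # A field that rotates around the z-axis.
    return (-y, x, 0.0)

print(curl(swirl, (1.0, 2.0, 3.0)))   # approximately (0, 0, 2)
```

The field spins in the xy-plane, so the only surviving component points along z.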

Relation to the Pythagorean Theorem

The cross and dot product are like the orthogonal sides of a triangle:

$\displaystyle{a^2 + b^2 = c^2 }$

For unit vectors, where $|a| = |b| = 1 $, we have:

$\displaystyle{ \|\text{dot product}\|^2 + \|\text{cross product}\|^2 = \cos^2(\theta) + \sin^2(\theta) = 1}$

I cheated a bit in the grid diagram, as we have to track the squared magnitudes (as done in the Pythagorean Theorem).
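A quick numeric check of that relation, sketched in Python with helper names of my own (`dot`, `cross`, `unit`):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)

def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

a = unit((1.0, 2.0, 3.0))
b = unit((4.0, 5.0, 6.0))
c = cross(a, b)

# Squared dot product plus squared length of the cross product.
total = dot(a, b) ** 2 + dot(c, c)
print(math.isclose(total, 1.0))
```

For vectors that are not unit length, the same sum comes out to $|a|^2 |b|^2$ instead of 1.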

Advanced Math

The cross product & friends get extended in Clifford Algebra and Geometric Algebra. I’m still learning these.

Cross Products of Cross Products

Sometimes you’ll have a scenario like:

$\displaystyle{\vec{a} \times \vec{b} \times \vec{c} = ? }$

First, the cross product isn’t associative: order matters.

Next, remember what the cross product is doing: finding orthogonal vectors. If any two components are parallel ($\vec{a}$ parallel to $\vec{b}$) then there are no dimensions pushing on each other, and the cross product is zero (which carries through to $0 \times \vec{c}$).

But it’s ok for $\vec{a}$ and $\vec{c}$ to be parallel, since they are never directly involved in a cross product, for example:

$\displaystyle{\vec{i} \times \vec{j} \times \vec{i} = \vec{k} \times \vec{i} = \vec{j} }$

Whoa! How’d we get back to $\vec{j}$? We asked for a direction perpendicular to both $\vec{i}$ and $\vec{j}$, and made that direction perpendicular to $\vec{i}$ again. Being “doubly perpendicular” means you’re back on the original axis.
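To see that grouping matters, here is a small sketch with the unit vectors. The `cross` helper is the component formula from earlier, and the parentheses are written out explicitly:

```python
def cross(a, b):
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)

print(cross(cross(i, i), j))   # (i x i) x j = (0, 0, 0)
print(cross(i, cross(i, j)))   # i x (i x j) = (0, -1, 0)
print(cross(cross(i, j), i))   # (i x j) x i = (0, 1, 0), the example above
```

The first two lines use the same three vectors in the same order, but the grouping changes the answer from zero to $-\vec{j}$.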

Dot Product of Cross Products

Now if we take

$\displaystyle{\vec{a} \times \vec{b} \cdot \vec{c} = ? }$

what happens? We’re forced to do $\vec{a} \times \vec{b}$ first, because $\vec{b} \cdot \vec{c}$ returns a scalar (single number) which can’t be used in a cross product.

If $\vec{a}$ and $\vec{c}$ are parallel, what happens? Well, $\vec{a} \times \vec{b}$ is perpendicular to $\vec{a}$, which means it’s perpendicular to $\vec{c}$, so the dot product with $\vec{c}$ will be zero.

I never really memorized these rules; I have to think through the interactions.
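Thinking it through can also mean trying it. A minimal sketch, where `triple` is a name I picked for $(\vec{a} \times \vec{b}) \cdot \vec{c}$:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)

def triple(a, b, c):
    """(a x b) . c: cross first, then dot with c."""
    return dot(cross(a, b), c)

a, b = (1, 2, 3), (4, 5, 6)
print(triple(a, b, (2, 4, 6)))   # c is parallel to a, so the result is 0
print(triple(a, b, (0, 0, 1)))   # -3, the z-component of a x b
```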

Other Coordinate Systems

The Unity game engine is left-handed, while OpenGL (and most math/physics tools) is right-handed. Why?

In a computer game, x goes horizontal, y goes vertical, and z goes “into the screen”. This results in a left-handed system. (Try it: using your right hand, you can see x cross y should point out of the screen).

Applications of the Cross Product

Find the direction perpendicular to two given vectors.
Find the signed area spanned by two vectors.
Determine if two vectors are orthogonal (the cross product's magnitude then equals $|a| |b|$; checking for a dot product of 0 is likely faster though).
“Multiply” two vectors when only perpendicular cross-terms make a contribution (such as finding torque).
With the quaternions (4d complex numbers), the cross product performs the work of rotating one vector around another (another article in the works!).

Happy math.

Join 450k Monthly Readers

Enjoy the article? There's plenty more to help you build a lasting, intuitive understanding of math. Join the newsletter for bonus content and the latest updates.

Vector Calculus: Understanding the Dot Product

I think of the dot product as directional multiplication. Multiplication goes beyond repeated counting: it's applying the essence of one item to another. (For example, complex multiplication is rotation, not repeated counting.)

When dealing with simple growth rates, multiplication scales one rate by another:

"3 x 4" can mean "Take your 3x growth and make it 4x as large, to get 12x"

When dealing with vectors ("directional growth"), there are a few operations we can do:

Add vectors: Accumulate the growth contained in several vectors.
Multiply by a constant: Make an existing vector stronger (in the same direction).
Dot product: Apply the directional growth of one vector to another. The result is how much stronger we've made the original vector (positive, negative, or zero).

Today we'll build our intuition for how the dot product works.

Getting the Formula Out of the Way

You've seen the dot product equation everywhere:

$\displaystyle{\vec{a} \cdot \vec{b} = a_x \cdot b_x + a_y \cdot b_y = |\vec{a}||\vec{b}|\cos(\theta) }$

And also the justification: "Well Billy, the Law of Cosines (you remember that, don't you?) says the following calculations are the same, so they are." Not good enough -- it doesn't click! Beyond the computation, what does it mean?

The goal is to apply one vector to another. The equation above shows two ways to accomplish this:

Rectangular perspective: combine x and y components
Polar perspective: combine magnitudes and angles

The "this stuff = that stuff" equation just means "Here are two equivalent ways to 'directionally multiply' vectors".

Seeing Numbers as Vectors

Let's start simple, and treat 3 x 4 as a dot product:

$\displaystyle{(3, 0) \cdot (4,0)}$

The number 3 is "directional growth" in a single dimension (the x-axis, let's say), and 4 is "directional growth" in that same direction. 3 x 4 = 12 means we get 12x growth in a single dimension. Ok.

Now, suppose 3 and 4 refer to different dimensions. Let's say 3 means "triple your bananas" (x-axis) and 4 means "quadruple your oranges" (y-axis). Now they're not the same type of number: what happens when we apply growth (use the dot product) in our "bananas, oranges" universe?

(3,0) means "Triple your bananas, destroy your oranges"
(0,4) means "Destroy your bananas, quadruple your oranges"

Applying (0,4) to (3,0) means "Destroy your banana growth, quadruple your orange growth". But (3, 0) had no orange growth to begin with, so the net result is 0 ("Destroy all your fruit, buddy").

$\displaystyle{(3, 0) \cdot (0, 4) = 0}$

See how we're "applying" and not simply adding? With regular addition, we smush the vectors together: (3,0) + (0, 4) = (3, 4) [a vector which triples your bananas and quadruples your oranges].

"Application" is different. We're mutating the original vector based on the rules of the second. And the rules of (0, 4) are "Destroy your banana growth, and quadruple your orange growth." When applied to something with only bananas, like (3, 0), we're left with nothing.

The final result of the dot product process can be:

Zero: we don't have any growth in the original direction
Positive number: we have some growth in the original direction
Negative number: we have negative (reverse) growth in the original direction

Understanding the Calculation

"Applying vectors" is still a bit abstract. I think "How much energy/push is one vector giving to the other?". Here's how I visualize it:

Rectangular Coordinates: Component-by-component overlap

Like multiplying complex numbers, see how each x- and y-component interacts:

We list out all four combinations (x with x, y with x, x with y, y with y). Since the x- and y-coordinates don't affect each other (like holding a bucket sideways under a waterfall -- nothing falls in), the total energy absorption is absorption(x) + absorption(y):

$\displaystyle{\vec{a} \cdot \vec{b} = a_x \cdot b_x + a_y \cdot b_y}$

Polar coordinates: Projection

The word "projection" is so sterile: I prefer "along the path". How much energy is actually going in our original direction?

Here's one way to see it:

Take two vectors, a and b. Rotate our coordinates so b is horizontal: it becomes (|b|, 0), and everything is on this new x-axis. What's the dot product now? (It shouldn't change just because we tilted our head).

Well, vector a has new coordinates (a1, a2), and we get:

$\displaystyle{a1 \cdot |\vec{b}| + a2 \cdot 0 = a1 \cdot |\vec{b}|}$

a1 is really "What is the x-coordinate of a, assuming b is the x-axis?". That is |a|cos(θ), aka the "projection":

$\displaystyle{\vec{a} \cdot \vec{b} = |\vec{a}|\cos(\theta)|\vec{b}|}$
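The "tilt your head" argument can be checked numerically. In this sketch the helpers `dot2` and `rotate` and the sample vectors are assumptions for illustration:

```python
import math

def dot2(a, b):
    return a[0] * b[0] + a[1] * b[1]

def rotate(v, angle):
    """Rotate a 2d vector counterclockwise by the given angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

a, b = (3.0, 4.0), (2.0, 1.0)

# Rotate both vectors by minus b's angle, so b lands on the x-axis.
tilt = -math.atan2(b[1], b[0])
a_new, b_new = rotate(a, tilt), rotate(b, tilt)

print(dot2(a, b))                  # 10.0
print(a_new[0] * math.hypot(*b))   # a1 * |b|, approximately 10
```

After the rotation, b's y-coordinate is essentially zero, so only a's new x-coordinate (the projection) contributes.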

Analogies for the Dot Product

The common interpretation is "geometric projection", but it's so bland. Here are some analogies that click for me:

Energy Absorption

One vector is the solar rays, the other is where the solar panel is pointing (yes, yes, the normal vector). Larger numbers mean stronger rays or a larger panel. How much energy is absorbed?

Energy = Overlap in direction * Strength of rays * Size of panel
$\displaystyle{\text{Energy} = \cos(\theta) \cdot |a| \cdot |b|}$

If you hold your panel sideways to the sun, no rays hit (cos(θ) = 0).

But... but... solar rays are leaving the sun, and the panel is facing the sun, and the dot product is negative when vectors are opposed! Take a deep breath, and remember the goal is to embrace the analogy (besides, physicists lose track of negative signs all the time).

Mario-Kart Speed Boost

In Mario Kart, there are "boost pads" on the ground that increase your speed (Never played? I'm sorry.)

Imagine the red vector is your speed (x and y direction), and the blue vector is the orientation of the boost pad (x and y direction). Larger numbers are more power.

How much boost will you get? For the analogy, imagine the pad gives a speed bonus like this:

If you come in going 0, you'll get nothing. (If you are dropped onto the pad, there's no boost.)
If you cross the pad perpendicularly, you'll get 0 benefit. (Just like the banana obliteration, there's 0x boost in the perpendicular direction.)
If our direction and pad are aligned, our x-speed contributes an x-boost, and our y-speed gives us a y-boost:

$\displaystyle{\text{boost effect} = speed_x \cdot boost_x + speed_y \cdot boost_y}$

Neat, eh? Another way to see it: your incoming speed is $|a|$, and the max boost is $|b|$. The percentage of boost you actually get (based on how you're lined up) is $\cos(\theta)$, for an overall boost of $|a||b|\cos(\theta)$, which is the dot product.

Fruit Stand Analogy

Let's say your store sells apples, bananas, and clementines. They cost \$1, \$2, and \$3 each, respectively.

A customer wants to buy 2 apples, 3 bananas, and 4 clementines. What does it cost?

cost = (A quantity) * (A price) + (B quantity) * (B price) + (C quantity) * (C price) 
cost = 2*1 + 3*2 + 4*3 = 20

This is the dot product between the "quantity" vector and the "price" vector! We're multiplying the matching entries and getting the total. We ignore entries that don't "make sense" to multiply (why should the banana quantity and clementine price impact each other?).
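In code, the fruit stand is a one-line sum over matching entries. The dictionary layout is just one way to sketch it:

```python
quantity = {"apple": 2, "banana": 3, "clementine": 4}
price = {"apple": 1, "banana": 2, "clementine": 3}

# Multiply matching entries only, then total them up.
cost = sum(quantity[item] * price[item] for item in quantity)
print(cost)  # 20
```

Banana quantity never meets clementine price: the loop only pairs entries with the same key.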

Physics Physics Physics

The dot product appears all over physics: some field (electric, gravitational) is pulling on some particle. We'd love to multiply, and we could if everything were lined up. But that's never the case, so we take the dot product to account for potential differences in direction.

It's all a useful generalization: Integrals are "multiplication, taking changes into account" and the dot product is "multiplication, taking direction into account".

And what if your direction is changing? Why, take the integral of the dot product, of course!

Onward and Upward

Don't settle for "Dot product is the geometric projection, justified by the law of cosines". Find the analogies that click for you! Happy math.

Understanding Pythagorean Distance and the Gradient

The Pythagorean Theorem shows how strange our concept of distance is. Using the rule $a^2 + b^2 = c^2$, we can trade some "a" to get more "b".

Starting with

$\displaystyle{13^2 + 0^2 = 13^2}$

means "A 13-inch pizza equals a 13-inch pizza". Sure. But we can trade an inch and get:

$\displaystyle{12^2 + 5^2 = 13^2}$

Huh? A 12-inch pizza and a 5-inch pizza equal a 13-inch pizza?

The math works (144 + 25 = 169) but, but... we gave up an inch and got a five-inch pizza!

Let's understand why the tradeoff happens, and how to use it.

Explanation 1: Shaving the Square

A key insight: Bigger numbers are harder to square.

Imagine laying tiles on a porch -- as your porch grows, the outer layer needs more tiles. Trimming a 13x13 porch to 12x12 frees up 25 tiles, which is enough to make a new 5x5 porch!

I call this "shaving the square". Trimming 1 unit from the outside of a large square has more "shavings" which can contribute to a smaller one (trimming an inch from a giant fro can make a sweater for an infant). As we continue to trim, the benefit diminishes because our starting point is smaller and smaller.
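A tiny sketch of the shavings, with `shavings` as a made-up helper name:

```python
def shavings(n):
    """Tiles freed by trimming an n x n square down to (n-1) x (n-1)."""
    return n * n - (n - 1) * (n - 1)   # simplifies to 2n - 1

for n in (13, 12, 5, 2):
    print(n, shavings(n))
# 13 25
# 12 23
# 5 9
# 2 3
```

The first trim from 13 frees 25 tiles (a whole 5x5 porch), and each later trim frees fewer.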

Explanation 2: Sliding the Chopstick

A second insight: Slide a little, pivot a lot.

Imagine a chopstick wedged in a corner: the length is fixed, and the ends of the chopstick must touch a wall. What're the options?

Well, lying along a single wall means 100% for one side (like saying $13^2 + 0^2 = 13^2$). Not that interesting.

By sliding the chopstick (from 13 to 12) we can swing it out by 5 on the other wall!

You need to try it -- a small slide gives a giant pivot. As we keep sliding, the tradeoff (How much pivot do we get?) changes.

So What's the Tradeoff?

Time to see how the a/b tradeoff works. First, let's use grid coordinates: x & y (horizontal and vertical). Given a fixed distance (13 units, let's say), our options lie on the circle where $x^2 + y^2 = 13^2$:

A few points:

Each possibility is the same distance, but has a different ratio of x to y (100% x, 100% y, or a mix like (12,5))
We can only move to neighboring points on the circle (options at the same distance)
The tradeoff we face is how much "x" we get for "y" when moving to a neighbor. If we're at (0, 13) we could move to (5, 12). This trades 1 y for 5 x's.

This is the "chunky" tradeoff where we're using an entire unit at a time. What about .5 units? .01?

Enter the tangent! The tangent line shows the trajectory of our current path, the direction to our neighbor. We follow the tangent for a tiny, microscopic amount to get our next neighbor. The tangent is an approximation -- it's not pointing exactly at our nearest neighbor, but it's pretty close.

The tangent shows the tradeoff you are about to make.

What's the actual amount? Any point (x,y) has a slope of y/x, and a tangent line with slope -x/y, so the tradeoff is...getting confused yet?

Less mindless algebra, more intuition:

Circles have a tangent line perpendicular to the radius at the current point
If you're at (5,12) then tangent slope is some ratio of 5 and 12
Remember "shaving the square": you get a better deal in the direction of the smaller coordinate (increasing a large square is tough).
So, at (5, 12) you're "heavy on the y" and the trade will favor improving your x: it should be "trade 5 y's for 12 x's". And why not the other way? It doesn't make sense that the more y you have, the easier it is to get y! That'd spiral off into exponential growth, not a circle.
Lastly, we can't trade an entire chunk of 5 y's! The tangent is about our nearest neighbor. We have a trade of 12/5 or 2.4 to 1. Our next, tiny movement will be at this ratio (and then we'll be at a new point, with a new tangent).

General principle: Our neighbors are on a circle, which encourages balance. You get a better deal in the direction of the smaller coordinate: at (x,y) the tradeoff is y:x.
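For readers who do want the algebra, here is one way to check the y:x ratio, sketched with implicit differentiation (the differentials $dx$ and $dy$ are notation introduced here for the tiny trade):

$\displaystyle{x^2 + y^2 = r^2 \quad \Rightarrow \quad 2x \, dx + 2y \, dy = 0 \quad \Rightarrow \quad \frac{dy}{dx} = -\frac{x}{y}}$

At $(5, 12)$ this gives $dy/dx = -5/12$: a tiny step that gives up 5 parts of y gains 12 parts of x, the same 2.4 to 1 trade as before.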

Optimizing The Tradeoff

Now we know the tradeoff for any point (x,y) -- let's optimize!

In a boring scenario, we get paid based on pure distance, so every point (or direction to move) is the same.

The exciting scenario: our (x,y) position is an input into some other function which gives us a return! Now we want to maximize that function.

Here's a scenario: Popeye throws cars for cash. He lines up spectators on fences running North and East. The spectators must look straight ahead (they're in neck braces, due to earlier events) but will pay Popeye if they see a car pass in front of them.

Maximizing Even Payouts

Suppose each spectator offers \$1 if they see the car (Payout (x,y) = x + y). Where to throw?

First, assume Popeye has finite energy -- he can throw the car 13 meters. Now let's start somewhere: throwing the car pure North (0, 13):

$\displaystyle{P(0,13) = 0 + 13 = 13}$

Ok. What if he threw it slightly East? To (5, 12) let's say?

$\displaystyle{P(5,12) = 5 + 12 = 17}$

Clearly better. This should make sense: at (0,13) the tradeoff is great to get more East. We can give up 1 North and get a whopping 5 East, a "profit" of \$4 if we do the trade. We should keep trading as long as it's profitable -- as long as we're out of balance, the circle will reward us for boosting the smaller side. Following a 45 degree angle for 13 units is the ideal:

$\displaystyle{P(13 \cdot \frac{1}{\sqrt{2}}, 13 \cdot \frac{1}{\sqrt{2}}) = P(13 \cdot .707, 13 \cdot .707) = 9.2 + 9.2 = 18.4}$

Neat. A 45-degree throw hits 70.7% of the possible spectators for each side.

Psst. Confused about how a 45-degree throw passes by 70.7% of the spectators on each side? No problem.

A 45-degree throw is along the diagonal of a square. A triangle with sides 1 and 1 has a hypotenuse of:

$\displaystyle{\sqrt{1^2 + 1^2} = \sqrt{2} = 1.414}$

And has sides $(1, 1, 1.414)$.

A hypotenuse of $\sqrt{2}$ isn't convenient: it's hard to know what fraction a side is of the whole. We divide the triangle by the length of the hypotenuse ($\sqrt{2}$), making the hypotenuse 1 and the other sides a percentage:

$\displaystyle{\text{Triangle with sides} = (\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, \frac{\sqrt{2}}{\sqrt{2}}) = (.707, .707, 1)}$

Now we've discovered that a 45-degree throw, with sides $(1, 1, \sqrt{2})$, has the ratio $.707, .707, 1$. 70.7% of the distance along the hypotenuse shows up on each side.

General Technique: Finding the Best Direction

We stumbled upon the way to find the best return:

Pick any starting point / direction
Tweak it: if our return improves, keep the new choice (it's profitable)
Keep tweaking until our return is no longer profitable

In math slang, this is "finding the local maximum". In economics slang, it's finding the point of "zero marginal returns". Popeye calls it Squeezing the Spinach.
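The three steps can be sketched as a tiny hill-climb over the throw angle. Everything named here (`squeeze`, the 0.001-radian `step`, starting at pure North) is an assumption for illustration, not a tuned optimizer:

```python
import math

def payout(x, y):
    return x + y             # every spectator pays $1

def squeeze(payout, radius=13.0, angle=math.pi / 2, step=0.001):
    """Nudge the throw angle for as long as a nudge improves the payout."""
    def value(t):
        return payout(radius * math.cos(t), radius * math.sin(t))

    while True:
        best = max((angle - step, angle + step), key=value)
        if value(best) <= value(angle):
            return angle     # no neighbor is more profitable: stop
        angle = best

angle = squeeze(payout)
print(round(math.degrees(angle), 1))   # 45.0
```

Starting from pure North, every small tweak toward East pays off until the throw is within one step of 45 degrees.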

Maximizing Uneven Returns

Now suppose the Northern spectators offer \$2 (Eastern stay at \$1), so P(x,y) = x + 2*y. Should we throw it 100% North?

$\displaystyle{ P(0, 13) = 0 + 2 \cdot 13 = 26 }$

Not bad. But what about 45 degrees again?

$\displaystyle{P(9.2, 9.2) = 9.2 + 2 \cdot 9.2 = 27.6 }$

Interesting -- 45 degrees is still better! But... I think we went too far! Shouldn't we favor North since it pays more?

Yep. Let's remember how to Squeeze the Spinach (maximize our returns): start with North and change until it's not profitable:

The payout function means 1 North = 2 Easts (North pays \$2, so 1 unit North = 2 units East)
Trades are profitable if we can beat 1 North for 2 Easts (1 North for 3 Easts, for example, would profit \$1)

So... where are trades better than 1 North for 2 Easts? In the Northern section, where the circle rewards us by throwing Easts at us ("Please, please go East... I'll give you a bunch if you give up a little North").

Remember how circles are about x/y, x & y, x:y, etc.? Well, we have the numbers 1 and 2. (2,1) is in the East section. We want (1,2). Why? At (1,2) we have reached the perfect 1 North = 2 East tradeoff.

Following the direction (1,2) for 13 units is:

$\displaystyle{P(13 \cdot \frac{1}{\sqrt{5}}, 13 \cdot \frac{2}{\sqrt{5}}) = P(5.81, 11.63) = 5.81 + 2 \cdot 11.63 = 29.07 }$

Tada! Over 29 smackeroos because we maximized our return.
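As a sanity check, we can brute-force every direction between East and North and see where the payout peaks. This sketch assumes a scan in 0.1-degree steps is fine enough; the variable names are my own:

```python
import math

def payout(x, y):
    return x + 2 * y         # North pays $2, East pays $1

radius = 13.0

# Scan throw directions from East (0 degrees) to North (90 degrees).
angles = [math.radians(d / 10) for d in range(0, 901)]
best = max(angles, key=lambda t: payout(radius * math.cos(t), radius * math.sin(t)))

print(round(math.degrees(best), 1))               # 63.4
print(round(math.degrees(math.atan2(2, 1)), 1))   # 63.4, the direction (1, 2)

# Payout along the exact (1, 2) direction, without rounding the coordinates.
print(round(payout(radius / math.sqrt(5), 2 * radius / math.sqrt(5)), 2))  # 29.07
```

The scan lands on the same angle as the direction (1, 2), and the unrounded payout is $13 \cdot \sqrt{5}$, a bit over 29.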

The Gradient Principle

We can supercharge this result:

To maximize return, go in each direction proportional to its payoff.

If North pays 2:1 compared to East, your trajectory should favor North by 2:1. In mathier terms:

Payoff(x,y) = ax + by
Best trajectory = (a, b) [in our case, (East, North) => (1, 2)]

And this works in multiple dimensions! Given 3 dimensions, go in a direction (Payoff(x), Payoff(y), Payoff(z)). Vector calculus fans, this is why the gradient is in the direction of greatest increase.

The gradient for $F(x,y,z)$ is

$\displaystyle{(\frac{\partial F}{\partial x},\frac{\partial F}{\partial y},\frac{\partial F}{\partial z})}$

And each partial derivative (such as $\partial F / \partial x$) is the payoff for moving in that direction.

But does it all balance? Suppose x pays 3, y pays 4, and z pays 5 (at the current position). The 2-dimensional tradeoff trajectories are:

$\displaystyle{ (x, y) = (3,4) }$ $\displaystyle{ (y, z) = (4, 5) }$ $\displaystyle{ (x, z) = (3, 5) }$

Now for the magic: the combined trajectory

$\displaystyle{(x,y,z) = (3,4,5)}$

satisfies all 3 requirements! On the x-z plane, x doesn't care about y -- as long as the ratio to z is (3 , ?, 5) you're getting the best tradeoff from the x-z perspective. The pairs are:

(3, ?, 5)
(?, 4, 5)
(3, 4, ?)

You don't need a sudoku master to see (3, 4, 5) satisfies all those proportions.

Still not convinced? Imagine the payoff for y was zero. We don't want to waste energy in our trajectory (3, ?, 5) in a useless direction. But that can't happen, because the y-z tradeoff will be (?, 0, 5) and the x-y tradeoff will be (3, 0, ?). The x-z tradeoff lets y-z and x-y "figure out" what y should be, which is 0.

Questions I Had That You Might Have Too

Q: I still don't get why this works at all. Somehow 50% in x and 50% in y leads to .7 + .7 = 1.4?

It's a deep question about why space behaves like this. I was going crazy staring at chopsticks on a wall.

Here's my answer: distance is distance. 13 units is 13 units. But in some situations we are "measuring our coordinates" (what are the values of x & y) and not the distance itself.

Cartesian coordinates (x-axis, y-axis) are very inefficient for diagonal motion (i.e., you are measuring the sides of the triangle, not the hypotenuse). When $.707^2 + .707^2 = 1$, it's a measure of how "inefficient" our x & y coordinates are being. We used 70% of each coordinate to represent an object that could have been 100% on one (i.e., if we used polar coordinates).

Q: I have an offshore investment with 200% return, and an onshore one with 5% return. I have \$1000 to spend -- should I split my money?

Heavens, no! Remember, this principle is about distance measurements on a grid with the idea that 50% in x and 50% in y covers "more ground" than 100% in x. In investing 1) money is not on a grid and 2) there's no distance bonus. Putting half your money in each is plain old 0.5 + 0.5 = 1.0. Giving up \$1 of the offshore investment gives you \$1 for the onshore one.

Put all your money in the best investment.

Q: So all this stuff is useless?

Heavens, no! Ask yourself: am I measuring distance on a coordinate system?

Many things are measured in terms of x-y coordinates (physical phenomena, etc.) and do have the Pythagorean distance tradeoff.

But not every graph is the same. Graphs that aren't about distance (like "Money vs. Time") do not get any boost from the Pythagorean theorem. This confused me for a long time: the Pythagorean Theorem works for coordinate distance!

Final Thoughts

The Pythagorean Theorem is so versatile -- it's not about triangles, it covers the nature of distance. I seem to find some new realization when I study it. Really grokking it will help you everywhere, from geometry to vector calculus.

Happy math.

Vector Calculus: Understanding Circulation and Curl

Circulation is the amount of force that pushes along a closed boundary or path. It's the total "push" you get when going along a path, such as a circle.

A vector field is usually the source of the circulation. If you had a paper boat in a whirlpool, the circulation would be the amount of force that pushed it along as it went in a circle. The more circulation, the more pushing force you have.

Curl is simply the circulation per unit area, circulation density, or rate of rotation (amount of twisting at a single point). Imagine shrinking your whirlpool down smaller and smaller while keeping the force the same: you'll have a lot of power in a small area, so you'll have a large curl. If you widen the whirlpool while keeping the force the same as before, then you'll have a smaller curl. And of course, zero circulation means zero curl.

Intuition

Circulation is the amount of "pushing" force along a path. Curl is the amount of pushing, twisting, or turning force when you shrink the path down to a single point. Let's use water as an example.

Suppose we have a flow of water and we want to determine if it has curl or not: is there any twisting or pushing force? To test this, we put a paddle wheel into the water and notice if it turns (the paddle is vertical, sticking out of the water like a revolving door -- not like a paddlewheel boat):

If the paddle does turn, it means this field has curl at that point. If it doesn't turn, then there's no curl.

What does it really mean if the paddle turns? Well, it means the water is pushing harder on one side than the other, making it twist. The larger the difference, the more forceful the twist and the bigger the curl. Also, a turning paddle wheel indicates that the field is "uneven" and not symmetric; if the field were even, then it would push on all sides equally and the paddle wouldn't turn at all.

The fact that there is a "twist" means the field is not conservative (this has nothing to do with its political views).

A conservative field is "fair" in the sense that work needed to move from point A to point B, along any path, is the same. For example, consider a river: its field is conservative. Sure, you can get a free ride downstream, but then you have to do work to get back to your starting point. Or, you can do work to move upstream, and get a free ride back. Either way, the amount of work you "put in" is the same as what you get back.

However, in a field with curl (like a whirlpool), you can get a free ride by moving in the direction of the twist. In a whirlpool, you can get a free trip by moving with the current in a circle. If you fight the current and go the wrong way, you have to use energy with no free ride at all.

Conservative fields have zero curl: there are no free twists to push you along. Alternatively, if a field has curl, it is not conservative.

Gravity is another example of a conservative field. Technically, if you lift a rock and then let it fall, the energy you get from falling is the same as what you put in to lift the rock. Theoretically speaking, no energy was gained or lost in this transaction.
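Here's a rough way to check the "fair" property numerically. This is only a sketch: the helper names, the two sample fields, and the points A and B are all my own assumptions for illustration. We add up the push (field dot step) along two different routes from A to B.

```python
def work_along(field, points):
    """Approximate the sum of F . dr along a path given as a list of (x, y) points."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Evaluate the field at the midpoint of each small step.
        fx, fy = field((x0 + x1) / 2, (y0 + y1) / 2)
        total += fx * (x1 - x0) + fy * (y1 - y0)
    return total

def line(start, end, steps=1000):
    """Evenly spaced points on the straight segment from start to end."""
    (x0, y0), (x1, y1) = start, end
    return [(x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
            for i in range(steps + 1)]

def gravity(x, y):
    return (0.0, -9.8)   # assumed uniform pull straight down, per unit mass

def whirlpool(x, y):
    return (-y, x)       # assumed swirl, counterclockwise around the origin

A, B = (0.0, 0.0), (3.0, 4.0)
direct = line(A, B)                                   # straight from A to B
detour = line(A, (3.0, 0.0)) + line((3.0, 0.0), B)[1:]  # across, then up

gravity_direct = work_along(gravity, direct)
gravity_detour = work_along(gravity, detour)
whirl_direct = work_along(whirlpool, direct)
whirl_detour = work_along(whirlpool, detour)

print(gravity_direct, gravity_detour)
print(whirl_direct, whirl_detour)
```

Gravity gives about -39.2 on both routes, so the path doesn't matter. The whirlpool gives about 0 on the direct route and about 12 on the detour: the route changes the answer, which is exactly what "not conservative" means.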

Additional Details

To be technical, curl is a vector, which means it has both a magnitude and a direction. The magnitude is simply the amount of twisting force at a point.

The direction is a little more tricky: it's the orientation of the axis of your paddlewheel in order to get maximum rotation. In other words, it is the direction which will give you the most "free work" from the field. Imagine putting your paddlewheel sideways in the whirlpool - it wouldn't turn at all. If you put it in the proper direction, it begins turning.

But wait a minute -- aren't there two directions to get a twisting motion? Couldn't you just turn the paddlewheel "upside down" and get the maximum curl as well?

Yep, you're right. By convention alone, if the paddle wheel is rotating counterclockwise, its curl vector points out of the page. This is a type of right-hand rule: make a fist with your right hand and stick out your thumb. If the circulation/pushing force follows the twisting of your fingers (counterclockwise), then the curl vector will be in the direction of your thumb.

Mathematics

Circulation is the integral of a vector field along a path - you are adding how much the field "pushes" you along a path.

How do we find this? Well, we should expect some type of dot product, because we want to know the amount that one vector (the force) is pushing in the direction of another (the path). So, the two vectors we need are (1) the path vector and (2) the field vector at every point along the path.

If we have a function that defines our position at any time, $r(t)$, we can take the time derivative to get the velocity at that position.

The velocity vector is always in the direction of movement -- if you are moving from A to B, the velocity vector will be an arrow from A to B, i.e. your change in position or your direction of movement. So, we can use the velocity to get our direction.

It's important to understand why we aren't using the position vector itself -- it tells us where we are, but not where we're going. We need to know our direction to see how much "push" we are getting: Knowing your position in a river isn't important -- are you going upstream or downstream, and at what angle?

The force vector (2) is defined by the field we are in. No derivatives or other changes are necessary -- every point in the field has some force acting on it.

So, our formula for circulation is:

$\displaystyle{\text{ Force at position r } = F(r)}$

$\displaystyle{\text{ Direction at position r } = dr }$

$\displaystyle{\text{ Total pushing force = Circulation } = \int F(r) \cdot dr }$

Remember, velocity is simply the derivative of position (r), so (dr) is a vector giving us our direction. We integrate along the entire path and use the dot product to see how much pushing force is applied. We then sum up these "pushes" to get the total circulation.
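As a sanity check, here's a minimal numerical sketch of that integral. The `circulation` helper and the two sample fields are assumptions of mine, chosen for illustration: we walk counterclockwise around a circle in small steps and add up field dot step.

```python
import math

def circulation(field, radius=1.0, steps=10000):
    """Approximate the integral of F . dr counterclockwise around a circle at the origin."""
    total = 0.0
    for i in range(steps):
        t0 = 2 * math.pi * i / steps
        t1 = 2 * math.pi * (i + 1) / steps
        x0, y0 = radius * math.cos(t0), radius * math.sin(t0)
        x1, y1 = radius * math.cos(t1), radius * math.sin(t1)
        # Field at the midpoint of the step, dotted with the step itself (dr).
        fx, fy = field((x0 + x1) / 2, (y0 + y1) / 2)
        total += fx * (x1 - x0) + fy * (y1 - y0)
    return total

def whirlpool(x, y):
    return (-y, x)      # pushes counterclockwise everywhere

def river(x, y):
    return (1.0, 0.0)   # same push everywhere

whirlpool_push = circulation(whirlpool)
river_push = circulation(river)
print(whirlpool_push, river_push)
```

The whirlpool comes out near 6.283 (about 2π): it pushes us along the whole way around. The river comes out near 0: whatever push we get going downstream, we pay back going upstream.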

Since curl is the circulation per unit area, we can take the circulation for a small area (letting the area shrink to 0). However, since curl is a vector, we need to give it a direction -- the direction is normal (perpendicular) to the surface with the vector field. The magnitude is the same as before: circulation/area.

Recall that by convention (a bunch of people agreeing), counterclockwise circulation will give a curl pointing out of the page. Using these facts, we can create the formula for curl:

$\displaystyle{ \text{Curl} = \frac{\text{circulation}}{\text{area}} = \frac{\int F(r) \cdot dr}{\int dS} }$

Where (S) is the surface we are considering; the direction of the curl is the normal to the surface.

You'll see fancier equations for curl where the surface shrinks to zero (such as in wikipedia), but recognize the basic intuition -- curl is the circulation per unit area.
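To watch the "shrinking" happen, here's a small sketch under my own assumptions: a made-up swirl field, a square loop instead of a circle, and helper names that are purely illustrative. We measure circulation divided by area for smaller and smaller loops around the point (1, 0).

```python
def field(x, y):
    return (-y ** 3, x ** 3)   # hypothetical swirl that strengthens away from the origin

def loop_circulation(cx, cy, half, steps=200):
    """Sum F . dr counterclockwise around a square of half-width `half` centered at (cx, cy)."""
    corners = [(cx - half, cy - half), (cx + half, cy - half),
               (cx + half, cy + half), (cx - half, cy + half)]
    total = 0.0
    for k in range(4):
        (x0, y0), (x1, y1) = corners[k], corners[(k + 1) % 4]
        dx, dy = (x1 - x0) / steps, (y1 - y0) / steps
        for i in range(steps):
            # Sample the field at the midpoint of each small step.
            fx, fy = field(x0 + (i + 0.5) * dx, y0 + (i + 0.5) * dy)
            total += fx * dx + fy * dy
    return total

def circulation_per_area(cx, cy, half):
    side = 2 * half
    return loop_circulation(cx, cy, half) / side ** 2

ratios = [circulation_per_area(1.0, 0.0, h) for h in (1.0, 0.1, 0.01)]
print(ratios)
```

The ratios come out to about 5, 3.02 and 3.0002. The big loop blends in twisting from far away; as the loop shrinks, the number settles toward 3, the curl at that one point.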

Parting Thoughts

You'll often see curl of a field (F) written like this:

$\displaystyle{ \text{Curl}(F) = \nabla \times F }$

which is a cross product of the del operator $\nabla$ (the same symbol used for the gradient) and the field (F). This has to do with how curl is actually computed, which will be material for another article (and is probably in your textbook already -- see Wikipedia for details).
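As a peek at that computation, here's a hedged sketch that estimates the cross product numerically with finite differences. The `curl` helper and both sample fields are my own illustration; the component formulas in the comments are the standard textbook ones.

```python
def curl(field, x, y, z, h=1e-5):
    """Estimate the curl of field(x, y, z) -> (Fx, Fy, Fz) using central differences."""
    def partial(component, axis):
        # d(F_component)/d(axis): wiggle one coordinate, watch one component change.
        hi = [x, y, z]
        lo = [x, y, z]
        hi[axis] += h
        lo[axis] -= h
        return (field(*hi)[component] - field(*lo)[component]) / (2 * h)

    return (partial(2, 1) - partial(1, 2),   # dFz/dy - dFy/dz
            partial(0, 2) - partial(2, 0),   # dFx/dz - dFz/dx
            partial(1, 0) - partial(0, 1))   # dFy/dx - dFx/dy

def whirlpool(x, y, z):
    return (-y, x, 0.0)     # flat counterclockwise swirl in the x-y plane

def river(x, y, z):
    return (1.0, 0.0, 0.0)  # same push everywhere

whirlpool_curl = curl(whirlpool, 0.3, -0.7, 0.0)
river_curl = curl(river, 0.3, -0.7, 0.0)
print(whirlpool_curl, river_curl)
```

The whirlpool's curl is roughly (0, 0, 2): it points along +z, out of the page, matching the counterclockwise convention. The river's curl is roughly (0, 0, 0).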

If I have been successful, you should understand intuitively what circulation and curl mean, and how we got the formulae above. They spring up naturally from our definition of circulation as "pushing force along a path" and curl as "pushing force/area".

Math should be a tool for clearly stating what we already know. Understand the intuition and then tackle the complicated formulas. Happy math.

PS. Have some fun and check out this video of a famous whirlpool. Imagine the circulation on this (go on, imagine):


Vector Calculus: Understanding the Gradient

The gradient is a fancy word for derivative, or the rate of change of a function. It’s a vector (a direction to move) that

Points in the direction of greatest increase of a function (intuition on why)
Is zero at a local maximum or local minimum (because there is no single direction of increase)

The term "gradient" is typically used for functions with several inputs and a single output (a scalar field). Yes, you can say a line has a gradient (its slope), but using "gradient" for single-variable functions is unnecessarily confusing. Keep it simple.

“Gradient” can refer to gradual changes of color, but we’ll stick to the math definition if that’s ok with you. You’ll see the meanings are related.

Properties of the Gradient

Now that we know the gradient is the derivative of a multi-variable function, let’s derive some properties.

The regular, plain-old derivative gives us the rate of change of a single variable, usually $x$. For example, $\frac{dF}{dx}$ tells us how much the function $F$ changes for a change in $x$. But if a function takes multiple variables, such as $x$ and $y$, it will have multiple derivatives: the value of the function will change when we “wiggle” $x$ ($\frac{dF}{dx}$) and when we wiggle $y$ ($\frac{dF}{dy}$).

We can represent these multiple rates of change in a vector, with one component for each derivative. Thus, a function that takes 3 variables will have a gradient with 3 components:

$F(x)$ has one variable and a single derivative: $\frac{dF}{dx}$
$F(x,y,z)$ has three variables and three derivatives: $\frac{dF}{dx}, \frac{dF}{dy}, \frac{dF}{dz}$

The gradient of a multi-variable function has a component for each direction.

And just like the regular derivative, the gradient points in the direction of greatest increase (here's why: we trade motion in each direction enough to maximize the payoff).

However, now that we have multiple directions to consider ($x$, $y$ and $z$), the direction of greatest increase is no longer simply “forward” or “backward” along the $x$-axis, like it is with functions of a single variable.

If we have two variables, then our 2-component gradient can specify any direction on a plane. Likewise, with 3 variables, the gradient can specify any direction in 3D space to move to increase our function.

A Twisted Example

I’m a big fan of examples to help solidify an explanation. Suppose we have a magical oven, with coordinates written on it and a special display screen:

We can type any 3 coordinates (like “3,5,2”) and the display shows us the gradient of the temperature at that point.

The oven also comes with a convenient clock. Unfortunately, the clock comes at a price: the temperature inside the oven varies drastically from location to location. But this was well worth it: we really wanted that clock.

With me so far? We type in any coordinate, and the oven spits out the gradient at that location.

Be careful not to confuse the coordinates and the gradient. The coordinates are the current location, measured on the $x,y,z$ axes. The gradient is a direction to move from our current location, such as move up, down, left or right.

Now suppose we are in need of psychiatric help and put the Pillsbury Dough Boy inside the oven because we think he would taste good. He’s made of cookie dough, right? We place him in a random location inside the oven, and our goal is to cook him as fast as possible. The gradient can help!

The gradient at any location points in the direction of greatest increase of a function. In this case, our function measures temperature. So, the gradient tells us which direction to move the doughboy to get him to a location with a higher temperature, to cook him even faster. Remember that the gradient does not give us the coordinates of where to go; it gives us the direction to move to increase our temperature.

Thus, we would start at a random point like (3,5,2) and check the gradient. In this case, the gradient there is (3,4,5). Now, we wouldn’t actually move an entire 3 units to the right, 4 units back, and 5 units up. The gradient is just a direction, so we’d follow this trajectory for a tiny bit, and then check the gradient again.

We get to a new point, pretty close to our original, which has its own gradient. This new gradient is the new best direction to follow. We’d keep repeating this process: move a bit in the gradient direction, check the gradient, and move a bit in the new gradient direction. Every time we nudged along and followed the gradient, we’d get to a warmer and warmer location.

Eventually, we’d get to the hottest part of the oven and that’s where we’d stay, about to enjoy our fresh cookies.
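The "nudge, check, nudge again" routine can be sketched in a few lines. Everything here is an assumption for illustration: I made up a temperature function with its hottest spot at (1, 2, 3), so its gradient won't match the (3,4,5) reading in the story, and the step size is arbitrary.

```python
def temperature(x, y, z):
    # Hypothetical oven: hottest (400 degrees) at the point (1, 2, 3).
    return 400 - (x - 1) ** 2 - (y - 2) ** 2 - (z - 3) ** 2

def gradient(x, y, z):
    # One partial derivative per coordinate of the temperature above.
    return (-2 * (x - 1), -2 * (y - 2), -2 * (z - 3))

def cook(start, step=0.1, rounds=200):
    """Follow the gradient in small nudges, re-checking it after every move."""
    position = start
    for _ in range(rounds):
        gx, gy, gz = gradient(*position)
        x, y, z = position
        position = (x + step * gx, y + step * gy, z + step * gz)
    return position

start = (3.0, 5.0, 2.0)
hottest = cook(start)
print(hottest, temperature(*hottest))
```

Starting from (3, 5, 2), the doughboy ends up right next to (1, 2, 3), where the gradient has shrunk to nearly zero and there's nowhere warmer to go.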

Don’t eat that cookie!

But before you eat those cookies, let’s make some observations about the gradient. That’s more fun, right?

First, when we reach the hottest point in the oven, what is the gradient there?

Zero. Nada. Zilch. Why? Well, once you are at the maximum location, there is no direction of greatest increase. Any direction you follow will lead to a decrease in temperature. It’s like being at the top of a mountain: any direction you move is downhill. A zero gradient tells you to stay put – you are at the max of the function, and can’t do better.

But what if there are two nearby maximums, like two mountains next to each other? You could be at the top of one mountain, but have a bigger peak next to you. In order to get to the highest point, you have to go downhill first.

Ah, now we are venturing into the not-so-pretty underbelly of the gradient. Finding the maximum in regular (single variable) functions means we find all the places where the derivative is zero: there is no direction of greatest increase. If you recall, the regular derivative will point to local minimums and maximums, and the absolute max/min must be tested from these candidate locations.

The same principle applies to the gradient, a generalization of the derivative. You must find multiple locations where the gradient is zero — you’ll have to test these points to see which one is the global maximum. Again, the top of each hill has a zero gradient — you need to compare the height at each to see which one is higher. Now that we have cleared that up, go enjoy your cookie.
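Here's a small sketch of the two-mountain problem. The landscape (two bumps of different heights) and the helper names are invented for illustration. We climb from two starting points and then compare the summits we find.

```python
import math

def height(x, y):
    small_hill = math.exp(-((x + 2) ** 2 + y ** 2))     # peak near (-2, 0), about 1 high
    big_hill = 2 * math.exp(-((x - 2) ** 2 + y ** 2))   # peak near (2, 0), about 2 high
    return small_hill + big_hill

def gradient(x, y):
    small_hill = math.exp(-((x + 2) ** 2 + y ** 2))
    big_hill = 2 * math.exp(-((x - 2) ** 2 + y ** 2))
    return (-2 * (x + 2) * small_hill - 2 * (x - 2) * big_hill,
            -2 * y * small_hill - 2 * y * big_hill)

def climb(start, step=0.1, rounds=2000):
    """Follow the gradient uphill until it (nearly) vanishes."""
    x, y = start
    for _ in range(rounds):
        gx, gy = gradient(x, y)
        x, y = x + step * gx, y + step * gy
    return (x, y)

# Each climb stops at whichever hilltop is nearby, where the gradient is zero.
peaks = [climb((-3.0, 1.0)), climb((3.0, 1.0))]
heights = [height(*p) for p in peaks]

# The gradient can't tell us which hilltop is taller: we have to compare.
best = max(peaks, key=lambda p: height(*p))
print(peaks, heights, best)
```

The first climb gets stuck on the smaller hill (height about 1) even though a taller one (height about 2) is next door. Both summits have a zero gradient, so the only way to find the global maximum is to compare the candidates.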

Mathematics

We know the definition of the gradient: a derivative for each variable of a function. The gradient symbol is usually an upside-down delta, and called “del” (this makes a bit of sense – delta indicates change in one variable, and the gradient is the change for all variables). Taking our group of 3 derivatives above:

$\displaystyle{\text{gradient of } F(x,y,z) = \nabla F(x,y,z) = (\frac{dF}{dx},\frac{dF}{dy},\frac{dF}{dz})}$

Notice how the x-component of the gradient is the partial derivative with respect to $x$ (similar for $y$ and $z$). For a one variable function, there is no $y$-component at all, so the gradient reduces to the derivative.

Also, notice how the gradient is a function: it takes 3 coordinates as a position, and returns 3 coordinates as a direction.

$\displaystyle{F(x,y,z) = x + y^2 + z^3 }$

$\displaystyle{\nabla F(x,y,z) = (\frac{dF}{dx},\frac{dF}{dy},\frac{dF}{dz}) = (1, 2y, 3z^2)}$

If we want to find the direction to move to increase our function the fastest, we plug in our current coordinates (such as 3,4,5) into the gradient and get:

$\displaystyle{\text{direction} = (1, 2(4), 3(5)^2) = (1, 8, 75)}$

So, this new vector (1, 8, 75) would be the direction we’d move in to increase the value of our function. In this case, our x-component doesn’t add much to the value of the function: the partial derivative is always 1.
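If you'd like to double-check (1, 8, 75) without doing any calculus, here's a minimal sketch that wiggles each coordinate a tiny bit and measures the change. The `numeric_gradient` helper is my own illustration, not a library function.

```python
def F(x, y, z):
    return x + y ** 2 + z ** 3

def numeric_gradient(f, point, h=1e-6):
    """Estimate each partial derivative by wiggling one coordinate at a time."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += h
        down[i] -= h
        grad.append((f(*up) - f(*down)) / (2 * h))
    return tuple(grad)

direction = numeric_gradient(F, (3.0, 4.0, 5.0))
print([round(g, 3) for g in direction])
```

The estimates land on roughly 1, 8 and 75, agreeing with the formula $(1, 2y, 3z^2)$ at the point (3, 4, 5).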

Obvious applications of the gradient are finding the max/min of multivariable functions. Another less obvious but related application is finding the maximum of a constrained function: a function whose x and y values have to lie in a certain domain, i.e. find the maximum of all points constrained to lie along a circle. Solving this calls for my boy Lagrange, but all in due time, all in due time: enjoy the gradient for now.

The key insight is to recognize the gradient as the generalization of the derivative. The gradient points to the direction of greatest increase; keep following the gradient, and you will reach a local maximum.

Questions

Why is the gradient perpendicular to lines of equal potential?

Lines of equal potential (“equipotential”) are the points with the same energy (or value for $F(x,y,z)$). In the simplest case, a circle represents all items the same distance from the center.

The gradient represents the direction of greatest change. If it had any component along the line of equipotential, then that energy would be wasted (as it’s moving closer to a point at the same energy). When the gradient is perpendicular to the equipotential points, it is moving as far from them as possible (this article explains why the gradient is the direction of greatest increase — it’s the direction that maximizes the varying tradeoffs inside a circle).
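For the circle case, the perpendicular claim is easy to test. This sketch assumes the function $F(x, y) = x^2 + y^2$ (my choice, since its equipotential lines are circles) and checks the dot product between the gradient and the direction along the circle at several points.

```python
import math

def gradient(x, y):
    return (2 * x, 2 * y)   # gradient of F(x, y) = x^2 + y^2

def dot_with_tangent(angle, radius=3.0):
    """Dot product of the gradient with the direction of travel along the circle."""
    x, y = radius * math.cos(angle), radius * math.sin(angle)
    tangent = (-math.sin(angle), math.cos(angle))   # direction along the equipotential
    gx, gy = gradient(x, y)
    return gx * tangent[0] + gy * tangent[1]

dots = [dot_with_tangent(math.radians(d)) for d in range(0, 360, 30)]
print(max(abs(d) for d in dots))
```

Every dot product is zero (up to rounding), so none of the gradient is "wasted" moving along the circle: it points straight outward.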


Vector Calculus: Understanding Divergence

Physical Intuition

Divergence (div) is “flux density”—the amount of flux entering or leaving a point. Think of it as the rate of flux expansion (positive divergence) or flux contraction (negative divergence). If you measure flux in bananas (and c’mon, who doesn’t?), a positive divergence means your location is a source of bananas. You’ve hit the Donkey Kong jackpot.

Remember that by convention, flux is positive when it leaves a closed surface. Imagine you were your normal self, and could talk to points inside a vector field, asking what they saw:

If the point saw flux entering, he’d scream that everything was closing in on him. This is a negative divergence, and the point is capturing flux, like water going down a sink.
If the point saw flux leaving, he’d sniff his armpits and say all flux was exiting. This is a positive divergence, and the point is a source of flux, like a hose.

So, divergence is just the net flux per unit volume, or “flux density”, just like regular density is mass per unit volume (of course, we don’t know about “negative” density). Imagine a tiny cube—flux can be coming in on some sides, leaving on others, and we combine all effects to figure out if the total flux is entering or leaving.

The bigger the flux density (positive or negative), the stronger the flux source or sink. A div of zero means there’s no net flux change inside the region. In plain English:

$\displaystyle{\text{ Divergence } = \frac{\text{Flux}}{\text{Volume}}}$

Math Intuition

Now that we have an intuitive explanation, how do we turn that sucker into an equation? The usual calculus way: take a tiny unit of volume and measure the flux going through it. We need to add up the total flux passing through the x, y and z dimensions.

Imagine a cube at the point we want to measure, with sides of length dx, dy and dz. To get the net flux, we see how much the X component of flux changes in the X direction, add that to the Y component’s change in the Y direction, and the Z component’s change in the Z direction. If there are no changes, then we’ll get 0 + 0 + 0, which means no net flux.

If there is some change in the field, we get something like 1 - 2 + 5 = 4 (flux increases in the X and Z directions, decreases in Y), which gives us the divergence at that point.

In pseudo-math:

Total flux change = (field change in X direction) + (field change in Y direction) + (field change in Z direction)

Or in more formal math:

$\displaystyle{\text{Divergence} = \lim_{\text{Vol} \to 0}\frac{\text{Flux}}{\text{Vol}}}$

$\displaystyle{\text{Divergence} = \frac{\partial F_x}{\partial x} +\frac{\partial F_y}{\partial y} +\frac{\partial F_z}{\partial z}}$

(Assuming $F_x$ is the field in the x-direction.)
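Here's a minimal sketch of that formula using finite differences. The `divergence` helper and the three sample fields are my own assumptions for illustration.

```python
def divergence(field, x, y, z, h=1e-5):
    """Add up dFx/dx + dFy/dy + dFz/dz, each estimated with a central difference."""
    point = (x, y, z)
    total = 0.0
    for axis in range(3):
        up = list(point)
        down = list(point)
        up[axis] += h
        down[axis] -= h
        # Change of the axis-th component of the field along that same axis.
        total += (field(*up)[axis] - field(*down)[axis]) / (2 * h)
    return total

def hose(x, y, z):
    return (x, y, z)        # everything flows away from the origin: a source

def drain(x, y, z):
    return (-x, -y, -z)     # everything flows toward the origin: a sink

def breeze(x, y, z):
    return (1.0, 0.0, 0.0)  # same flow everywhere: nothing created or absorbed

results = [divergence(f, 0.5, -1.0, 2.0) for f in (hose, drain, breeze)]
print([round(r, 3) for r in results])
```

The source gives about +3, the sink about -3, and the steady breeze about 0.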

A few remarks:

The symbol for divergence is the upside-down triangle for gradient (called del) with a dot [$\nabla \cdot$]. The gradient gives us the partial derivatives $(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z})$, and the dot product with our vector $(F_x, F_y, F_z)$ gives the divergence formula above.
Divergence is a single number, like density.
Divergence and flux are closely related – if a volume encloses a positive divergence (a source of flux), it will have positive flux.
"Diverge" means to move away from, which may help you remember that divergence is the rate of flux expansion (positive div) or contraction (negative div).

Divergence isn’t too bad once you get an intuitive understanding of flux. It’s really useful in understanding theorems like Gauss’ Law.
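To see "flux per unit volume" with actual numbers, here's a sketch that wraps a tiny cube around a point and divides the net flux by the volume. The field is hypothetical and the helper name is mine. The flux through each face is estimated by sampling the field at the face's center, which is exact for this particular field.

```python
def field(x, y, z):
    return (x ** 3, y, -z)   # hypothetical field; the derivative formula gives 3x^2 + 1 - 1

def flux_per_volume(cx, cy, cz, side):
    """Net flux out of a cube of the given side length, divided by its volume."""
    center = (cx, cy, cz)
    half = side / 2
    net_flux = 0.0
    for axis in range(3):
        out_face = list(center)
        in_face = list(center)
        out_face[axis] += half
        in_face[axis] -= half
        leaving = field(*out_face)[axis]    # flow out through the far face
        entering = field(*in_face)[axis]    # flow in through the near face
        net_flux += (leaving - entering) * side ** 2
    return net_flux / side ** 3

densities = [flux_per_volume(1.0, 0.0, 0.0, s) for s in (1.0, 0.1, 0.01)]
print(densities)
```

The results are about 3.25, 3.0025 and 3.000025. As the cube shrinks, flux/volume closes in on 3, which is what the derivative formula gives at x = 1.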


Vector Calculus: Understanding Flux

Once you understand flux intuitively, you don’t need to memorize equations. The formulas become “obvious” dare I say. However, it took a lot of effort to truly understand that:

Flux is the amount of “something” (electric field, bananas, whatever you want) passing through a surface.
The total flux depends on strength of the field, the size of the surface it passes through, and their orientation.

Your vector calculus math life will be so much better once you understand flux. And who doesn’t want that?

Physical Intuition

Think of flux as the amount of something crossing a surface. This “something” can be water, wind, electric field, bananas, pretty much anything you can imagine. Math books will use abstract concepts like electric fields, which are pretty hard to visualize. I find bananas more memorable, so we’ll be using those.

To measure the flux (i.e. bananas) passing through a surface, we need to know

The surface you are considering (shape, size and orientation)
The source of the flux (strength of the field, and which way it is spitting out ~~bananas~~ flux)

The strength of the field is important – would you rather have a handful of \$5 or \$20 bills “flux” into your bank account? Would you rather have a big or little banana come your way? No need to answer that one.

Background Ideas

Keep a few ideas in mind when considering flux:

Vector Field: This is the source of the flux: the thing shooting out bananas, or exerting some force (like gravity or electromagnetism). Flux doesn’t have to be a physical object — you can measure the “pulling force” exerted by a field.
Surface: This is the boundary the flux is crossing through or acting on. The boundary could be a sphere, a plane, even the top of a bucket. Notice that the boundary may not exist — the top of a bucket traces out a circle, but the hole isn’t actually there. We’re considering the flux passing through the region the circle defines.
Timing: We measure flux at a single point in time. Freeze time and ask “Right now, at this moment, how much stuff is passing through my surface?”. If your field doesn’t change over time, then all is well. If your field does change, then you need to pick a point in time to measure the flux.
Measurement: Flux is a total, and is not “per unit area” or “per unit volume”. Flux is the total force you feel, the total number of bananas you see flying by your surface. Think of flux like weight. (There is a separate idea of "flux density" (flux/volume) called divergence, but that’s a separate article.)

Flux Factors

The source of flux has a huge impact on the total flux. Doubling the source (doubling the “banana-ness” of each banana), will double the flux passing through a surface.

Total flux also depends on the orientation of the field and the surface. When our surface completely faces the field it captures maximum flux, like a sail facing directly into the wind. As the surface tilts away from the field, the flux decreases as less and less flux crosses the surface.

Eventually, we get zero flux when the source and boundary are parallel — the flux is passing over the boundary, but not crossing through it. It would be like holding a bucket sideways under a waterfall. You wouldn’t capture much water (ignoring splashing) and may get a few funny looks.

Total flux also depends on the size of our surface. In the same field, a bigger bucket will capture more flux than a smaller one. When we figure out our total flux, we need to see how much field is passing through our entire surface.

This is simple stuff so far, right? If you forget, just think about capturing water from a waterfall. What matters? The strength of the waterfall, the size of the bucket and the orientation of the bucket.

Positive and Negative Flux

One last detail – we need to decide on a positive and negative direction for flux. This decision is arbitrary, but by convention (aka your math teacher will penalize you if you don’t agree), positive flux leaves a closed surface, and negative flux enters a closed surface.

Think of flux as a hose spraying water. Positive flux means flux is leaving the hose; the hose is a source of flux. Negative flux is like water entering a sink; it is a sink of flux. So positive flux = leaving, negative = entering. Got it? (By the way, the terms “source” and “sink” are sometimes used to describe fields).

Quick Summary

Quick checkpoint: Flux depends on

The size of the surface
Magnitude of the source field
The angle between them

A fire hose shooting at a tiny bucket (small surface, large magnitude) could have the same flux as a garden hose aimed at a large bucket (large surface, small magnitude). And in case you forgot, flux reminds us to hold the bucket so it is facing the source. This should be obvious – but don’t you want ideas (especially in math!) to be obvious?

Math Intuition

Now that we have a physical intuition, let’s try to derive the math. In most cases, the source of flux will be described as a vector field: Given a point (x,y,z), there's a formula giving the flux vector at that point.

We want to know how much of that vector field is acting/passing through our surface, taking the magnitude, orientation, and size into account. From our intuition, it should look something like this:

Total flux = Field Strength * Surface Size * Surface Orientation

However, this formula only works if the vector field is the same at every point. Usually, it’s not, so we’ll take the standard calculus approach to solving problems:

Divide the surface into pieces
Find the flux at each piece
Add up the small units of flux to get total flux (integrate).

Let’s go out on a limb and call the tiny piece of the surface dS. Total flux is:

Total flux = (Field Strength * dS * Orientation) for every dS.

Total flux = Integral (Field Strength * Orientation * dS)

Make sense so far? Now, we need to figure out how much orientation actually matters. Like we said before, if the field and the surface are parallel, then there is zero flux. If they are perpendicular, there is full flux.

(In this diagram, the flux is parallel with the top surface, and nothing enters from that direction. Mathematically, we represent surfaces by their normal vector, which sticks out of the surface. Don’t let this bookkeeping detail disrupt your visualization.)

If there is an angle, then it is some factor in-between:

How much, exactly? Well, this is a job for the dot product, which projects the field onto the surface's normal vector. For unit vectors, the dot product gives us a number (from -1 to 1) that tells us what percent of the field is passing through the surface, with the sign telling us whether the field points along the normal or against it. So, the equation becomes:

Total flux = Integral( Vector Field Strength dot dS )

And finally, we convert to the stuffy equation you’ll see in your textbook, where F is our field, dS is a tiny unit of area on the surface S and n is the unit normal vector of the surface:

$\displaystyle{\text{flux} = \int_{S} \vec{F} \cdot \vec{n} \ dS}$
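In the simplest case (the same field everywhere, a flat surface), the integral collapses to (F dot n) times the area. Here's a sketch of that special case; the `flat_flux` helper and the "wind" numbers are assumptions of mine for illustration.

```python
import math

def flat_flux(field, normal, area):
    """Flux of a uniform field through a flat surface: (F . n) * area, with n a unit normal."""
    length = math.sqrt(sum(n * n for n in normal))
    unit_normal = [n / length for n in normal]
    return sum(f * n for f, n in zip(field, unit_normal)) * area

wind = (10.0, 0.0, 0.0)   # hypothetical uniform field blowing along +x

# Surface facing straight into the field: normal lines up with the wind.
facing = flat_flux(wind, (1, 0, 0), area=2.0)

# Surface tilted 60 degrees away: only cos(60) = 50% of the field gets through.
angle = math.radians(60)
tilted = flat_flux(wind, (math.cos(angle), math.sin(angle), 0), area=2.0)

# Surface parallel to the wind (normal at 90 degrees): the field skims past it.
sideways = flat_flux(wind, (0, 1, 0), area=2.0)

print(facing, round(tilted, 6), sideways)
```

We get 20 when facing the wind, about 10 when tilted 60 degrees, and 0 when held sideways: strength, size and orientation all show up in one line.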

Time for one last detail — how do we find the normal vector for our surface?

Good question. For a surface like a plane, the normal vector is the same at every point. For a sphere, the normal vector is in the same direction as $\vec{r}$, your position on the sphere: the top of a sphere has a normal vector that goes out the top; the bottom has one going out the bottom, etc.
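Here's a rough sketch of the sphere case: chop the sphere into small latitude/longitude patches, use the position direction as the normal of each patch, and add up field dot normal times patch area. The helper name, the patch scheme and the sample fields are my own assumptions.

```python
import math

def sphere_flux(field, radius=1.0, steps=200):
    """Approximate the flux of field(x, y, z) out of a sphere centered at the origin."""
    total = 0.0
    d_theta = math.pi / steps
    d_phi = 2 * math.pi / steps
    for i in range(steps):
        theta = (i + 0.5) * d_theta          # angle down from the top of the sphere
        for j in range(steps):
            phi = (j + 0.5) * d_phi          # angle around the vertical axis
            # Unit normal: same direction as the position vector on the sphere.
            n = (math.sin(theta) * math.cos(phi),
                 math.sin(theta) * math.sin(phi),
                 math.cos(theta))
            point = tuple(radius * c for c in n)
            f = field(*point)
            patch_area = radius ** 2 * math.sin(theta) * d_theta * d_phi
            total += sum(fc * nc for fc, nc in zip(f, n)) * patch_area
    return total

def outward(x, y, z):
    return (x, y, z)        # points away from the center everywhere

def upward(x, y, z):
    return (0.0, 0.0, 1.0)  # uniform field: enters the bottom, leaves the top

outward_flux = sphere_flux(outward)
upward_flux = sphere_flux(upward)
print(outward_flux, upward_flux)
```

For the unit sphere, the outward field gives about 12.566 (that's 4π, the full surface area, since the field lines up with the normal everywhere). The uniform field gives about 0: what enters the bottom half leaves through the top.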

More complicated shapes may have a normal vector that varies quite a bit. In this case, try to break the shape into smaller regions (like spheres, cylinders and planes) and find the flux in each part. Then, add up the flux in each region to get the total flux (keeping in mind positive and negative flux).

If the shape is more complicated than that, you may need a computer model or more advanced theorems; but at least you know what is happening behind the scenes.

Flux Examples

Let’s do a few thought experiments to understand flux. Imagine a tube that lets water pass right through it. We hold the tube under a waterfall, wait a few seconds, then ask what the flux is. I want a numeric answer – what is the flux?

You might think we need to know the speed of the waterfall, the size of the tube, the orientation, etc. But that isn’t the case.

Remember our convention for flux orientation: positive means flux is leaving, negative means flux is entering. In this example, water is falling downward, or entering the tube. This means the top surface has negative flux (it appears to be siphoning up water).

However, what’s happening at the bottom of the box? The water passed through the top and is now leaving the bottom, which is positive flux:

Ah, this beautiful diagram shows what is going on. The top of the box / tube says that water is entering, and the bottom says water is leaving. Assuming the same amount of water is leaving and entering (the rate of water falling is a constant), the net flux would be zero. Think of it as X + (-X) = 0.
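The X + (-X) bookkeeping can be sketched face by face. The box dimensions and the waterfall strength below are made-up numbers for illustration; each face gets an outward normal, per the sign convention.

```python
waterfall = (0.0, 0.0, -5.0)   # hypothetical steady flow, straight down

# Outward normal and area for each face of an upright 1 x 1 x 2 box.
faces = {
    "top":    ((0, 0, 1), 1.0),
    "bottom": ((0, 0, -1), 1.0),
    "left":   ((-1, 0, 0), 2.0),
    "right":  ((1, 0, 0), 2.0),
    "front":  ((0, -1, 0), 2.0),
    "back":   ((0, 1, 0), 2.0),
}

def face_flux(field, normal, area):
    """Flux through one flat face: (F . n) * area."""
    return sum(f * n for f, n in zip(field, normal)) * area

flux_by_face = {name: face_flux(waterfall, normal, area)
                for name, (normal, area) in faces.items()}
net_flux = sum(flux_by_face.values())

print(flux_by_face["top"], flux_by_face["bottom"], net_flux)
```

The top reports -5 (entering), the bottom +5 (leaving), the four sides report 0 (the water slides past them), and the net is 0.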

What if we had increased the rate of water? Decreased? What would happen?

My (possibly incorrect) answer: If we increased the rate, it means more water would enter than leaves, for a brief moment. We’d have a momentary spike in negative flux (the tube would look like a sink), until the rates equalized. Vice versa if we decreased the rate of water – we’d have a brief spike of positive flux (more water was leaving than entering), until the rate equalized.

Even though net flux is zero, this is different from having zero flux pass through each surface. If you are in an empty field, no shape will generate any flux. But if you are in a field where flux is canceling, changing your shape or orientation could create a non-zero flux. Recognize the difference between having zero flux because the field is zero, vs. having all the flux cancel.

One more point – the “tube” we are considering is a region we define, not a physical tube. Measuring flux is about drawing imaginary boundaries, not having a physical shape. So, when we define the region of a “bucket”, it would not “fill up” with flux. Flux is what is passing through the sides of a bucket at a moment in time. Clearly, if we put in a physical bucket it would fill up, but that’s not what we’re measuring. We’re seeing how much flux would be entering a region we define, from any and all sides (not just the opening). Got it?

And one more point. We haven’t really talked about the units of flux. What is it measured in? As far as I understand, the units can be anything – it depends on the unit of your vector field. So, your vector field might represent bananas, in which case you get total bananas crossing a surface. Or, your field could represent bananas-per-second, in which case you’d get the bananas-per-second crossing your surface. The units of flux depend on the units of your vector field.

Flux is relatively simple to understand, and is really helpful in vector calculus and physics. Trying to understand flux by looking at a mess of integrals is not the way to go. First get an intuitive understanding, and the details will make more sense.

Insights

Here are a few insights that hit me after learning about flux:

You can take the time derivative of flux. If the vector field (F) changes with time (t), you can use dF/dt to see how the total flux changes over time. Even though flux is taken at an instant in time, you can measure flux at two consecutive moments to see how fast it is changing.
You can integrate flux, which means finding how much flux has crossed over a certain time. If the field F is constant over time, you can multiply the flux at one instant by your duration. But if F changes with time, then you need to measure at each moment and integrate. Each flux calculation is done at an instant of time, then they are summed together. Again, this is the standard calculus technique.
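Both ideas can be sketched in a few lines. The flux functions below are invented (in bananas per second), and the helper names are mine: one adds up instant-by-instant flux over a duration, the other compares two nearby instants.

```python
def total_crossed(flux_at, start, end, steps=1000):
    """Add up flux measured at many instants between start and end (midpoint sum)."""
    dt = (end - start) / steps
    return sum(flux_at(start + (i + 0.5) * dt) for i in range(steps)) * dt

def rate_of_change(flux_at, t, dt=1e-3):
    """Compare the flux at two nearby moments to see how fast it is changing."""
    return (flux_at(t + dt) - flux_at(t)) / dt

def steady(t):
    return 3.0        # hypothetical: 3 bananas per second, never changes

def ramping(t):
    return 2.0 * t    # hypothetical: the flow keeps getting stronger

steady_total = total_crossed(steady, 0.0, 10.0)
ramping_total = total_crossed(ramping, 0.0, 10.0)
ramping_rate = rate_of_change(ramping, 5.0)
print(steady_total, ramping_total, ramping_rate)
```

The steady field gives about 30 bananas over 10 seconds (just 3 times the duration). The ramping field gives about 100, which you only get by measuring at each moment and summing; its flux is changing at about 2 bananas per second, per second.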

In our waterfall example, we looked at a single point in time where water had been flowing for a while. If we chose an early point in time, we would have negative flux: water had entered the top, but not yet left the bottom. If we turned off the water, there’d be an instant in time with positive flux: water had stopped entering, but was continuing to leave.

Flux is important for math, electricity and magnetism, and your science life will be better for knowing it. Your social life – not so much.

This was a long article. Take a break. Take a shower. Get outside. See your family. Or, read on about divergence. It’s your call.
