- Old little lady
- Red big dog
- Vietnamese spicy food

Do you have a logical reason why they sound strange? Or are they just “off”?

You probably didn’t think, “In 3rd grade I mastered the Royal Order of Adjectives:

- Determiner
- Observation
- Size
- Shape
- Age
- Color
- Origin
- Material
- Qualifier

… and upon applying them, noticed several errors. *Old little lady* is incorrect because rules #3 and #5 are swapped — a childish mistake, really. The next…”

Ugh. Describing Gran Gran isn’t a logic puzzle. But guess what students learning English are taught?

Even as a native speaker, could you construct this chart? Is this how you’d teach someone English?

**The Adjective Fallacy is trying to learn by mastering the formal rules.** Just because a concept *can* be rigorously defined doesn’t mean we should study it that way.

We didn’t become good at English by studying a chart: we developed an ear for the language and know how it *should* sound. And “old little lady” sounds off.

Similarly, getting good at math doesn’t mean marching through a gauntlet of rules on every problem. It’s having a native speaker’s feeling about what works or doesn’t.

“303 x 13 = 5074” looks strange, but not because we computed the left-hand side. It’s weird because odd numbers can’t multiply to become even (intuition). The last digit of the result should be 3×3 = 9. And 5074 is too large, since 300 x 13 (similar numbers nearby) is only 3900. Our Spidey Sense is blaring that the computation looks wrong.
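The three gut checks can be spelled out as a small sketch. The helper names (`round_sig`, `gut_check`) and the 10% tolerance on the magnitude estimate are illustrative assumptions, not a standard rule. The point is that none of the checks needs the exact product.

```python
def round_sig(n: int, figures: int = 2) -> int:
    """Round a positive integer to `figures` significant figures (303 -> 300, 13 -> 13)."""
    digits = len(str(n))
    return n if digits <= figures else round(n, figures - digits)


def gut_check(a: int, b: int, claimed: int) -> list:
    """List the quick sanity checks that a claimed product a * b fails (hypothetical helper)."""
    problems = []
    # Parity: odd times odd is always odd.
    if a % 2 == 1 and b % 2 == 1 and claimed % 2 == 0:
        problems.append("odd x odd must be odd")
    # Last digit: only the last digits of the factors matter.
    expected_last = (a % 10) * (b % 10) % 10
    if claimed % 10 != expected_last:
        problems.append(f"last digit should be {expected_last}")
    # Magnitude: compare with a product of nearby round numbers (assumed 10% tolerance).
    estimate = round_sig(a) * round_sig(b)
    if abs(claimed - estimate) > 0.1 * estimate:
        problems.append(f"should be near {estimate}")
    return problems


print(gut_check(303, 13, 5074))  # fails all three checks
print(gut_check(303, 13, 3939))  # passes: 303 x 13 really is 3939
```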

My learning goal is knowing enough to make rough predictions on my own. I want a horse sense for algebra, calculus, trig, and even imaginary exponents, *without* scurrying off to apply an equation.

Rules aren’t inherently bad: they summarize, resolve ambiguous cases, and help us practice our weak spots. The question is how much to use them when starting off.

Learn enough rules to get started – don’t attempt to master them from the outset. See examples in a larger context and let the pattern-matching machinery of your brain get to work.

Math is a language too. Here’s a gut check: **Would my current math study technique have helped me learn English?**

If an English class spent a month on the adjective chart we’d have a talk with the teacher. But a Calculus class that spends weeks on the formal theory of limits is typical. Can we admit that studying this much detail, this early, doesn’t build fluency?

Pondering that question made me realize I had large gaps in trigonometry and calculus. I could only describe concepts using the adjective chart I’d memorized with a furrowed brow. (*I’ll describe my grandma, just give me a minute!*)

Enough was enough: embrace approaches that *actually* help you, like seeing the big picture first. In Calculus, that might mean seeing an integral in the first lesson:

That’s what Calculus *does*: break a shape into pieces (the derivative), and glue it together in various ways (the integral).

A typical calculus syllabus covers integrals in week 12, after months of “building a foundation”. Better not use a complete sentence until we’ve studied adjectives, nouns and verbs separately, right? (My hand-wringing could solve the energy crisis.)

The path to understanding isn’t always the most structured.

Happy math.

Update: After some research, I found this concept is captured by the notion of tacit knowledge, or “we know more than we can tell” (Michael Polanyi). Tacit knowledge is acquired through experience, and complements the explicit knowledge we get by studying rules.

Imagine a chef who follows a new recipe to the letter. No matter how it looks, no matter the reviews the recipe has, if the dish *doesn’t taste good* we know something is wrong. A sense of taste is the ultimate cooking tool.

When learning, we defer to external indicators (tests, teachers) to inform us we’ve learned something. External standards are made to be objective and easily verified (Did you pick the correct answer?), but the important, subjective question is how well a concept sits in your mind. Did you actually experience it?

My checklist for truly learning a topic is that it must be:

- **Understandable**: Did I have an aha! moment? Can I explain the concept in simple language? Does it connect to other topics I know?
- **Memorable**: Do I have an analogy, diagram, or example that will stick with me for months or years?
- **Enjoyable**: Do I want to revisit or use this knowledge? Don’t study literature in a way that makes you hate reading.

That’s my current definition of “intuitive understanding”, and for subjects I care about, I keep digging until I have all three aspects.

It’s ok to take your time (calculus took years to become enjoyable) and it’s ok to not care about everything equally (biology isn’t particularly compelling for me). I firmly believe any subject can become intuitive if I put in the effort to find analogies, diagrams, examples, plain-English descriptions, and technical details (the ADEPT method).

So, how do you set your own learning standard?

Let’s not reinvent the wheel: famous learners have already described their thinking process, which we can adopt. It’s not about memorizing Einstein’s Theory of Relativity, it’s about internalizing the mindset that could lead to that idea.

Here are a few viewpoints that resonated for me:

“Education is what remains after one has forgotten what one has learned in school.” —Albert Einstein

“The only real valuable thing is intuition.” —Albert Einstein

- True learning goes beyond memorized facts. While I can forget the equation of a circle, I can’t forget that it’s round. And knowing it’s perfectly round quickly leads me back to the equation.

“The noblest pleasure is the joy of understanding.” —Da Vinci

- True understanding implies joy. And practically, you’ll only continue studying what you like.

“To teach effectively a teacher must develop a feeling for his subject; he cannot make his students sense its vitality if he does not sense it himself. He cannot share his enthusiasm when he has no enthusiasm to share. How he makes his point may be as important as the point he makes; he must personally feel it to be important.” —George Pólya

“Education is the kindling of a flame, not the filling of a vessel.” —Socrates

- We aren’t robots, and we should embrace the subjective aspects of learning. A teacher’s goal goes beyond knowledge-transfer to enjoyment-transfer.

“The Humane Representation of Thought,” a talk by Bret Victor

- There are deeper, richer levels of understanding than what’s traditionally used. Explore a higher standard.

“I think most people can learn a lot more than they think they can. They sell themselves short without trying. One bit of advice: it is important to view knowledge as sort of a semantic tree — make sure you understand the fundamental principles, ie the trunk and big branches, before you get into the leaves/details or there is nothing for them to hang on to.” —Elon Musk

- Your own standards greatly influence your understanding. External tests won’t check if facts are comfortably connected.

I have a larger collection of quotes that help align my thinking.

After rummaging through quotes that resonate, build a set of questions that capture your standard. For me, it became:

- Do I have a visceral, ingrained analogy? Can it help solve problems?
- Can I explain the concept to others? Do they want to explain it to their friends afterwards?
- Will I remember the essential idea after a few months or years?
- Can I find something to enjoy in the topic? Will I return after I inevitably forget 95% of it?

Questions seem to prompt more interest than a statement: “Do I have an analogy?” vs. “I must have an analogy”.

With this approach, strange corners of math I didn’t previously enjoy (like Euler’s Formula) became mysteries to solve: what *is* the insight here? Can I express it in a plain-English sentence? (Here’s a shot: Continuous rotation means you’re moving in a circle.)
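Here is a quick numeric check of that sentence, using Python's standard `cmath` module. As the angle grows, $e^{i\theta}$ moves around, but its distance from the origin never changes. The eight sample angles are an arbitrary choice for illustration.

```python
import cmath
import math

# Step the angle through a full turn in 45-degree increments.
for step in range(8):
    theta = step * math.pi / 4
    z = cmath.exp(1j * theta)  # Euler's formula: cos(theta) + i*sin(theta)
    print(f"{math.degrees(theta):5.0f} deg -> ({z.real:+.3f}, {z.imag:+.3f}), |z| = {abs(z):.3f}")
```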

Setting new standards helps take control of your education and overcome longstanding demons.

When people say “I hate math” I doubt they actually hate numbers (arithmetic), patterns & relationships (algebra), or shapes (geometry). They hate lessons that don’t contain insight, enjoyment, and basic human empathy. It’s fine to be uninterested in Ancient Egyptian Civilization, but *hate* comes from getting lost on a tour and spending the night near a sarcophagus.

These are the questions that helped me: what are your standards for learning?

(*Thanks to Scott Young, Uri Bram, and Tom Miller for brainstorming ideas.*)

This completed grid is the *outer product*, which can be separated into the:

- **Dot product**, the interactions between similar dimensions (`x*x`, `y*y`, `z*z`)
- **Cross product**, the interactions between different dimensions (`x*y`, `y*z`, `z*x`, etc.)

The dot product ($\vec{a} \cdot \vec{b}$) measures similarity because it only accumulates interactions in matching dimensions. It’s a simple calculation with 3 components.

The cross product (written $\vec{a} \times \vec{b}$) has to measure a half-dozen “cross interactions”. The calculation looks complex but the concept is simple: accumulate 6 individual differences for the total.
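Here is a minimal sketch of that split in plain Python (the example vectors are arbitrary). It builds the full grid of products $a_i b_j$, adds up the diagonal for the dot product, and subtracts each mirrored pair of off-diagonal entries for the cross product.

```python
def outer(a, b):
    """The full 3x3 grid of interactions: grid[i][j] = a[i] * b[j]."""
    return [[ai * bj for bj in b] for ai in a]


def dot_from_grid(grid):
    # Matching dimensions live on the diagonal (x*x, y*y, z*z).
    return sum(grid[i][i] for i in range(3))


def cross_from_grid(grid):
    # Each output axis collects the two mirrored cross terms from the other two axes.
    x, y, z = 0, 1, 2
    return [
        grid[y][z] - grid[z][y],  # x-component: yz - zy
        grid[z][x] - grid[x][z],  # y-component: zx - xz
        grid[x][y] - grid[y][x],  # z-component: xy - yx
    ]


grid = outer([1, 2, 3], [4, 5, 6])
print(dot_from_grid(grid))    # 1*4 + 2*5 + 3*6 = 32
print(cross_from_grid(grid))  # [-3, 6, -3]
```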

Instead of thinking “When do I need the cross product?” think “When do I need interactions between different dimensions?”.

Area, for example, is formed by vectors pointing in different directions (the more orthogonal, the better). Indeed, the cross product measures the area spanned by two 3d vectors (source):

(The “cross product” assumes 3d vectors, but the concept extends to higher dimensions.)
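To see the area claim numerically, here is a small sketch. It uses a hand-rolled `cross` helper (the standard right-handed component formula) and a parallelogram whose base and height are easy to read off: base 3 along `x`, height 2 along `y`.

```python
import math


def cross(a, b):
    """Right-handed cross product of two 3D vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in v))


base_side = (3, 0, 0)
slanted_side = (1, 2, 0)  # leans over, but its height above the base is 2
print(norm(cross(base_side, slanted_side)))  # 6.0 = base 3 * height 2

# Parallel vectors span no area at all.
print(norm(cross(base_side, (6, 0, 0))))     # 0.0
```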

Did the key intuition click? Let’s hop into the details.

The dot product represents vector similarity with a single number:

$$\vec{a} \cdot \vec{b} = |\vec{a}|\,|\vec{b}|\cos(\theta)$$

(Remember that trig functions are percentages.) Should the cross product (difference between interacting vectors) be a single number too?

Let’s try. Sine is the percentage difference, so we could use:

$$|\vec{a}|\,|\vec{b}|\sin(\theta)$$

Unfortunately, we’re missing a lot of detail. `x` is 100% different from both `y` and `z`, but shouldn’t `x*y` and `x*z` be different from each other? As Tolstoy wrote, “All happy families are alike; each unhappy family is unhappy in its own way.”

Instead, let’s express these unique differences as a vector:

- The *size* of the cross product is the numeric “amount of difference” (with $\sin(\theta)$ as the percentage)
- The *direction* of the cross product is based on both inputs: it’s the direction orthogonal to both (i.e., favoring neither)

A vector result represents the `x*y` and `x*z` separately, even though `y` and `z` are both “100% different” from `x`.

(Should the dot product be turned into a vector too? Well, we have the inputs and a similarity percentage. There’s no new direction that isn’t available from either input.)

Two vectors determine a plane, and the cross product points in a direction different from both (source):

Here’s the problem: there are two perpendicular directions. By convention, we assume a “right-handed system” (source):

If you hold your first two fingers like the diagram shows, your thumb will point in the direction of the cross product. I make sure the orientation is correct by sweeping my first finger from $\vec{a}$ to $\vec{b}$. With the direction figured out, the magnitude of the cross product is $|\vec{a}|\,|\vec{b}|\sin(\theta)$, which is proportional to the magnitude of each vector and the “difference percentage” (sine).

To remember the right-hand rule, write the `xyz` order twice: `xyzxyz`. Next, find the pattern you’re looking for:

- `xy => z` (`x` cross `y` is `z`)
- `yz => x` (`y` cross `z` is `x`; we looped around: `y` to `z` to `x`)
- `zx => y`

Now, `xy` and `yx` have opposite signs because they are forward and backward in our `xyzxyz` setup.
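The `xyzxyz` trick is mechanical enough to write down directly. This sketch finds a forward pair inside the doubled string and returns the letter that follows it. A backward pair gets the same axis with a minus sign. The function name `axis_cross` is hypothetical.

```python
CYCLE = "xyzxyz"  # xyz written twice, so every forward pair appears once


def axis_cross(first: str, second: str) -> str:
    """Cross two unit axes by name, e.g. ('x', 'y') -> '+z' (hypothetical helper)."""
    if first == second:
        return "0"  # an axis has no cross interaction with itself
    position = CYCLE.find(first + second)
    if position != -1:
        # Forward pair: the answer is the letter right after it.
        return "+" + CYCLE[position + 2]
    # Backward pair: same axis as the forward order, with the sign flipped.
    return "-" + axis_cross(second, first)[1:]


for p in "xyz":
    for q in "xyz":
        print(f"{p} cross {q} = {axis_cross(p, q)}")
```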

So, without a formula, you should be able to calculate:

Again, this is because `x` cross `y` is positive `z` in a right-handed coordinate system. I used unit vectors, but we could scale the terms:

A single vector can be decomposed into its 3 orthogonal parts:

When the vectors are crossed, each pair of orthogonal components (like $a_x b_y$) casts a vote for where the orthogonal vector should point. 6 pairs, 6 votes, and their total is the cross product. (Similar to the gradient, where each axis casts a vote for the direction of greatest increase.)

- `xy => z` and `yx => -z` (assume $\vec{a}$ is first, so `xy` means $a_x b_y$)
- `yz => x` and `zy => -x`
- `zx => y` and `xz => -y`

`xy` and `yx` fight it out in the `z` direction. If those terms are equal, such as in $(2, 1, 0) \times (2, 1, 1)$, there is no cross product component in the `z` direction (2 – 2 = 0).
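To make the voting concrete, this sketch builds the cross product only from the six signed votes listed above. The `VOTES` table is just one way to encode the `xyzxyz` rule. With $\vec{a}$ first, the pair `xy` contributes $+a_x b_y$ to `z`, while `yx` contributes $-a_y b_x$.

```python
# Each ordered pair of distinct axes votes for the remaining axis.
# Forward pairs in "xyzxyz" vote +1, backward pairs vote -1.
VOTES = {
    ("x", "y"): ("z", +1), ("y", "x"): ("z", -1),
    ("y", "z"): ("x", +1), ("z", "y"): ("x", -1),
    ("z", "x"): ("y", +1), ("x", "z"): ("y", -1),
}
INDEX = {"x": 0, "y": 1, "z": 2}


def cross_by_votes(a, b):
    """Sum the six signed votes a[p] * b[q] into the axis each pair points to."""
    total = {"x": 0, "y": 0, "z": 0}
    for (p, q), (target, sign) in VOTES.items():
        # a is first, so the pair ("x", "y") means a_x * b_y.
        total[target] += sign * a[INDEX[p]] * b[INDEX[q]]
    return (total["x"], total["y"], total["z"])


print(cross_by_votes((2, 1, 0), (2, 1, 1)))  # the z-votes cancel: 2*1 - 1*2 = 0
```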

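The final combination is:

$$\vec{a} \times \vec{b} = (a_y b_z - a_z b_y,\; a_z b_x - a_x b_z,\; a_x b_y - a_y b_x) = |\vec{a}|\,|\vec{b}|\sin(\theta)\,\vec{n}$$

where $\vec{n}$ is the unit vector normal to $\vec{a}$ and $\vec{b}$.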

Don’t let this scare you:

- There are 6 terms, 3 positive and 3 negative
- Two dimensions vote on the third (so the `z` term must only have `x` and `y` components)
- The positive/negative order is based on the `xyzxyz` pattern

If you like, there is an algebraic proof that the formula is both orthogonal to the inputs and of size $|\vec{a}|\,|\vec{b}|\sin(\theta)$, but I like the “proportional voting” intuition.
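For a numeric (not algebraic) check, the sketch below crosses two arbitrary vectors and confirms both properties. First, the result has a zero dot product with each input. Second, its length matches $|\vec{a}|\,|\vec{b}|\sin(\theta)$, with $\theta$ recovered from the dot product.

```python
import math


def cross(a, b):
    """Right-handed cross product of two 3D vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Sum of products in matching dimensions."""
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


a, b = (2.0, -1.0, 3.0), (0.5, 4.0, -2.0)
c = cross(a, b)

# Orthogonal to both inputs: both dot products vanish.
print(dot(c, a), dot(c, b))

# Size matches |a| |b| sin(theta), with theta taken from the dot product.
theta = math.acos(dot(a, b) / (norm(a) * norm(b)))
print(norm(c), norm(a) * norm(b) * math.sin(theta))
```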

Again, we should do simple cross products in our head:

Why? We crossed the `x` and `y` axes, giving us `z` (or $\vec{i} \times \vec{j} = \vec{k}$, using those unit vectors). Crossing the other way gives $-\vec{k}$.

Here’s how I walk through more complex examples, such as $(1, 2, 3) \times (4, 5, 6)$:

- Let’s do the last term, the z-component. That’s (1)(5) minus (4)(2), or 5 – 8 = -3. I did `z` first because it uses `x` and `y`, the first two terms. Try seeing (1)(5) as “forward” as you scan from the first vector to the second, and (4)(2) as “backward” as you move from the second vector to the first.
- Now the `y` component: (3)(4) – (6)(1) = 12 – 6 = 6
- Now the `x` component: (2)(6) – (5)(3) = 12 – 15 = -3

So, the total is (-3, 6, -3) which we can verify with Wolfram Alpha.
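Here is the same walk-through as code. The inputs $(1, 2, 3)$ and $(4, 5, 6)$ are inferred from the products in the steps above (for instance, (1)(5) is $a_x b_y$). The helper name is hypothetical, and it computes the components in the same z, y, x order.

```python
def cross_walkthrough(a, b):
    """Compute a x b in the order used above: z first, then y, then x."""
    ax, ay, az = a
    bx, by, bz = b
    z = ax * by - ay * bx  # forward pair "xy" minus backward pair "yx"
    y = az * bx - ax * bz  # forward pair "zx" minus backward pair "xz"
    x = ay * bz - az * by  # forward pair "yz" minus backward pair "zy"
    return (x, y, z)


print(cross_walkthrough((1, 2, 3), (4, 5, 6)))  # (-3, 6, -3)
```

Swapping the inputs flips every sign, since each forward pair becomes a backward one.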

In short:

- The cross product tracks all the “cross interactions” between dimensions
- There are 6 interactions (2 in each dimension), with signs based on the `xyzxyz` order

**Connection with the Determinant**

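You can calculate the cross product using the determinant of this matrix:

$$\vec{a} \times \vec{b} = \begin{vmatrix} \vec{i} & \vec{j} & \vec{k} \\ a_x & a_y & a_z \\ b_x & b_y & b_z \end{vmatrix}$$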

There’s a neat connection here, as the determinant (“signed area/volume”) tracks the contributions from orthogonal components.

There are theoretical reasons why the cross product (as an orthogonal vector) is only available in 0, 1, 3 or 7 dimensions. However, the cross product as a single number is essentially the determinant (a signed area, volume, or hypervolume as a scalar).
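Here is a small sketch of that connection. Expanding the $3 \times 3$ determinant along its first row (the standard cofactor expansion) gives three $2 \times 2$ determinants, one per output component. In 2D, the lone $2 \times 2$ determinant is the single-number signed area. The helper names are my own.

```python
def det2(m):
    """Determinant of a 2x2 matrix: the signed area spanned by its two rows."""
    (p, q), (r, s) = m
    return p * s - q * r


def cross_via_determinant(u, v):
    """Expand |i j k; u; v| along its first row (cofactor expansion)."""
    def minor(skip):
        # Drop column `skip` from both rows to get a 2x2 matrix.
        return [[row[col] for col in range(3) if col != skip] for row in (u, v)]

    # The middle cofactor carries a minus sign.
    return (det2(minor(0)), -det2(minor(1)), det2(minor(2)))


print(cross_via_determinant((1, 2, 3), (4, 5, 6)))  # (-3, 6, -3)

# In 2D, "the cross product as a single number" is just the 2x2 determinant.
print(det2([(3, 0), (1, 2)]))  # 6: a signed area
print(det2([(1, 2), (3, 0)]))  # -6: swapping the rows flips the sign
```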

**Connection with Curl**

Curl measures the twisting force a vector field applies to a point, and is measured with a vector perpendicular to the surface. Whenever you hear “perpendicular vector” start thinking “cross product”.

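We take the “determinant” of this matrix:

$$\nabla \times \vec{F} = \begin{vmatrix} \vec{i} & \vec{j} & \vec{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ F_x & F_y & F_z \end{vmatrix}$$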

Instead of multiplication, the interaction is taking a partial derivative. As before, the $\vec{i}$ component of curl is based on the vectors and derivatives in the $\vec{j}$ and $\vec{k}$ directions.
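As a rough numeric sketch, here is the curl of the swirling field $\vec{F} = (-y, x, 0)$. It uses central finite differences with a step `h` that I picked arbitrarily; careful work would use symbolic derivatives or better numerics. Each component pairs a derivative along one axis with the field component along another, just like the cross terms. The names `curl` and `swirl` are hypothetical.

```python
def curl(F, p, h=1e-5):
    """Approximate the curl of a 3D vector field F at point p using central differences."""

    def partial(component, axis):
        # Derivative of F's `component` along `axis`, e.g. partial(2, 1) = dFz/dy.
        forward = list(p)
        backward = list(p)
        forward[axis] += h
        backward[axis] -= h
        return (F(*forward)[component] - F(*backward)[component]) / (2 * h)

    X, Y, Z = 0, 1, 2
    return (
        partial(Z, Y) - partial(Y, Z),  # i-component: dFz/dy - dFy/dz
        partial(X, Z) - partial(Z, X),  # j-component: dFx/dz - dFz/dx
        partial(Y, X) - partial(X, Y),  # k-component: dFy/dx - dFx/dy
    )


def swirl(x, y, z):
    """A field that circulates counterclockwise around the z-axis."""
    return (-y, x, 0.0)


print(curl(swirl, (1.0, 2.0, 0.5)))  # approximately (0, 0, 2)
```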

**Relation to the Pythagorean Theorem**

The cross and dot product are like the orthogonal sides of a triangle:

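For unit vectors, where $|\vec{a}| = |\vec{b}| = 1$, we have:

$$|\vec{a} \times \vec{b}|^2 + (\vec{a} \cdot \vec{b})^2 = \sin^2(\theta) + \cos^2(\theta) = 1$$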

I cheated a bit in the grid diagram, as we have to track the squared magnitudes (as done in the Pythagorean Theorem).
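With the squared magnitudes tracked, the triangle relationship becomes $|\vec{a} \times \vec{b}|^2 + (\vec{a} \cdot \vec{b})^2 = |\vec{a}|^2 |\vec{b}|^2$ (known as Lagrange's identity). Integer vectors make this easy to confirm exactly. The vectors below are arbitrary.

```python
def cross(a, b):
    """Right-handed cross product of two 3D vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Sum of products in matching dimensions."""
    return sum(x * y for x, y in zip(a, b))


a, b = (1, 2, 3), (4, 5, 6)
axb = cross(a, b)

# "Orthogonal sides of a triangle": |a x b|^2 + (a . b)^2 == |a|^2 |b|^2
lhs = dot(axb, axb) + dot(a, b) ** 2
rhs = dot(a, a) * dot(b, b)
print(lhs, rhs)  # 1078 1078
```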

**Advanced Math**

The cross product & friends get extended in Clifford Algebra and Geometric Algebra. I’m still learning these.

**Cross Products of Cross Products**

Sometimes you’ll have a scenario like:

First, the cross product isn’t associative: order matters.

Next, remember what the cross product is doing: finding orthogonal vectors. If the two vectors being crossed are parallel ($\vec{a}$ parallel to $\vec{b}$), then there are no dimensions pushing on each other, and the cross product is zero (which carries through to $\vec{0} \times \vec{c} = \vec{0}$).

But it’s ok for $\vec{a}$ and $\vec{c}$ to be parallel, since they are never directly involved in a cross product, for example:

$$(\vec{i} \times \vec{j}) \times \vec{i} = \vec{k} \times \vec{i} = \vec{j}$$

Whoa! How’d we get back to $\vec{j}$? We asked for a direction perpendicular to both $\vec{i}$ and $\vec{j}$, and made that direction perpendicular to $\vec{i}$ again. Being “doubly perpendicular” means you’re back on the original axis.
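Both claims are easy to check with the unit vectors written as tuples. The doubly perpendicular product lands back on $\vec{j}$, and moving the parentheses changes the result, which is what non-associativity means.

```python
def cross(a, b):
    """Right-handed cross product of two 3D vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)

# Perpendicular to i and j (giving k), then perpendicular to i again: back on j.
print(cross(cross(i, j), i))  # (0, 1, 0)

# Not associative: grouping changes the answer.
print(cross(cross(i, i), j))  # (0, 0, 0), since i x i = 0
print(cross(i, cross(i, j)))  # (0, -1, 0), since i x k = -j
```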

**Dot Product of Cross Products**

Now if we take

$$\vec{a} \times \vec{b} \cdot \vec{c}$$

what happens? We’re forced to do $\vec{a} \times \vec{b}$ first, because $\vec{b} \cdot \vec{c}$ returns a scalar (single number), which can’t be used in a cross product.

If $\vec{a}$ and $\vec{c}$ are parallel, what happens? Well, $\vec{a} \times \vec{b}$ is perpendicular to $\vec{a}$, which means it’s perpendicular to $\vec{c}$, so the dot product with $\vec{c}$ will be zero.
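Here is a quick check of that reasoning with arbitrary integer vectors. The vector $\vec{c}$ is chosen as $2\vec{a}$ so it is parallel to $\vec{a}$.

```python
def cross(a, b):
    """Right-handed cross product of two 3D vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Sum of products in matching dimensions."""
    return sum(x * y for x, y in zip(a, b))


a = (1, 2, 3)
b = (4, 5, 6)
c = (2, 4, 6)  # parallel to a (c = 2a)

# a x b must be computed first; b . c would be a plain number.
print(dot(cross(a, b), c))          # 0: a x b is perpendicular to a, so also to c
print(dot(cross(a, b), (0, 0, 1)))  # a non-parallel c can give a nonzero result
```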

I never really memorized these rules; I have to think through the interactions.

**Other Coordinate Systems**

The Unity game engine is left-handed, while OpenGL (and most math/physics tools) is right-handed. Why?

In a computer game, `x` goes horizontal, `y` goes vertical, and `z` goes “into the screen”. This results in a left-handed system. (Try it: using your right hand, you can see `x` cross `y` should point out of the screen.)

**Applications of the Cross Product**

- Find the direction perpendicular to two given vectors.
- Find the signed area spanned by two vectors.
- Determine if two vectors are orthogonal (checking for a dot product of 0 is likely faster though).
- “Multiply” two vectors when only perpendicular cross-terms make a contribution (such as finding torque).
- With the quaternions (4d complex numbers), the cross product performs the work of rotating one vector around another (another article in the works!).

Happy math.

While true, there’s a deeper principle at work.

**The Law of Interactions: The whole is based on the parts and the interaction between them.**

The wording “Law of Cosines” gets you thinking about the mechanics of the formula, not what it means. Part of my learning strategy is rewording ideas into ones that make sense.

The Law of Cosines, after cranking through geometric steps we’re prone to forget, looks like:

$$c^2 = a^2 + b^2 - 2ab\cos(C)$$

This is suspiciously like the expansion of $c = a + b$:

$$c^2 = a^2 + b^2 + 2ab$$

The difference is that $2ab$ has an extra cosine factor, which measures the “actual overlap percentage” (plain $2ab$ assumes the sides fully overlap, i.e. a cosine of 1; the minus sign in front comes from measuring $C$ inside the triangle, which we’ll untangle below).

So, the Law of Cosines is really a generalization of how $c^2 = (a + b)^2$ expands when components aren’t fully lined up. We’re treating geometric lines as terms in an algebraic expansion.

Imagine a restaurant with a single chef, Alice. She’s overworked, so Bob is hired as her assistant (sous chef).

Based on Alice’s current performance, and Bob’s performance in his interview, what happens when they work together?

Surely the new result must be their combined effort:

Hah! Office workers everywhere are rolling their eyes. You can’t just assume people contribute identically when they’re put together: there are interactions to account for.

Beyond their individual contributions, the two might slow each other down (*Where’d you put the whisk again?*), or find ways to work together (*I’m peeling carrots anyway, use some of mine.*).

In a system with several parts, start with the individual contributions and then ask if their interaction will:

- Help each other
- Hurt each other
- Ignore each other

The original idea that “Total = Alice + Bob” is more generally expressed as:

We need to separate the *list* of participants (Alice, Bob) from the result of their interaction.

Take the numbers 5 and 3. We can write them like so:

- Parts = (5, 3)

and we’re pretty sure they combine to make 8. But is there another way to get that conclusion?

Yes: we multiply. Beyond repeated counting, multiplication shows what happens when the parts of a system interact:

We’ve gone from “parts view”, (5, 3), to “interaction view”, $(5 + 3)^2$. The result of interaction mode says the system would result in 64 if it *did* interact with itself.

One caveat: when going to interaction view, we wrote down (5 + 3)(5 + 3), but we can’t simplify (5 + 3) = 8 at the outset. We’re using addition for bookkeeping until multiplication can combine the parts.
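Written out as bookkeeping, the interaction view of the parts (5, 3) is every pairwise product, including each part with itself. Those products are added up as area, then converted back to a single part with a square root. Here is a minimal sketch:

```python
import math
from itertools import product

parts = (5, 3)

# Every part interacts with every part (including itself): 5*5, 5*3, 3*5, 3*3.
interactions = [p * q for p, q in product(parts, repeat=2)]
total_area = sum(interactions)

print(interactions)            # [25, 15, 15, 9]
print(total_area)              # 64
print(math.isqrt(total_area))  # 8: the single part with the same total interaction
```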

Oh, another caveat: why can we just add the interactions, but not the parts? Great question. The individual parts might be pointing in different dimensions, and don’t line up nicely on the same scale. The interacting parts turn into *area*, which can be combined to the same result no matter the orientation.

(I’ll investigate this concept more in a follow-up. It’s a neat idea that area is a generic, easily combinable quantity but individual paths are not.)

Simple setups like (5, 3) are easy to think through, like eyeballing $2x + 3 = 7$ and guessing $x = 2$. But a more complex scenario like $x^2 + 3x = 15$ requires a systematic approach.

The Law of Cosines is a systematic approach to working through the parts:

- List the parts
- Get every interaction as area
- Add to find the total contribution
- Convert into the equivalent “single part”

The last step is often implied. Once we’ve merged the jumble of interactions, we want the *single* part that could represent the entire system. Is there a single person (Charlie) whose efforts are identical to that of Alice and Bob working together?

The Law of Cosines gives us a way to find Charlie.
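The four steps translate into a short function. As an assumption for this sketch, each part is a path given by a length and a direction (an angle from a shared reference line). That makes the alignment of two parts the cosine of the angle between them. `combine_parts` is a hypothetical name. Each part also interacts with itself, where the cosine is 1.

```python
import math


def combine_parts(parts):
    """Combine parts given as (length, angle_in_degrees) into one equivalent length.

    1. List the parts.
    2. Get every interaction as area: length_i * length_j * cos(angle between them).
    3. Add the interactions to find the total contribution.
    4. Convert the total area back into a single part with a square root.
    """
    total_area = 0.0
    for length_i, angle_i in parts:
        for length_j, angle_j in parts:
            alignment = math.cos(math.radians(angle_i - angle_j))
            total_area += length_i * length_j * alignment
    # Guard against tiny negative totals from floating-point rounding.
    return math.sqrt(max(total_area, 0.0))


print(combine_parts([(10, 0), (20, 0)]))   # fully aligned: 30
print(combine_parts([(10, 0), (20, 90)]))  # no interaction: about 22.36
print(combine_parts([(10, 0), (20, 45)]))  # partially aligned: about 27.98
```

Because the loops run over every pair, the same function handles three or more parts without changes.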

When two parts interact, they can help, hurt, or ignore each other:

- Perfect alignment means they help 100% (5 and 3)
- Perfect mis-alignment means they hurt 100% (5 and -3)
- Partial alignment or mis-alignment means they help or hurt by a percentage
- No alignment means they ignore each other

How do we measure alignment? With cosine.

Using our trig analogy, cosine is the *percentage* an angle moves along the ground.

A 0-degree angle follows the ground perfectly (100%), and moving vertically doesn’t follow it at all (0%). Other angles are a fraction in-between.

If the parts in our system can be written as paths, and we know the angle between them is $\theta$, then we can measure the overlap with cosine. One path acts as the ground, and the other is the path we’re following:

When paths are perfectly aligned, their full strength is used ($ab$ and $ba$). The interaction factor $\cos(\theta)$ modifies that strength to show how much they *actually* work together.

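So, our jumble of interactions becomes:

$$c^2 = a^2 + ab\cos(\theta) + ba\cos(\theta) + b^2 = a^2 + b^2 + 2ab\cos(\theta)$$

$$c = \sqrt{a^2 + b^2 + 2ab\cos(\theta)}$$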

Phew! And that’s the Law of Cosines: collect every interaction, account for the alignment, and simplify it to a single part. (The formula is usually written without the square root, but in practice you want $c$, not $c^2$.)

Now, why is the Law of Cosines often written with a negative sign? Well, the assumption is that in a typical triangle, a small *internal* angle $C$ means the sides are negatively aligned, while $\theta$ is an *external* look at their alignment:

Similarly, a large internal angle means the sides are positively aligned, and will help each other. Typically, a small angle means you’re moving in the same direction, but this internal/external difference means we reverse the sign.

Personally, I don’t memorize whether there’s a positive or negative sign: I think about whether the parts will help or hurt each other in the scenario, and make the interaction positive or negative. Don’t be a slave to the formula.

Let’s say my triangle has side a = 10 and side b = 20. What is side c when the angle between a and b is:

**45 degrees in alignment**

Here, we need the Law of Cosines. a and b are pointing partially in the same direction. We switch to interaction mode to get to a common, combinable unit (area):

- $a^2 = 100$
- $b^2 = 400$
- $2ab = 2 \cdot 10 \cdot 20 = 400$, but we need to adjust by the interaction factor. That is $\cos(45^\circ) \approx 0.707$, so the real interaction term is $400 \cdot 0.707 \approx 282.8$

The overall interactions are:

$$c^2 = 100 + 400 + 282.8 = 782.8$$

and the equivalent single side ($c$) is:

$$c = \sqrt{782.8} \approx 27.98$$

**70 degrees in mis-alignment**

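Again, we need the Law of Cosines. We can see that the sides fight each other, so the interaction will be negative:

$$c^2 = 100 + 400 - 400\cos(70^\circ) \approx 500 - 136.8 = 363.2, \qquad c = \sqrt{363.2} \approx 19.06$$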

Our intuition says this arrangement should be *smaller* than the previous one (since the sides aren’t working together), and it is.

**Full alignment or mis-alignment**

When our “triangle” has an angle of 0 degrees (or 180), all the parts are lying flat. Here, the parts are in the same dimension, and can be treated as regular numbers:

- Fully aligned: 10 + 20 = 30
- Fully mis-aligned: 10 – 20 = -10 (pointing in the direction of b).

The Law of Cosines still works, of course:

- Full alignment: $a^2 + b^2 + 2ab\cos(\theta) = 100 + 400 + 400\cos(0^\circ) = 900$, so $c = \sqrt{900} = 30$
- Full mis-alignment: $a^2 + b^2 + 2ab\cos(\theta) = 100 + 400 + 400\cos(180^\circ) = 100$, so $c = \sqrt{100} = 10$ (pointing backwards)

Again, we shouldn’t robotically follow the formula: have a rough idea what the result should be, and think through the calculations. (“The overall interaction is this, so the individual side would be that…”).

Thinking of interactions is one interpretation: next time, we’ll see it as the Law of Projections.

Happy math.

The Law of Cosines resembles the Pythagorean Theorem, no?

Now you might suspect why. The Pythagorean Theorem is the special case of *zero interaction*, which happens when the sides are at right angles. After all, a 90-degree angle is vertical, and has 0% overlap with the ground.

The Law of Cosines becomes:

$$c^2 = a^2 + b^2 + 2ab\cos(90^\circ) = a^2 + b^2$$

If we know the parts won’t interact, we can ignore interaction effects. However, the *self-interactions* are still there and must be combined: $a^2$ and $b^2$ are fine, but the crossover terms $ab$ and $ba$ disappear.

Here’s another version of the Pythagorean Theorem. We can’t combine a and b directly, so combine their interactions and reduce them to a single part:

You might be hankering for a geometric proof. Here’s one from Quora, based on a paper by Knuth:

The insight is that we take our original $a$-$b$-$c$ triangle and scale it by $a$ (giving the $a^2$-$ab$-$ac$ triangle) and $b$ (giving the $ab$-$b^2$-$bc$ triangle). These two triangles build a larger, similar triangle $ac$-$bc$-$c^2$, and with some trig, the bottom portion can be shown to equal $a^2 + b^2 - 2ab\cos(C)$.

While interesting, I don’t like these types of proofs up front. The Law of Cosines is about interactions, not re-arranging triangles. Does this explanation get you thinking about what cosine represents? About when it should be positive, negative, or zero?

Concept | Law of Cosines |
---|---|
Analogy | Imagine an assistant chef whose interactions may (or may not) be helpful. |
Diagram | |
Example | Suppose $a = 10$ and $b = 20$ in a triangle. If they are aligned at 45 degrees, their interaction is $a^2 + b^2 + 2ab\cos(45^\circ) \approx 782.8$ and the remaining side is $\sqrt{782.8} \approx 27.98$ units long. |
Plain-English | The Law of Interactions: The whole is based on the parts and the interaction between them. |
Technical | Triangle with internal angle $C$: $c^2 = a^2 + b^2 - 2ab\cos(C)$. General interaction: $c^2 = a^2 + b^2 + 2ab\cos(\theta)$ |