Last time we tackled derivatives with a “machine” metaphor. A function is a machine with an input lever (x) and an output lever (y). The derivative, dy/dx, is how much “output wiggle” we get when we wiggle the input.

Now, we can make a bigger machine from smaller ones (h = f + g, h = f * g, etc.). The derivative rules (addition rule, product rule) give us the “overall wiggle” in terms of the parts. The chain rule is special: we can “zoom into” a single derivative and rewrite it in terms of another input (like converting “miles per hour” to “miles per minute” — we’re converting the “time” input).

And with that recap, let’s build our intuition for the advanced derivative rules. Onward!

## Division (Quotient Rule)

Ah, the quotient rule — the one nobody remembers. Oh, maybe you memorized it with a song like “Low dee high, high dee low…”, but that’s not understanding!

It’s time to visualize the division rule (who says “quotient” in real life?). The key is to see division as a type of multiplication:

$$h = \frac{f}{g} = f \cdot \frac{1}{g}$$

We have a rectangle, we have area, but the sides are “f” and “1/g”. Input x changes off on the side (by dx), so f and g change (by df and dg)… but how does 1/g behave?

Chain rule to the rescue! We can wrap up 1/g into a nice, clean variable and then “zoom in” to see that yes, it has a division inside.

So let’s pretend 1/g is a separate function, m. Inside function m is a division, but ignore that for a minute. We just want to combine two perspectives:

- f changes by df, contributing area df * m = df * (1 / g)
- m changes by dm, contributing area dm * f = ?

We turned m into 1/g easily. Fine. But what is dm (how much 1/g changed) in terms of dg (how much g changed)?

We want the difference between neighboring values of 1/g: 1/g and 1/(g + dg). For example:

- What’s the difference between 1/4 and 1/3? 1/12
- How about 1/5 and 1/4? 1/20
- How about 1/6 and 1/5? 1/30

How does this work? We put both fractions over a common denominator: for 1/3 and 1/4, it’s 12. The difference between “neighbors” (like 1/3 and 1/4) will be 1 / common denominator, aka 1 / (x * (x + 1)). See if you can work out why!
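Here is a small sketch to check the neighbor pattern with exact fractions. The helper name `neighbor_gap` is my own, made up for illustration.

```python
from fractions import Fraction


def neighbor_gap(x: int) -> Fraction:
    """Exact difference between the neighbors 1/x and 1/(x + 1)."""
    return Fraction(1, x) - Fraction(1, x + 1)


for x in (3, 4, 5, 100):
    gap = neighbor_gap(x)
    # Common denominator: (x + 1 - x) / (x * (x + 1)) = 1 / (x * (x + 1))
    assert gap == Fraction(1, x * (x + 1))
    print(f"1/{x} - 1/{x + 1} = {gap}")
```

The numerators always differ by exactly 1 once both fractions share the denominator x * (x + 1), which is why the gap is 1 over that denominator.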

If we make our derivative model perfect, and assume there’s no difference between neighbors, the +1 goes away and we get:

$$\frac{1}{x \cdot (x + 0)} = \frac{1}{x^2}$$

(This is useful as a general fact: the change from 1/100 to 1/101 is roughly one ten-thousandth.)

The difference is negative, because the new value (1/4) is smaller than the original (1/3). So what’s the actual change?

- g changes by dg, so 1/g becomes 1/(g + dg)
- The instant rate of change is -1/g^2 [as we saw earlier]
- The total change = dg * rate, or dg * (-1/g^2)
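A quick numeric gut check of the steps above, as a sketch. The function names are hypothetical, and g = 3 is just a sample value. The exact change in 1/g works out to -dg / (g * (g + dg)), so the estimate dg * (-1/g^2) gets better as dg shrinks.

```python
def actual_change(g: float, dg: float) -> float:
    """Exact change in 1/g when g moves to g + dg."""
    return 1.0 / (g + dg) - 1.0 / g


def predicted_change(g: float, dg: float) -> float:
    """Estimate from the instant rate: dg * (-1/g^2)."""
    return dg * (-1.0 / g**2)


g = 3.0
for dg in (1.0, 0.1, 0.001):
    actual = actual_change(g, dg)
    predicted = predicted_change(g, dg)
    print(f"dg={dg}: actual={actual:.8f}, predicted={predicted:.8f}")
```

With dg = 1 we get the chunky 1/3 to 1/4 step (a shrink of 1/12), and the prediction is only roughly right. As dg gets tiny, the extra g * dg in the denominator stops mattering and the two numbers agree.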

A few gut checks:

Why is the derivative negative? As g increases, the denominator gets larger and the total value gets smaller, so we’re actually shrinking (1/3 to 1/4 is a shrink of 1/12).

Why do we have -1/g^2 * dg and not just -1/g^2? (This confused me at first.) Remember, -1/g^2 is the *chain rule conversion factor* between the “g” and “1/g” scales (like saying 1 hour = 60 minutes). Fine. You still need to multiply by how far you went on the “g” scale, aka dg! An hour may be 60 minutes, but how many do you want to convert?

Where does dm fit in? m is another name for 1/g. dm represents the total change in 1/g, which as we saw, was -1/g^2 * dg. This substitution trick is used all over calculus to help split up gnarly calculations. “Oh, it looks like we’re doing a straight multiplication. Whoops, we zoomed in and saw one variable is actually a division: change perspective to the inner variable, and multiply by the conversion factor.”

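Phew. To convert our “dg” wiggle into a “dm” wiggle we do:

$$dm = -\frac{1}{g^2} \cdot dg$$

And get:

$$\left(\frac{f}{g}\right)' = (f \cdot m)' = df \cdot m + dm \cdot f = df \cdot \frac{1}{g} - \frac{1}{g^2} \, dg \cdot f$$

Yay! Now, your overeager textbook may simplify this to:

$$\left(\frac{f}{g}\right)' = \frac{g \cdot df - f \cdot dg}{g^2}$$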

and it burns! It burns! This “simplification” hides how the division rule is just a variation of the product rule. Remember, there are still two slivers of area to combine:

- The “f” (numerator) sliver grows as expected
- The “g” (denominator) sliver is *negative* (as g increases, the area gets smaller)

Using your intuition, you know it’s the denominator that’s contributing the negative change.
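To see the two slivers add up, here is a sketch that compares the “product rule with 1/g” view against the usual textbook formula and a brute-force numeric slope. The sample functions f = sin(x) and g = x^2 + 1 are my own picks for illustration, as are all the function names.

```python
import math


def f(x): return math.sin(x)
def g(x): return x**2 + 1.0
def df(x): return math.cos(x)   # derivative of f
def dg(x): return 2.0 * x       # derivative of g


def quotient_via_slivers(x: float) -> float:
    """Product rule on f * (1/g): numerator sliver plus (negative) denominator sliver."""
    numerator_sliver = df(x) * (1.0 / g(x))
    denominator_sliver = f(x) * (-1.0 / g(x) ** 2) * dg(x)
    return numerator_sliver + denominator_sliver


def quotient_textbook(x: float) -> float:
    """The 'simplified' textbook form: (g*f' - f*g') / g^2."""
    return (g(x) * df(x) - f(x) * dg(x)) / g(x) ** 2


def numeric_slope(func, x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of the derivative."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


x = 1.3
print(quotient_via_slivers(x))
print(quotient_textbook(x))
print(numeric_slope(lambda t: f(t) / g(t), x))
```

All three numbers should agree to several decimal places. The sliver version keeps the negative contribution visibly attached to the denominator.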

## Exponents (e^x)

e is my favorite number. It has the property

$$\frac{d}{dx} e^x = e^x$$

which means, in English, “e^x changes by 100% of its current amount”.

The “current amount” assumes x is the exponent, and we want changes from x’s point of view (df/dx). What if u(x)=x^2 is the exponent, but we still want changes from x’s point of view?

It’s the chain rule again — we want to zoom into u, get to x, and see how a wiggle of dx changes the whole system:

- x changes by dx
- u changes by du/dx, or d(x^2)/dx = 2x
- How does e^u change?

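Now remember, e^u doesn’t know we want changes from x’s point of view. e only knows its derivative is 100% of the current amount, measured against its exponent u:

$$\frac{d(e^u)}{du} = e^u$$

The overall change, on a per-x basis, is:

$$\frac{d(e^u)}{dx} = \frac{du}{dx} \cdot e^u = 2x \cdot e^{x^2}$$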

This confused me at first. I originally thought the derivative would require us to bring down “u”. No — the derivative of e^foo is e^foo. No more.

But if foo is controlled by anything else, then we need to multiply the rate of change by the conversion factor (d(foo)/dx) when we jump into that inner point of view.
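A sketch to check the e^(x^2) example numerically. The names are made up for illustration, and x = 0.7 is an arbitrary sample point.

```python
import math


def exp_of_square(x: float) -> float:
    """e^u with u = x^2."""
    return math.exp(x**2)


def chain_rule_derivative(x: float) -> float:
    """d(e^u)/du = e^u, times the conversion factor du/dx = 2x."""
    d_exp_du = math.exp(x**2)
    du_dx = 2.0 * x
    return d_exp_du * du_dx


def numeric_slope(func, x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of the derivative."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


x = 0.7
print(chain_rule_derivative(x))
print(numeric_slope(exp_of_square, x))
```

The two printed values should match closely. Notice we never “brought down” the exponent: the only extra piece is the 2x conversion factor.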

## Natural Logarithm

The derivative of ln(x) is 1/x. It’s usually given as a matter of fact.

My intuition is to see ln(x) as the time needed to grow to x:

- ln(10) is the time to grow from 1 to 10, assuming 100% continuous growth

Ok, fine. How long does it take to grow to the “next” value, like 11? (x + dx, where dx = 1)

When we’re at x=10, we’re growing exponentially at 10 units per second. It takes roughly 1/10 of a second (1/x) to get to the next value. And when we’re at x=11, it takes 1/11 of a second to get to 12. And so on: the time to the next value is 1/x.
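Here is a sketch of the “time to the next value” idea, using Python's `math.log` for ln. The helper `time_to_next` is a hypothetical name. With dx = 1 the answer is only roughly 1/x (we speed up along the way), and it tightens as dx shrinks.

```python
import math


def time_to_next(x: float, dx: float) -> float:
    """Extra growth time needed to get from x to x + dx: ln(x + dx) - ln(x)."""
    return math.log(x + dx) - math.log(x)


x = 10.0
for dx in (1.0, 0.1, 0.001):
    time_per_unit = time_to_next(x, dx) / dx
    print(f"dx={dx}: time per unit of x = {time_per_unit:.6f} (1/x = {1.0 / x})")
```

The giant dx = 1 step comes in a bit under 1/10, because growth accelerates on the way from 10 to 11. Smaller steps close the gap.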

The derivative

$$\frac{d}{dx} \ln(x) = \frac{1}{x}$$

is mainly a fact to memorize, but it makes sense with a “time to grow” interpretation.

## A Hairy Example: x^x

Time to test our intuition: what’s the derivative of x^x?

This is a bad mamma jamma. There are two approaches:

**Approach 1: Rewrite everything in terms of e.**

Oh e, you’re so marvelous:

$$x^x = \left[e^{\ln(x)}\right]^x = e^{\ln(x) \cdot x}$$

Any exponent (a^b) is really just e in different clothing: [e^ln(a)]^b. We’re just asking for the derivative of e^foo, where foo = ln(x) * x.

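But wait! Since we want the derivative in terms of “x”, not foo, we need to jump into x’s point of view and multiply by d(foo)/dx:

$$\frac{d}{dx} e^{\ln(x) \cdot x} = \frac{d(\ln(x) \cdot x)}{dx} \cdot e^{\ln(x) \cdot x}$$

The derivative of “ln(x) * x” is just a quick application of the product rule. If h=x^x, the final result is:

$$h'(x) = \left(\ln(x) \cdot 1 + \frac{1}{x} \cdot x\right) \cdot x^x = (1 + \ln(x)) \cdot x^x$$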

We wrote e^[ln(x)*x] in its original notation, x^x. Yay! The intuition was “rewrite in terms of e and follow the chain rule”.

**Approach 2: Independent Points Of View**

Remember, derivatives assume each part of the system works independently. Rather than seeing x^x as a giant glob, assume it’s made from two interacting functions: u^v. We can then add their individual contributions. We’re sneaky though, u and v are the same (u = v = x), but don’t let them know!

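From u’s point of view, v is just a static power (i.e., if v=3, then it’s u^3) so we have:

$$\frac{d(u^v)}{du} = v \cdot u^{v-1}$$

And from v’s point of view, u is just some static base (if u=5, we have 5^v). We rewrite into base e, and we get

$$\frac{d(u^v)}{dv} = \frac{d\left(e^{\ln(u) \cdot v}\right)}{dv} = \ln(u) \cdot e^{\ln(u) \cdot v} = \ln(u) \cdot u^v$$

We add each point of view for the total change:

$$d(u^v) = v \cdot u^{v-1} \cdot du + \ln(u) \cdot u^v \cdot dv$$

And the reveal: u = v = x! There’s no conversion factor for this new viewpoint (du/dx = dv/dx = dx/dx = 1), and we have:

$$\frac{d(x^x)}{dx} = x \cdot x^{x-1} + \ln(x) \cdot x^x = (1 + \ln(x)) \cdot x^x$$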

It’s the same as before! I was pretty excited to approach x^x from a few different angles.
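If you want to see both approaches land on the same number, here is a sketch. The function names are my own, and x = 2.5 is an arbitrary test point (x must be positive for ln(x) to make sense).

```python
import math


def x_to_the_x(x: float) -> float:
    return x**x


def derivative_via_e(x: float) -> float:
    """Approach 1: x^x = e^foo with foo = ln(x) * x, times d(foo)/dx."""
    foo = math.log(x) * x
    dfoo_dx = math.log(x) * 1.0 + (1.0 / x) * x  # product rule on ln(x) * x
    return dfoo_dx * math.exp(foo)


def derivative_via_viewpoints(x: float) -> float:
    """Approach 2: treat x^x as u^v and add each point of view (u = v = x)."""
    u = v = x
    from_u = v * u ** (v - 1.0)      # v is a static power
    from_v = math.log(u) * u**v      # u is a static base
    return from_u + from_v


def numeric_slope(func, x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of the derivative."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


x = 2.5
print(derivative_via_e(x))
print(derivative_via_viewpoints(x))
print(numeric_slope(x_to_the_x, x))
```

All three values should agree closely, which is a nice sanity check before reaching for an external tool.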

By the way, use Wolfram Alpha to check your work on derivatives (click “show steps”).

**Question: If u were more complex, where would we use du/dx?**

Imagine u was a more complex function like u=x^2 + 3: where would we multiply by du/dx?

Let’s think about it: du/dx only comes into play from u’s point of view (when v is changing, u is a static value, and it doesn’t matter that u can be further broken down in terms of x). u’s contribution is

$$\frac{d(u^v)}{du} = v \cdot u^{v-1}$$

If we wanted the “dx” point of view, we’d include du/dx here:

$$\frac{d(u^v)}{du} \cdot \frac{du}{dx} = v \cdot u^{v-1} \cdot \frac{du}{dx}$$

We’re multiplying by the “du/dx” conversion factor to get things from x’s point of view. Similarly, if v were more complex, we’d have a dv/dx term when computing v’s point of view.
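Here is a sketch of that question with u = x^2 + 3 as the base and, as my own assumption to keep it concrete, v = x as the exponent. The function names are hypothetical.

```python
import math


def u_to_the_v(x: float) -> float:
    """(x^2 + 3)^x, i.e. u^v with u = x^2 + 3 and v = x."""
    return (x**2 + 3.0) ** x


def derivative_by_viewpoints(x: float) -> float:
    """Add each point of view, converting to the x scale with du/dx and dv/dx."""
    u, v = x**2 + 3.0, x
    du_dx, dv_dx = 2.0 * x, 1.0
    from_u = v * u ** (v - 1.0) * du_dx   # u's view, times du/dx
    from_v = math.log(u) * u**v * dv_dx   # v's view, times dv/dx
    return from_u + from_v


def numeric_slope(func, x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of the derivative."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


x = 1.5
print(derivative_by_viewpoints(x))
print(numeric_slope(u_to_the_v, x))
```

The du/dx factor only shows up in u's term. v's term treats u as a plain number, no matter how complicated u is inside.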

Look what happened: we figured out the generic d/du and converted it into a more specific d/dx when needed.

## It’s Easier With Infinitesimals

Separating dy from dx in dy/dx is “against the rules” of limits, but works great with infinitesimals. You can figure out the derivative rules really quickly:

**Product rule:**

$$d(f \cdot g) = (f + df)(g + dg) - f \cdot g = f \cdot dg + g \cdot df + df \cdot dg$$

We set “df * dg” to zero when jumping out of the infinitesimal world and back to our regular number system.

Think in terms of “How much did g change? How much did f change?” and derivatives snap into place much easier. “Divide through” by dx at the end.
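A sketch of why dropping “df * dg” is safe, using exact fractions so nothing is hidden by rounding. The function names and the sample sides (3 and 5) are my own choices for illustration.

```python
from fractions import Fraction


def exact_change(f, g, df, dg):
    """Exact change in the area f * g when the sides grow by df and dg."""
    return (f + df) * (g + dg) - f * g


def product_rule_change(f, g, df, dg):
    """Product-rule estimate: the two slivers, ignoring the df * dg corner."""
    return f * dg + g * df


f, g = Fraction(3), Fraction(5)
for step in (Fraction(1, 10), Fraction(1, 1000)):
    leftover = exact_change(f, g, step, step) - product_rule_change(f, g, step, step)
    # The only thing the product rule misses is the tiny corner df * dg
    assert leftover == step * step
    print(f"df = dg = {step}: slivers = {product_rule_change(f, g, step, step)}, corner = {leftover}")
```

The slivers shrink in proportion to the step, but the corner shrinks with the step squared, so it vanishes much faster as the wiggle gets small.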

## Summary: See the Machine

Our goal is to understand calculus intuition, not memorization. I need a few analogies to get me thinking:

- Functions are machines, derivatives are the “wiggle” behavior
- Derivative rules find the “overall wiggle” in terms of the wiggles of each part
- The chain rule zooms into a perspective (hours => minutes)
- The product rule adds area
- The quotient rule adds area (but one area contribution is negative)
- e changes by 100% of the current amount (d/dx e^x = 100% * e^x)
- natural log is the time for e^x to reach the next value (x units/sec means 1/x to the next value)

With practice, ideas start clicking. Don’t worry about getting tripped up — I still tried to overuse the chain-rule when working with exponents. Learning is a process!

Happy math.

## Appendix: Partial Derivatives

Let’s say our function depends on two inputs:

$$f(x, y)$$

The derivative of f can be seen from x’s point of view (how does f change with x?) or y’s point of view (how does f change with y?). It’s the same idea: we have two “independent” perspectives that we combine for the overall behavior (it’s like combining the point of view of two Solipsists, who think they’re the only “real” people in the universe).

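If x and y depend on the same variable (like t, time), we can write the following:

$$\frac{df}{dt} = \frac{\partial f}{\partial x} \cdot \frac{dx}{dt} + \frac{\partial f}{\partial y} \cdot \frac{dy}{dt}$$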

It’s a bit of the chain rule — we’re combining two perspectives, and for each perspective, we dive into its root cause (time).

If x and y are otherwise independent, we represent the derivative along each axis in a vector:

$$\nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)$$

This is the gradient, a way to represent “From this point, if you travel in the x or y direction, here’s how you’ll change”. We combined our 1-dimensional “points of view” to get an understanding of the entire 2d system. Whoa.
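Here is a sketch of both ideas with a made-up example. The function f(x, y) = x^2 * y and the paths x = t^2, y = 3t are my own assumptions, chosen so the answer is easy to verify by hand: f becomes 3t^5, whose derivative is 15t^4.

```python
def f(x: float, y: float) -> float:
    """Hypothetical two-input function: f(x, y) = x^2 * y."""
    return x**2 * y


def gradient(x: float, y: float) -> tuple:
    """Each axis's point of view: (df/dx, df/dy)."""
    return (2.0 * x * y, x**2)


def total_derivative(t: float) -> float:
    """Combine both perspectives, diving into their shared cause t."""
    x, y = t**2, 3.0 * t          # assumed paths x(t), y(t)
    dx_dt, dy_dt = 2.0 * t, 3.0
    df_dx, df_dy = gradient(x, y)
    return df_dx * dx_dt + df_dy * dy_dt


t = 2.0
print(gradient(t**2, 3.0 * t))    # the two 1-d points of view at (x, y) = (4, 6)
print(total_derivative(t))        # should match 15 * t^4
print(15.0 * t**4)
```

The gradient keeps the two viewpoints separate. The total derivative multiplies each one by its own conversion factor (dx/dt or dy/dt) and adds them up.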

## Other Posts In This Series

- A Gentle Introduction To Learning Calculus
- How To Understand Derivatives: The Product, Power & Chain Rules
- How To Understand Derivatives: The Quotient Rule, Exponents, and Logarithms
- An Intuitive Introduction To Limits
- Why Do We Need Limits and Infinitesimals?
- Learning Calculus: Overcoming Our Artificial Need for Precision
- Prehistoric Calculus: Discovering Pi
- A Calculus Analogy: Integrals as Multiplication
- Calculus: Building Intuition for the Derivative
- Understanding Calculus With A Bank Account Metaphor
- A Friendly Chat About Whether 0.999... = 1

Kalid

u are genius!what will be ur next topic ? I eagerly wait for ur math posts.

@Gulrez: Hah, thanks for the kind words :). I’m hoping to do more on a bunch of math topics, appreciate the support!

As usual, a nice article.

By the way, in section Division (Quotient Rule), there is an extra ‘]’ at the end of the line

. f changes by df, contributing area df * m = df * (1 / g)]

Thanks for the great article.

H.

@Hitoshi: Thanks for the comment! Just fixed up the article :).

When will Math curriculums begin combining concepts in meaningful ways like this? Calculus classes like to split ‘Power Rule,’ ‘Quotient Rule,’ and ‘Chain Rule’ into discrete sections, when really they’re consequences of the same basic idea. Perhaps it’s less labor-intensive teaching distinct formulas to be memorized, but it’s just another reason people hear ‘Calculus’ and immediately glaze over.

And while I’m lamenting–your mention of infinitesimals brings up another sore spot of mine. A Calc TA told me how separating ‘dy/dx’ is ‘against the rules,’ as you say, and I took it to heart. Imagine poor, confused me a couple semesters later in DiffEq: “I thought this was against the rules!” The limit-based approach to teaching Calculus needs some serious revision, particularly for non-mathematicians moving into practical fields.

@Joe: I hear you — we slice and dice concepts and miss the cohesive whole. All the calculus rules are just examples of how different subparts can contribute to the whole, but I’m only seeing that now, 10+ years after high school. Ugh.

And yeah — there’s so much “don’t do this, I don’t know why, but don’t!” in math. Why is it against the rules? What are the “rules”? Limits are a seatbelt introduced to address theoretical concerns many, many years after Calculus was put into use. Learning about seatbelts is fine, but don’t dive into them before you explain what a car [i.e., calculus] is!

Very informative and great analogy

@Kinar: Thank you!

you are a geneous

Thanks so much for the explanation. Really help a lot! 1 Q still confused regarding quotient rule (dg part):

If we used directly g+dg instead of x+1 in the calculation:

1/(g+dg) – 1/g

= (g-g+dg) / (g^2+g*dg)

= -dg / (g^2 + g*dg)

May I ask why is the g*dg ignored or cancelled? Is there an intuitive reasoning for it? Thanks so much in advance.

hoping to get more example to solve

What helped me understand derivatives is this: If y=e^x, then y’=e^x, which is of course related to your favorite number, e, which does seem to have more significance than pi. A graph and an explanation could help others.

Thanks Tim, a follow-up charting the path of e^x would be a good idea.

If I had known this existed while I was taking calculus, there would have been so fewer headaches. I had always known on a subconscious level that there were connections between calculus and the earlier maths–my teacher even confirmed that by joking that all other classes were “pre-calculus”–but for the life of me, I could never find those connections. And they were right there mocking me the whole time! This helped more than any lecture, peer teaching, or textbook ever could. Thanks!

i like how u sort out but the side of Q u nid to expand a little bit so that we gain better and best………….thump up khalid and i ll kip following ol

Thank you for the time you’ve put into these articles they’ve helped me a lot and I’m glad to know there are people who care about intuition and share it, but I’m confused about your intuition of the natural log. Why is the derivative always predicting the next increment by one? Why not .5? Shouldn’t it be infitestimally small because it is using the input of the a naturally growing function?

Thanks Jackson, great question. Intuitively, think about taking a “single step forward”, which is 1*dx. Another way of seeing it: when taking the derivative, we split our continuous function into discrete steps (a single dx wide at each step) and see our rate of change when we increment by the next dx.

An analogy: we represent a photo with individual pixels (dx) and step through one pixel at a time. The pixels are chosen at an “infinitely small retina resolution” where we don’t notice them at the macro scale. (There’s more on limits later in this series.)

Sorry for my inconvienience but I’m confused how you got 1/x. Wouldn’t the derivative be dx/x because dx would be the change and x would be the current value as dx approach 0.I’m just confused why 1=dx instead of approaching 0.

No worries, great question, I realize it can be unclear. I start with scenarios where “dx = 1” (which is a GIANT step) to estimate results in my head. Then, I can set dx = 0 (taking the limit) to get an exact prediction.

Let’s say I want the derivative of x^2. I imagine going from 10^2 to 11^2 (we jumped from x=10 to x=11, so dx=1). The difference is 21, or 2x + dx (20 + 1). I can then set dx = 0 and get the exact answer of 2x. (If there was no gap between x and the next value, the derivative would be 2x.)

The natural log is harder to compute: it’s the time e^x needs to grow from 1 to x. How does it change?

Imagine going from 10 to 11 (again, dx=1). Here, we’re at 10 and we grow exponentially up to 11. Since e^x assumes we’re growing at 100% of our current value, it takes 1/10 of a unit time to get to 11. (10 + (1/10)*10 = 11).

Now, this isn’t *quite* accurate because as we’re going to 11, we’re getting faster. I.e., when we’re at 10.5 we’re growing at 10.5 units per unit time, not the 10 we expected. Removing the imaginary dx fixes this (we assume there is no midpoint between x=10 and the next value, so it really is a perfect 1/x amount of time we wait).