Calculus: Building Intuition for the Derivative

How do you wish the derivative was explained to you? Here's my take.

Psst! The derivative is the heart of calculus, buried inside this definition:

\displaystyle{ f'(x) =\lim_{dx\to 0} \frac{f(x+dx)-f(x)}{dx}}

But what does it mean?

Let's say I gave you a magic newspaper that listed the daily stock market changes for the next few years (+1% Monday, -2% Tuesday...). What could you do?

Well, you'd apply the changes one-by-one, plot out future prices, and buy low / sell high to build your empire. You could even hire away the monkeys who currently throw darts at newspapers.
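Here's a rough sketch of "applying the changes one-by-one" in Python. The function name, the starting price of 100, and the last two daily changes are made up for illustration; only the +1% and -2% come from the newspaper above.

```python
def apply_daily_changes(start_price, daily_changes_percent):
    """Return the price history after applying each daily % change in turn."""
    prices = [start_price]
    for change in daily_changes_percent:
        # Each day's price is yesterday's price scaled by that day's change.
        prices.append(prices[-1] * (1 + change / 100))
    return prices

# Hypothetical newspaper: +1% Monday, -2% Tuesday, then two invented days.
changes = [1, -2, 3, -1]
prices = apply_daily_changes(100.0, changes)
print(prices)
```

Knowing every change is enough to rebuild the whole price history from a single starting point.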

Others call the derivative "the slope of a function" -- it's so bland! Like having the magic newspaper, the derivative is a crystal ball that lets you see how a pattern will play out. You can plot the past/present/future, find minimums/maximums, and yes, staff your simian workforce to pick stocks.

Step away from the gnarly equation. Equations exist to convey ideas: understand the idea, not the grammar.

Derivatives create a perfect model of change from an imperfect guess.

This result came over thousands of years of thinking, from Archimedes to Newton. Let's look at the analogies behind it.

We all live in a shiny continuum

Infinity is a constant source of paradoxes ("headaches"):

  • A line is made up of points? Sure.
  • So there's an infinite number of points on a line? Yep.
  • How do you cross a room when there's an infinite number of points to visit? (Gee, thanks Zeno).

And yet, we move. My intuition is to fight infinity with infinity. Sure, there's infinity points between 0 and 1. But I move two infinities of points per second (somehow!) and I cross the gap in half a second.

Distance has infinite points, motion is possible, therefore motion is in terms of "infinities of points per second".

Instead of thinking of differences ("How far to the next point?") we can compare rates ("How fast are you moving through this continuum?").

It's strange, but you can see 10/5 as "I need to travel 10 'infinities' in 5 segments of time. To do this, I travel 2 'infinities' for each unit of time".

Analogy: See division as a rate of motion through a continuum of points

What's after zero?

Another brain-buster: What number comes after zero? .01? .0001?

Hrm. Anything you can name, I can name smaller (I'll just halve your number... nyah!).

Even though we can't calculate the number after zero, it must be there, right? Like demons of yore, it's the "number that cannot be written, lest ye be smitten".

Call the gap to the next number "dx". I don't know exactly how big it is, but it's there!

Analogy: dx is a "jump" to the next number in the continuum.

Measurements depend on the instrument

The derivative predicts change. Ok, how do we measure speed (change in distance)?

Officer: Do you know how fast you were going?

Driver: I have no idea.

Officer: 95 miles per hour.

Driver: But I haven't been driving for an hour!

We clearly don't need a "full hour" to measure your speed. We can take a before-and-after measurement (over 1 second, let's say) and get your instantaneous speed. If you moved 140 feet in one second, you're going ~95mph. Simple, right?
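If you want to check the officer's math, here's a small sketch of the unit conversion (the function name is just for illustration):

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600

def feet_per_second_to_mph(feet, seconds):
    """Convert a before-and-after distance measurement into miles per hour."""
    return (feet / seconds) * SECONDS_PER_HOUR / FEET_PER_MILE

speed = feet_per_second_to_mph(140, 1)
print(round(speed, 1))  # about 95.5
```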

Not exactly. Imagine a video camera pointed at Clark Kent (Superman's alter-ego). The camera records 24 pictures/sec (about 40ms per photo) and Clark seems still. On a frame-by-frame basis, he's not moving, and his speed is 0mph.

Wrong again! Between each photo, within that 40ms, Clark changes to Superman, solves crimes, and returns to his chair for a nice photo. We measured 0mph but he's really moving -- he goes too fast for our instruments!

Analogy: Like a camera watching Superman, the speed we measure depends on the instrument!

Running the Treadmill

We're nearing the chewy, slightly tangy center of the derivative. We need before-and-after measurements to detect change, but our measurements could be flawed.

Imagine a shirtless Santa on a treadmill (go on, I'll wait). We're going to measure his heart rate in a stress test: we attach dozens of heavy, cold electrodes and get him jogging.

Santa huffs, he puffs, and his heart rate shoots to 190 beats per minute. That must be his "under stress" heart rate, correct?

Nope. See, the very presence of stern scientists and cold electrodes increased his heart rate! We measured 190bpm, but who knows what we'd see if the electrodes weren't there! Of course, if the electrodes weren't there, we wouldn't have a measurement.

What to do? Well, look at the system:

  • measurement = actual amount + measurement effect

Ah. After lots of studies, we may find "Oh, each electrode adds 10bpm to the heartrate". We make the measurement (imperfect guess of 190) and remove the effect of electrodes ("perfect estimate").

Analogy: Remove the "electrode effect" after making your measurement

By the way, the "electrode effect" shows up everywhere. Research studies have the Hawthorne Effect where people change their behavior because they are being studied. Gee, it seems everyone we scrutinize sticks to their diet!

Understanding the derivative

Armed with these insights, we can see how the derivative models change:

[Figure: derivative explanation]

Start with some system to study, f(x):

  1. Change by the smallest amount possible (dx)
  2. Get the before-and-after difference: f(x + dx) - f(x)
  3. We don't know exactly how small "dx" is, and we don't care: get the rate of motion through the continuum: [f(x + dx) - f(x)] / dx
  4. This rate, however small, has some error (our cameras are too slow!). Predict what happens if the measurement were perfect, if dx wasn't there.

The magic's in the final step: how do we remove the electrodes? We have two approaches:

  • Limits: what happens when dx shrinks to nothingness, beyond any error margin?
  • Infinitesimals: What if dx is a tiny number, undetectable in our number system?

Both are ways to formalize the notion of "How do we throw away dx when it's not needed?".

My pet peeve: Limits are a modern formalism, they didn't exist in Newton's time. They help make dx disappear "cleanly". But teaching them before the derivative is like showing a steering wheel without a car! It's a tool to help the derivative work, not something to be studied in a vacuum.

An Example: f(x) = x^2

Let's shake loose the cobwebs with an example. How does the function f(x) = x^2 change as we move through the continuum?
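A sketch of the algebra, running f(x) = x^2 through the four steps above (it assumes nothing beyond the usual expansion of (x + dx)^2):

\displaystyle{ f(x+dx) - f(x) = (x+dx)^2 - x^2 = 2x \cdot dx + dx^2 }

\displaystyle{ \frac{f(x+dx)-f(x)}{dx} = \frac{2x \cdot dx + dx^2}{dx} = 2x + dx }

\displaystyle{ f'(x) = \lim_{dx\to 0} (2x + dx) = 2x }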

[Figure: derivative explanation for f(x) = x^2]

Note the difference in the last 2 equations:

  • One has the error built in (dx)
  • The other has the "true" change, where dx = 0 (we assume our measurements have no effect on the outcome)

Time for real numbers. Here's the values for f(x) = x^2, with intervals of dx = 1:

  • 1, 4, 9, 16, 25, 36, 49, 64...

The absolute change between each result is:

  • 1, 3, 5, 7, 9, 11, 13, 15...

(Here, the absolute change is the "speed" between each step, where the interval is 1)
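The two lists above are easy to reproduce. A minimal sketch (starting the values at x = 0 so that the first change, 1, shows up):

```python
# Values of f(x) = x^2 at x = 0, 1, 2, ..., 8 (steps of dx = 1).
values = [x ** 2 for x in range(0, 9)]

# Before-and-after difference between each pair of neighbors.
changes = [after - before for before, after in zip(values, values[1:])]

print(values[1:])  # [1, 4, 9, 16, 25, 36, 49, 64]
print(changes)     # [1, 3, 5, 7, 9, 11, 13, 15]
```

Each change from x to x + 1 works out to 2x + 1, which is the pattern the next few paragraphs pick apart.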

Consider the jump from x=2 to x=3 (3^2 - 2^2 = 5). What is "5" made of?

  • Measured rate = Actual Rate + Error
  • 5 = 2x + dx
  • 5 = 2(2) + 1

Sure, we measured a "5 units moved per second" because we went from 4 to 9 in one interval. But our instruments trick us! 4 units of speed came from the real change, and 1 unit was due to shoddy instruments (1.0 is a large jump, no?).

If we restrict ourselves to integers, 5 is the perfect speed measurement from 4 to 9. There's no "error" in assuming dx = 1 because that's the true interval between neighboring points.

But in the real world, measuring every 1.0 seconds is too slow. What if our dx was 0.1? What speed would we measure at x=2?

Well, we examine the change from x=2 to x=2.1:

  • 2.1^2 - 2^2 = 0.41

Remember, 0.41 is what we changed in an interval of 0.1. Our speed-per-unit is 0.41 / .1 = 4.1. And again we have:

  • Measured rate = Actual Rate + Error
  • 4.1 = 2x + dx
  • 4.1 = 2(2) + 0.1

Interesting. With dx=0.1, the measured and actual rates are close (4.1 to 4, 2.5% error). When dx=1, the rates are pretty different (5 to 4, 25% error).

Following the pattern, we see that throwing out the electrodes (letting dx=0) reveals the true rate of 2x.
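You can watch the pattern play out numerically. This is a sketch, not a proof: the helper name measured_rate is made up, and floating-point rounding means the printed values are only approximately 2x + dx.

```python
def f(x):
    return x ** 2

def measured_rate(func, x, dx):
    """Before-and-after difference divided by the size of the jump."""
    return (func(x + dx) - func(x)) / dx

x = 2
actual_rate = 2 * x  # the "perfect" model from the text

for dx in [1, 0.1, 0.01, 0.001]:
    rate = measured_rate(f, x, dx)
    error = rate - actual_rate  # the "electrode effect", roughly dx
    print(f"dx={dx}: measured={rate:.4f}, error={error:.4f}")
```

As dx shrinks, the error shrinks with it, and the measured rate closes in on 4.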

In plain English: We analyzed how f(x) = x^2 changes, found an "imperfect" measurement of 2x + dx, and deduced a "perfect" model of change as 2x.

The derivative as "continuous division"

I see the integral as better multiplication, where you can apply a changing quantity to another.

The derivative is "better division", where you get the speed through the continuum at every instant. Something like 10/5 = 2 says "you have a constant speed of 2 through the continuum".

When your speed changes as you go, you need to describe your speed at each instant. That's the derivative.

If you apply this changing speed to each instant (take the integral of the derivative), you recreate the original behavior, just like applying the daily stock market changes to recreate the full price history. But this is a big topic for another day.
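As a teaser, here's a rough sketch of "applying the changing speed to each instant" for a speed of 2x. The function name and the step count are arbitrary choices, and the result is only an approximation that improves as the steps get smaller.

```python
def rebuild_from_speeds(speed, start_value, x_start, x_end, steps):
    """Accumulate speed(x) * dx over many small intervals."""
    dx = (x_end - x_start) / steps
    x, value = x_start, start_value
    history = [value]
    for _ in range(steps):
        value += speed(x) * dx  # apply the current speed over a tiny interval
        x += dx
        history.append(value)
    return history

# Speed 2x, starting from 0 at x = 0, walked forward to x = 3.
history = rebuild_from_speeds(lambda x: 2 * x, 0.0, 0.0, 3.0, 30000)
print(history[-1])  # close to 3^2 = 9
```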

Gotcha: The Many meanings of "Derivative"

You'll see "derivative" in many contexts:

  • "The derivative of x^2 is 2x" means "At every point, we are changing by a speed of 2x (twice the current x-position)". (General formula for change)

  • "The derivative is 44" means "At our current location, our rate of change is 44." When f(x) = x^2, at x=22 we're changing at 44 (Specific rate of change).

  • "The derivative is dx" may refer to the tiny, hypothetical jump to the next position. Technically, dx is the "differential" but the terms get mixed up. Sometimes people will say "derivative of x" and mean dx.

Gotcha: Our models may not be perfect

We found the "perfect" model by making a measurement and improving it. Sometimes, this isn't good enough -- we're predicting what would happen if dx wasn't there, but added dx to get our initial guess!

Some ill-behaved functions defy the prediction: there's a difference between removing dx with the limit and what actually happens at that instant. These are called "discontinuous" functions, which essentially means "cannot be modeled with limits" at the points where they jump. As you can guess, the derivative doesn't work at those points because we can't actually predict the function's behavior there.

Discontinuous functions are rare in practice, and often exist as "Gotcha!" test questions ("Oh, you tried to take the derivative of a discontinuous function, you fail"). Realize the theoretical limitation of derivatives, and then realize their practical use in measuring natural phenomena. Nearly every function you'll see (sine, cosine, e^x, polynomials, etc.) is continuous.
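To see what "defying the prediction" looks like, here's a small sketch with a made-up step function that jumps from 0 to 1 at x = 0. The function and helper names are hypothetical, chosen only for this illustration.

```python
def step(x):
    """A hypothetical discontinuous function: jumps from 0 to 1 at x = 0."""
    return 0.0 if x < 0 else 1.0

def forward_rate(func, x, dx):
    """Measure the rate by looking a little ahead of x."""
    return (func(x + dx) - func(x)) / dx

def backward_rate(func, x, dx):
    """Measure the rate by looking a little behind x."""
    return (func(x) - func(x - dx)) / dx

for dx in [0.1, 0.01, 0.001]:
    print(dx, forward_rate(step, 0.0, dx), backward_rate(step, 0.0, dx))
```

Looking ahead, the measured rate is 0; looking behind, it keeps growing as dx shrinks. The two measurements never settle on one answer, so there's no "perfect" rate to deduce at the jump.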

Gotcha: Integration doesn't really exist

The relationship between derivatives, integrals and anti-derivatives is nuanced (and I got it wrong originally). Here's a metaphor. Start with a plate, your function to examine:

  • Differentiation is breaking the plate into shards. There is a specific procedure: take a difference, find a rate of change, then assume dx isn't there.
  • Integration is weighing the shards: your original function was "this" big. There's a procedure, cumulative addition, but it doesn't tell you what the plate looked like.
  • Anti-differentiation is figuring out the original shape of the plate from the pile of shards.

There's no simple, general procedure for finding the anti-derivative; we often have to guess. We make a lookup table with a bunch of known derivatives (original plate => pile of shards) and look at our existing pile to see if it's similar. "Let's find the integral of 10x. Well, it looks like 2x is the derivative of x^2. So... scribble scribble... 10x is the derivative of 5x^2."

Finding derivatives is mechanics; finding anti-derivatives is an art. Sometimes we get stuck: we take the changes, apply them piece by piece, and mechanically reconstruct a pattern. It might not be the "real" original plate, but is good enough to work with.

Another subtlety: aren't the integral and anti-derivative the same? (That's what I originally thought)

Yes, but this isn't obvious: it's the fundamental theorem of calculus! (It's like saying "Aren't a^2 + b^2 and c^2 the same? Yes, but this isn't obvious: it's the Pythagorean theorem!"). Thanks to Joshua Zucker for helping sort me out.
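Here's a quick numerical sanity check of that claim for 10x, as a sketch. The integrate helper is a plain midpoint sum written for this example (not a library function), and the interval from 0 to 3 is an arbitrary choice.

```python
def integrate(func, a, b, steps=100_000):
    """Weigh the shards: add up func(x) * dx over many thin slices."""
    dx = (b - a) / steps
    # Sample each slice at its midpoint.
    return sum(func(a + (i + 0.5) * dx) for i in range(steps)) * dx

def antiderivative(x):
    """The recognized plate for 10x, taken from the lookup-table example."""
    return 5 * x ** 2

total = integrate(lambda x: 10 * x, 0, 3)
shortcut = antiderivative(3) - antiderivative(0)
print(total, shortcut)  # both should be (very nearly) 45
```

The mechanical sum and the anti-derivative shortcut land on the same number, which is the fundamental theorem at work.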

Reading math

Math is a language, and I want to "read" calculus (not "recite" calculus, the way we can recite medieval German hymns without understanding them). I need the message behind the definitions.

My biggest aha! was realizing the transient role of dx: it makes a measurement, and is removed to make a perfect model. Limits/infinitesimals are a formalism, we can't get caught up in them. Newton seemed to do ok without them.

Armed with these analogies, other math questions become interesting:

  • How do we measure different sizes of infinity? (In some sense they're all "infinite", in other senses the range (0,1) is smaller than (0,2))
  • What are the real rules about making "dx go away"? (How do infinitesimals and limits really work?)
  • How do we describe numbers without writing them down? "The next number after 0" is the beginnings of analysis (which I want to learn).

The fundamentals are interesting when you see why they exist. Happy math.

Other Posts In This Series

  1. A Gentle Introduction To Learning Calculus
  2. How To Understand Derivatives: The Product, Power & Chain Rules
  3. How To Understand Derivatives: The Quotient Rule, Exponents, and Logarithms
  4. An Intuitive Introduction To Limits
  5. Why Do We Need Limits and Infinitesimals?
  6. Learning Calculus: Overcoming Our Artificial Need for Precision
  7. Prehistoric Calculus: Discovering Pi
  8. A Calculus Analogy: Integrals as Multiplication
  9. Calculus: Building Intuition for the Derivative
  10. Understanding Calculus With A Bank Account Metaphor
  11. A Friendly Chat About Whether 0.999... = 1

Questions & Contributions


  1. I just wanted to let you know that I really appreciate the effort you put into this. I only discovered this website a few days ago, and I’ve been having a blast reading all those intuitive approaches!!

    You should consider writing an elementary and high school book of mathematics, as well as teaching on Khan Academy 😛

    Please keep this flowing :) and if there’s any way we, the audience, can support you, please do mention how!

  2. @AK: Thanks for the comment — really appreciate the support! I’m actually looking at ways to help tap into the community — one idea is getting a little section after each post to share the analogies that worked (or questions that are still outstanding). I’d love certain articles (like the one on e, for example) to become a living reference about “What actually made it click”. Wikipedia is great for strict definitions, Khan and others for detailed tutorials / practice problems, and I’d like to contribute aha! moments (i.e. the last step that turned the light bulb on). Definitely something I’m looking to develop, I’ll be posting on this soon =).

  3. @Pat: Thanks, glad you liked it! Oh man, how I wish I could go back in time and give myself some tutorials :).

    @Zaine: Thanks, I really appreciate it!

  4. Joshua Zucker emailed me after the comment form ate his reply, pasting below:

    Apparently my long comment on your recent post got eaten somewhere
    along the line. Darn.

    Anyway, my point was that you really misrepresent integrals. They’re
    easier than derivatives, not harder. It’s antiderivatives that are
    tough, and although the fundamental theorem says they’re the same as
    integrals, the whole point of the theorem is that there’s something
    meaningful to say there! Well, actually, antiderivatives aren’t
    really tough, it’s just that we’re picky about wanting to write them
    in terms of certain kinds of functions, which is your “break lots of
    plates” analogy. We know exactly how the pieces were made, so we can
    just glue them back together. The hard part is recognizing the brand
    name of the plate when we’re done, not reassembling the plate.

    You also seem inconsistent about saying in your intro that you can use
    the rate of change to reconstruct the future prices, and then later
    saying that putting the pieces back together is hard. Integrals are,
    as you say “better multiplication” — you just have to multiply and

    There is lots and lots of good stuff in the post too, of course! I
    particularly love the idea of the derivative as an inference of what
    the perfect tool would measure, from approximations using imperfect
    tools. I don’t think I ever thought of it as a tool in quite that
    sense, and it’s a useful thing. I mean, I have thought of the
    derivative at a point as a local property, and the derivative as an
    operator that maps functions to functions, but this feels more like a
    caliper that is open to some finite amount and then you’re reducing
    that amount to see what’s going on; it captures more of the limit
    process in there.

    Oh, one more note: Oddly, I’m totally comfortable with the idea of dx
    = the next number right after 0, or the jump between “adjacent” real
    numbers, but I am really bothered by the analogy of dividing 5
    infinities by 2 infinities of points to get 5/2.


    Hi Joshua,

    Great feedback -- I think the nuances of integrals vs. anti-derivatives were previously lost on me :). After a little reading, I think I’m up to speed:

    * Integration is literally the process of gluing the pieces together (mechanical, finding the sum of many products)
    * Anti-derivatives are the function whose derivative is f (i.e., the “brand” as you say)

    The essence of the FTOC (which I’ve previously missed) is that Integrals are *computable* from anti-derivatives, which is pretty amazing. Literally gluing pieces isn’t hard, but saying “this reconstructed plate is an Ikea Furjen” is the tricky part (realizing what function, easily defined, would create such an integral).

    “…the idea of the derivative as an inference of what the perfect tool would measure, from approximations using imperfect tools” — I love this concise description, that’s exactly it. Yes, in this context it’s like a little caliper which is prodding, only to disappear again to help figure out a greater result. The operator and local property / slope interpretations are other ones to switch between. When writing this article, I was ruminating on the purpose of limits, which always bothered me because they were ignored so often in engineering classes (even though the derivative wasn’t!). In this case, limits were mathematical scaffolding.

    The 5 vs 2 infinities doesn’t quite sit right with me either -- it’s my gut screaming for there to be “some” way to move through an infinitude of points. My analysis knowledge is very limited, but perhaps something like a Lebesgue measure could capture this notion (that 0-5 is a larger infinite range than 0-2)?

    Really appreciate the discussion, I love refining these thoughts! I’ll update the article soon, as I get my intuitions in order.


    Josh: I think a better analogy is this:

    Integration is piling all the shards on a scale and reading the total.
    Antidifferentiation is putting the shards carefully back together in
    exactly the right order and recognizing the plate.

  5. It is really interesting. I enjoyed it and learned a lot. Today I understood what the derivative is. I had been searching for this, and you gave it to us. Thank you so much. Please keep giving us this opportunity to learn math.

  6. Khalid,

    Long-time lurker, first-time poster. Firstly, just wanted to say congrats on all your work here, really impressive. This is my favourite maths site on the web; I see the seeds of an educational revolution here. Reminds me of the time I got a weighty book “Applying Maths in the Chemical and Biological Sciences”... I was hoping for an interesting novel, what I got was almost pure grammar, i.e. I was looking for semantics but all I got was syntax. Your articles explain the meaning, i.e. utility, of these abstract notions. Your complex numbers article helped solve the riddle of how “imaginary numbers” could be used in the real world, so thanks!

    Like the (modified!) analogy for the distinction of integral and anti-derivative, which was yet another one of those esoteric relationships that was never explored in high school; are you going to amend the original article?



  7. @Bassman, Ogbuka: I’ll take those as suggestions for future topics, thanks.

    @Asmaul: Glad it was helpful!

    @John: Thanks for the note, really appreciate it! I hear you, so many math explanations just focus on the grammar, like the lifeless language classes that nobody ever seems to learn from (contrasted with learning a language by actually being immersed in it and speaking it, vs. trying to crunch through the rules like a computer).

    I’m going to update the article right now with the new integral/anti-derivative analogy. Thanks again for posting!

  8. This came at a pretty good time for me since its publication coincided with my own autodidactic journey through math! I was fresh into calc/derivatives when this came and I skimmed through, initially getting about half of it. Then while walking my dogs today I got deep into thinking about really understanding derivatives after a few plug and chug sessions, and I began recalling what you had written (especially regarding the “actual rate+error” part) and the superman analogy.

    In retrospect it was a good thing I was walking in the barren woods because the unconscious “OOOOOOOOOOHHH!” of my aha moment was so loud. My dogs didn’t seem to care though, they were busy pooping and such.

    Thank you, thank you, thank you!

  9. @Anonymous: Awesome, I’m glad the aha! came :). I’m planning on making some changes to the site to help share and discuss the individual aha! moments, really appreciate the note!

  10. Hey Kalid, another great article!
    But I noticed something: instead of doing all the other math, couldn’t you just take the exponent, multiply it by the number in front, and then subtract one from the exponent? If you didn’t get that, here’s what I mean: the derivative of x^2 = 2*1(x)^(2-1), which equates to 2x. It also works in reverse, finding the original function from the derivative: 10x^1; 10x^(1+1) = 10x^2; (10/2)x^2 = 5x^2. Should I have put this here, or on your new aha moments and FAQ thingy?
    Thank you,
    Just a kid.

  11. @just a kid: Thanks for the comment! For posting, either method is fine! The aha!/FAQ thingy is a way to have longer discussions, since regular wordpress comments don’t have threading (and the discussions could get hard to follow).

    Your shortcut definitely works (take the exponent, decrease by one). It’s neat to see why this works: if we’re taking the derivative of x^n (x raised to some power), we make a model like this:

    [(x + dx)^n - x^n] / dx

    = [(x^n + Something * x^(n-1) * dx + Something2 * x^(n-2) * dx^2 + …) - x^n] / dx

    = Something * x^(n-1) + Something2 * x^(n-2) * dx + …

    Every term after the first still contains dx, so those terms go away when we let dx be zero (i.e., assume a perfect model). We’re left with

    Something * x^(n-1)

    And what is the “Something”? Well, it’s the number “n” (this is due to the Binomial Theorem).

    But yep, you got it — there’s a shortcut to figure out how the derivative of a regular polynomial (x^n) will behave :).
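
    For the curious, here's a sketch of how the Binomial Theorem fills in the blanks, assuming n is a positive whole number:

    \displaystyle{ (x+dx)^n = x^n + n x^{n-1} dx + \binom{n}{2} x^{n-2} dx^2 + \dots + dx^n }

    \displaystyle{ \frac{(x+dx)^n - x^n}{dx} = n x^{n-1} + \binom{n}{2} x^{n-2} dx + \dots + dx^{n-1} }

    \displaystyle{ \lim_{dx\to 0} \frac{(x+dx)^n - x^n}{dx} = n x^{n-1} }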

  12. This is a fair explanation of the theory behind derivatives; but I like how Wilberger explains and motivates tangent curves (which are directly and simply related to derivatives). Not only does he NOT use the idea of “dx” (which doesn’t actually exist in any system of numbers beyond the integers, since there is no unique number that is closest to zero), but he winds up defining the theory so that it works on arbitrary algebraic curves (not only functions).

    Check it out — look at his (njwilberger’s) Math Foundations series on YouTube. Most people reading here will be able to skip to something like the episode on doing calculus on the unit circle, but don’t expect to understand EVERYTHING if you do that. The interesting thing is that he defines this without using limits at all; the essential point is that he uses “the nth degree polynomial that best approximates the surface at that point” (of course, this is the Taylor expansion at that point).


  13. Really great stuff. Mathematics is the foundation of all science and science is the compass to help us navigate the universe. Keep up the good work. Very much appreciated.

  14. Hi Khalid,
    Great article. I have always been fascinated by calculus and always wanted to decipher the true meaning of derivative. Your article gives me a great insight. However I would beg you to clarify the following confusion that has arisen.
    We all know that derivative of Y = X^2 is 2x. when you calculate values of y for x=2 and 3, you get y = 4 and 9 respectively. The change in y here is 9-4 = 5. However if I substitute x= 2 in the derivative function dy/dx it gives me 2x = 4. you showed us why this difference exists. It is because of the dx factor (Shoddy instrument). But the reality is that y changed by 5 units when x changed from 2 to 3. Are you saying that dy/dx or derivative is not here to calculate rate of change for such large changes and if you use it for large changes results are inaccurate. Does that mean that dy/dx can only be used to calculate very small changes.
    Earlier I thought if you want to find how a function f(x) is changing w.r.t x between 2 values without substituting the values, just calculate the derivative and substitute x but it seems I was wrong?

    Also I didn’t understand when you say
    “The derivative is 44” means “At our current location, our rate of change is 44.”
    Change is a relative term. How can there be a change at a current location? It has always got to be between two locations.

  15. Heya, I just hopped over to your web-site through StumbleUpon. Not something I would typically browse, but I enjoyed your thoughts nonetheless. Thank you for making something worth reading through.

  16. Sudar, I understand your confusion.

    Your last paragraph is the most important. The differential \displaystyle{(dy/dx)} MIGHT be understood as the rate of change at a single point, but it also might be confusing if you think of it like that. It’s important that you see that the differential is not the same thing as the difference \displaystyle{(y_2-y_1)/(x_2-x_1)=\Delta y/\Delta x}. The difference requires two different points; the differential takes only one point.

    Another way to think of the differential is that it’s the slope of the line that best approximates the curve at that point. This definition gives you some surprising algebraic power — and it also suggests some other operations, such as the “linear subderivative”, which is the _line_ that best approximates the curve, and from that the “quadratic subderivative” (and so on). These are very cleanly defined operations on algebraic curves, and require only algebra, no analysis or limits.


  17. The derivative is a concept that relates a continuous property (average change) to a discrete one (instantaneous change). Even if one had a perfect instrument to measure instantaneous change, one wouldn’t be able to, because of our conception (and consequent definition) of speed.

    To properly understand a derivative you would need the concept of a limit. Limits are to calculus what de Broglie’s wavelength is to quantum physics (it bridges the gap between wave and particle properties – between discrete and continuous)

    Also limit is not a way to make the derivative work. It is just one application of limit.
    In physics and signal theory certain functions are so complicated that you have to use limits to define them – we call them generalized functions.

    The derivative can be taught without limits (since the derivative deals with rate of change) but if you are introducing infinitesimals then I think you could have introduced the limit too.

  18. Nikhil, you do not need limits or infinitesimals to properly understand the derivative. The derivative is sufficiently understood as the slope of the line tangent to a curve at a point. This geometric understanding does not invoke limits or infinitesimals. You can add in limits to this definition to handle piecewise continuous curves, but as-is this definition can handle arbitrary curves, rather than being limited to functions.

    If one is learning general calculus then infinitesimals are essential; but if one is learning the derivative they are not, and therefore no limits are needed. Iverson actually wrote a Calculus text without using limits, and he only used infinitesimals informally. It’s available online. Aside from that oddity, the text is notable for its computational focus and for its treatment of some advanced theoretical topics such as fractional integrals (Wikipedia calls this the “differintegral”).


  19. “Nikhil, you do not need limits or infinitesimals to properly understand the derivative.”

    I never said that you need limits to understand the derivative. Read the last paragraph of my comment.

    In your explanation, you talk about infinity and the continuum. What I was saying is – that the conceptual leap from there to that of a limit is very small. So there is no need to avoid the concept of a limit.

    One doesn’t need the epsilon – delta definition to introduce the concept of a limit.

  20. Tanksley, you say that the concept of a limit restricts the definition of a derivative to functions and makes it inapplicable to arbitrary curves. I did not follow this. Could you elaborate?

  21. Nikhil, you said “To properly understand a derivative you would need the concept of a limit.” That’s the sentence I was seeking to correct. Your last paragraph claims that you don’t need limits but then implies that you need infinitesimals, and this is also something I disputed -- but assuming your post is not self-contradictory, your claim would imply that you need infinitesimals in order to understand derivatives _improperly_, and if you add limits you can understand them _properly_, and there’s no other way to even begin to understand derivatives.

    I contradicted this claim by saying that there is another way of understanding the derivative: the geometric definition. It requires no limits, no infinitesimals, no continuum. It works not only on smooth functions, but also on arbitrary smooth curves. (I’ll explain in my next comment.)

    You said that I mentioned infinity and the continuum. I didn’t mention either; the only place I can find those concepts is in the original post. I would also disagree entirely with the original post’s take on them; for example, there is not only one infinitesimal, rather, there are an unknown number of them, so you cannot iterate through the continuum by adding just any infinitesimal to a number (if you do this, you’ll miss points on the continuum).

    On the other hand, I do agree that one does not need epsilon-delta to introduce limits; one can introduce limits for other purposes. Or one can introduce limits for their own sake. But this has nothing to do with the topic of understanding the derivative.


  22. “You said that I mentioned infinity and the continuum. I didn’t mention either; the only place I can find those concepts is in the original post.”

    Yes I was referring to the original post. I just stumbled upon this article while doing a Google search and assumed that you were its author. Now I have explored the site, and discovered it was Kalid.

    I was thinking about your comment – “The derivative is sufficiently understood as the slope of the line tangent to a curve at a point.”. I have some doubts regarding this definition. But I will first wait for your comment on the application of the derivative to arbitrary curves and how the limit restricts this applicability.

  23. (Note: I hope the LaTeX below works. I wish there were a preview mode…)

    I claimed that using limits and infinitesimals to define the derivative led to restricting ourselves to functions, while using the geometric definition of the derivative allowed arbitrary curves rather than only functions. (There are other advantages; for example, using the geometric definition allows you to reason about derivatives of curves over arbitrary fields rather than only the continuum.)

    Recall that the geometric definition of the derivative is the slope of the line tangent to the curve at any point on the curve. First let me distinguish a “function” from a “curve”. Every function is a curve, but a function has at most one value per input, while a curve can have any number of values. We can consider the subset of general curves called the “algebraic curves”, consisting of the Cartesian graphs of the polynomials of the appropriate number of variables for the dimension we’re examining; analytic curves are also amenable to this analysis, or curves on other coordinate systems.

    And a simple example of that is the classical unit circle. In order to find the derivative of the unit circle using limits, one has to split the circle into upper and lower halves. If one uses the geometric definition, however, there is only one curve, and computing a formula for its tangent line is simple algebra. The result is a formula for the tangent line to the circle at every point on the plane (sometimes called the “first order semiderivative”), and it’s easy to see how to extract the slope of that line.

    The algebra one performs in order to extract this is to evaluate the curve at \displaystyle{(x+r,y+s)}, where r and s are variables representing arbitrary numbers, then express the result in terms of powers of x and y, and finally evaluate that at \displaystyle{(x-r,y-s)}, thereby giving a net effect of adding and subtracting zero and rewriting the expression in terms of powers of \displaystyle{x-r} and \displaystyle{y-s}. (This action substitutes for adding and subtracting an infinitesimal, but we need no assumption that infinitesimals exist.) If the original curve was algebraic it will also be analytic, and so the rewritten result will be a Taylor expansion.

    Now, to find the slope of the tangent line, one needs only to see that the equation of the tangent line is the equation setting all the zeroth and first order terms in the Taylor expansion to zero (and discarding all the higher order terms); and the slope of that line is simply the negative of the coefficient of \displaystyle{x} divided by the coefficient of \displaystyle{y}.

    So, let’s compute the first order semiderivative of the unit circle.

    The curve is \displaystyle{x^2+y^2-1}. Evaluating at \displaystyle{(x+r,y+s)}, we get the translated curve \displaystyle{(x+r)^2+(y+s)^2-1}, which expands to \displaystyle{x^2+y^2+2rx+2sy+(s^2+r^2-1)}. The Taylor expansion is therefore \displaystyle{(x-r)^2+(y-s)^2+2r(x-r)+2s(y-s)+(s^2+r^2-1)}.

    To find the equation of the tangent line (the first-order semiderivative with respect to x and y), we set the zeroth and first order terms of the Taylor expansion equal to zero: \displaystyle{2r(x-r)+2s(y-s)+(s^2+r^2-1)=0}. Putting this in the standard y=mx+b form, we get \displaystyle{y=(-r/s)x+(s^2+r^2+1)/(2s)}, which for points on the circle (where \displaystyle{s^2+r^2=1}) reduces to \displaystyle{y=(-r/s)x+1/s}, as the equation of the line tangent to the unit circle at (r,s). Therefore, the derivative of the unit circle curve at the point \displaystyle{(r,s)} on the circle is \displaystyle{(-r/s)} for all points where \displaystyle{s \neq 0}.

    This follows directly for all algebraic curves, and can be confirmed for all analytic curves. For non-analytic curves, it can be shown that we can approximate the derivative as closely as desired.
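
The recipe in this comment (translate the curve to the point, then read the slope off the first order terms) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Tanksley's own code: the dictionary representation of a polynomial and the names `translate`, `tangent_slope`, and `unit_circle` are invented for the example, and exact rationals are used so that no limits or floating point are involved.

```python
from fractions import Fraction
from math import comb


def translate(poly, r, s):
    """Return the coefficients of p(x + r, y + s).

    poly maps (i, j) to the coefficient of x**i * y**j (a hypothetical
    representation chosen for this sketch).
    """
    out = {}
    for (i, j), c in poly.items():
        # Binomial expansion of c * (x + r)**i * (y + s)**j.
        for k in range(i + 1):
            for m in range(j + 1):
                term = c * comb(i, k) * r ** (i - k) * comb(j, m) * s ** (j - m)
                out[(k, m)] = out.get((k, m), 0) + term
    return out


def tangent_slope(poly, r, s):
    """Slope of the tangent at (r, s), read off the first order terms."""
    t = translate(poly, r, s)
    if t.get((0, 0), 0) != 0:
        raise ValueError("(r, s) is not on the curve")
    a = t.get((1, 0), 0)  # coefficient of x in the translated curve
    b = t.get((0, 1), 0)  # coefficient of y in the translated curve
    if b == 0:
        raise ZeroDivisionError("vertical tangent (or singular point)")
    return -a / b


# x^2 + y^2 - 1, evaluated at the point (3/5, 4/5) on the circle.
unit_circle = {(2, 0): 1, (0, 2): 1, (0, 0): -1}
print(tangent_slope(unit_circle, Fraction(3, 5), Fraction(4, 5)))  # -3/4
```

The result agrees with the formula \displaystyle{(-r/s)} above, since \displaystyle{-(3/5)/(4/5) = -3/4}.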


  24. Nikhil said: “But I will first wait for your comment on the application of the derivative to arbitrary curves and how the limit restricts this applicability.”

    Thank you for reminding me that I said that — I forgot to explain that part.

    I just explained how to apply the geometric definition of the derivative to arbitrary algebraic curves. More complex curves are also available, and there are proofs that the geometric definition yields both exact solutions and a simple method for deriving approximations.

    I also explained one obvious way in which the geometric definition is superior, in that it allows derivatives of curves that aren’t simple functions. But I didn’t explain in what aspects the infinitesimal definition of the derivative is inadequate. Notice that I’m not trying to say that it’s bad or wrong, or that it’s ALWAYS inadequate; rather, I’m pointing out some specific problems that hinder certain uses. Also notice that I’m not complaining about limits; I’m talking specifically about the use of infinitesimals in the definition of the derivative. Limits may still be useful (for example, I mentioned piecewise smooth functions, whose derivatives require limits).

    The most interesting problem is that infinitesimals require the use of the continuum, and not all numbers are embedded in a continuum. The rationals are very useful for most purposes; and floating point computation is a use of a special type of rational number. There are other infinite fields as well, and obviously the finite fields cannot be approached with limits at all (but are quite easily approached with geometry). And yes, the definition of “algebraic curve” applies over any field, finite or infinite, so this method will find its derivative. Complex numbers are reachable as well — in fact, you can probably see that the equation I derived for the tangent line has values over the entire plane, not just on the unit circle, and in fact those values are geometrically meaningful.
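
To make the finite field remark concrete, here is a small sketch that takes the tangent line equation derived above, \displaystyle{2r(x-r)+2s(y-s)+(s^2+r^2-1)=0}, and evaluates it with arithmetic modulo a prime. On the curve it simplifies to \displaystyle{rx+sy=1}. The function name `circle_tangent_mod_p` and the choice of the prime 7 are assumptions made for illustration; the sketch also assumes an odd prime, because the first order terms all carry a factor of 2.

```python
def circle_tangent_mod_p(r, s, p):
    """Tangent to x^2 + y^2 = 1 at (r, s) over the integers mod an odd prime p.

    Returns (slope, intercept) of the line y = slope * x + intercept (mod p).
    """
    if (r * r + s * s - 1) % p != 0:
        raise ValueError("point is not on the curve mod p")
    if s % p == 0:
        raise ValueError("vertical tangent: s has no inverse mod p")
    s_inv = pow(s, -1, p)        # modular inverse (Python 3.8+)
    slope = (-r * s_inv) % p     # -r/s, computed without any limit
    intercept = s_inv % p        # from r*x + s*y = 1
    return slope, intercept


# (2, 2) lies on the curve mod 7 because 4 + 4 = 8 = 1 (mod 7).
print(circle_tangent_mod_p(2, 2, 7))  # (6, 4)
```

The line \displaystyle{y = 6x + 4} does pass through (2, 2) modulo 7, since 6*2 + 4 = 16, which is 2 mod 7.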

    There are more interesting results as well. The tangent line is interesting and useful, but there are also tangent conics, cubics, and so on.


  25. Thank you Tanksley, for your explanation. But I have to admit, there is a lot in the above explanation that I am not familiar with (like the first order semiderivative), so I’ll have to go through it step by step. I hope you’ll stay on the site to clarify my doubts!

    In the meantime, can we discuss your earlier comment, “The derivative is sufficiently understood as the slope of the line tangent to a curve at a point.”?

    Let’s say we want to draw a tangent to a curve. This raises the question of what a tangent is.

    1. Let’s say the tangent at a point is a line that best approximates the curve at the point. This raises the question of what is meant by best approximation.

    2. A simplified answer to this question would be that it should have the same value at the point as the curve.

    So if your function is y = x^2, then at x = 2, y = 4. But you can draw any number of lines through the point (2,4). So how do you go from there?

    For a circle or a conic you can draw a line from the centre or the foci and then define the tangent as the line that is perpendicular to this line. But how would you draw a tangent to an arbitrary curve (one that has no centre or foci)?

    I would also like to know if my statements 1 and 2 are correct, or do they need some mathematical refinement.
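
One algebraic way to single out the tangent among all the lines through (2,4), offered as a sketch rather than as the thread's answer: a line of slope m through \displaystyle{(x_0, x_0^2)} meets y = x^2 where \displaystyle{x^2 - x_0^2 - m(x - x_0) = (x - x_0)(x + x_0 - m) = 0}, so it crosses the curve a second time at \displaystyle{x = m - x_0}. The tangent is the one line whose second crossing coincides with the first. The function name `other_intersection` is invented for this example.

```python
def other_intersection(m, x0=2):
    """x-coordinate where the line of slope m through (x0, x0**2)
    meets y = x**2 a second time.

    Comes from factoring x**2 - x0**2 - m*(x - x0) = (x - x0)*(x + x0 - m).
    """
    return m - x0


for m in (3, 4, 5):
    print(m, other_intersection(m))
# 3 1
# 4 2   <- second crossing lands on x0 itself: a double root, so slope 4 is the tangent
# 5 3
```

For x0 = 2 only m = 4 gives a double root, which matches the usual derivative 2x at x = 2, and no limit was needed to find it.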

  26. Tanksley, I went through your last post again and I think I am beginning to understand the definition of the derivative as the slope of the tangent.

    Statement 1 should be: the tangent is the best first order approximation to the curve at a point, i.e. it should have the same value as the original curve and also the same rate of change at that point.

    So, if my function is x^3, and I want to draw a tangent at

    change in y = (x + a)^3 – x^3

    =3a^2 x + 3a x^2 + a^3

    The zeroth order approximation is found by setting the first and second degree terms to zero. This would be y0 = a^3. This has the same value as the function at x = a.

    The first order approximation is found by setting the second degree terms to zero. This would be y1 = 3 a^2 x + a^3. The slope of this line has the same value as the derivative of the function at x = a, i.e. 3 a^2.

    Intuitively too, this makes sense. Let’s consider a body starting from rest (at t = 0) and undergoing uniform acceleration of 1m per second squared. When I say this body has an instantaneous velocity of 8m per second at t = 8, what it means is that the body has a potential to travel 8m per second, if it were moving at a constant velocity of 8m per second, in either direction. (But this doesn’t happen, because by the time t becomes 9, the body has already accelerated through 1m per second squared. So the distance travelled between t = 8 and t = 9 is not 8m.)

    Is my understanding correct, or is there something that I’ve missed ?
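
The accelerating body example can be checked numerically. This sketch assumes the standard constant-acceleration formula \displaystyle{s(t) = \tfrac{1}{2} a t^2} for a body starting from rest; the function names `position` and `average_velocity` are made up for the illustration.

```python
def position(t, accel=1.0):
    """Distance covered from rest under constant acceleration."""
    return 0.5 * accel * t ** 2


def average_velocity(t, dt, accel=1.0):
    """Average velocity over the interval [t, t + dt]."""
    return (position(t + dt, accel) - position(t, accel)) / dt


# Distance actually travelled between t = 8 and t = 9.
print(position(9) - position(8))  # 8.5

# Average velocity over shorter and shorter intervals starting at t = 8.
for dt in (1.0, 0.5, 0.25):
    print(dt, average_velocity(8, dt))
# 1.0 8.5
# 0.5 8.25
# 0.25 8.125
```

So the body covers 8.5m, not 8m, in that second, while the average velocity over shrinking intervals settles toward the instantaneous value of 8m per second.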

  27. There’s a mistake in my above post. I said “Statement 1 should be: the tangent is the best first order approximation to the curve at a point, i.e. it should have the same value as the original curve and also the same rate of change at that point.”

    This is what one would expect given the traditional definition of the tangent, i.e. the tangent line to a plane curve at a given point is the straight line that just touches the curve at that point.

    But if you look at the equation to the tangent line derived in my last post, y1 = 3a^2x + a^3,
    at x = a, y = 4 a ^ 3. The point (a, 4 a ^ 3), does not lie on the curve y = x^3.

    So is there something wrong with the definition, or is there something I’ve missed ?
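
One possible reading of this mismatch, offered as a sketch and not as Tanksley's reply: in the expansion of (x + a)^3, the variable x measures the offset from a, so the line 3a^2 x + a^3 is written in offset coordinates. Rewritten in powers of (x - a), as in the circle example earlier in the thread, the same line becomes a^3 + 3a^2 (x - a), which does touch the curve at x = a. The function names below are invented for the comparison.

```python
def cube(x):
    return x ** 3


def tangent_at(a):
    """Zeroth and first order terms of x^3 expanded in powers of (x - a)."""
    return lambda x: a ** 3 + 3 * a ** 2 * (x - a)


def line_from_comment(a):
    """y1 = 3a^2 x + a^3 exactly as written above, with x read as absolute."""
    return lambda x: 3 * a ** 2 * x + a ** 3


a = 2
print(cube(a))                   # 8
print(tangent_at(a)(a))          # 8   (touches the curve at x = a)
print(line_from_comment(a)(a))   # 32  (the 4a^3 noticed in the comment)
print(line_from_comment(a)(0))   # 8   (agrees once x is read as an offset of 0)
```

Both lines have the same slope 3a^2; they differ only in where x is measured from.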

  28. I’m really sorry, but I’m just not able to get the time to reply this weekend. You’re on the right track in general (in fact, I’m quite impressed, given the tiny bit of explanation I’ve been able to give); but there’s more to do.

    If you don’t mind, I’m going to point you to a YouTube video where a fairly complex curve is analyzed according to these rules.

    Unfortunately, he uses some unusual terms while doing this; for example, he denotes the curve using a “polynumber”, which he writes as an array of integers. You may be able to figure out how a polynumber is like a polynomial without explicitly written variables; if you need a better explanation, the previous videos in his series will explain completely. See the entire playlist at:


  29. Ok Tanksley, I’ll check out the videos, and then I’ll post what I’ve understood. But since I am unfamiliar with a lot of what is being discussed here, I’ll need your confirmation to be sure what I understood is correct. I’ll wait for your comment.

    And thank you for pointing me to these videos. It’s a new approach for me – integrating algebra, geometry and calculus. The only hindrance is my own less rigorous math background. So I’ll have to go through it step by step. I hope you will stay on the site to comment on my progress.

  30. No problem, I’ll be here.

    And if it helps any, the playlist I pointed you to is “MathFundamentals”, so it’s no problem not knowing math. He starts at counting with tallies, if you want to start at the beginning.

    If you wanted an advanced playlist, he’s got one on universal hyperbolic geometry and another on algebraic topology. Whew! :-)


  31. I’m uncertain about calling the derivative a “better division”, although it’s better than “continuous division”. I’d probably call it “generalised division”. It does follow the pattern (established with integrals) that the derivative is about changing quantities. I believe the problematic aspect of the derivative, is that it is a number at a specific point ‘a’, f'(a), but a function at a generalised point, f'(x).

    I do like the 4-step procedure: (1) Choose an interval; (2) Find the raw change; (3) find the rate of change; and (4) Make your model perfect. But the limit is not only about making your model ‘perfect’, because it is also used to *simplify* a problem by neglecting the contribution of a certain component.

    That last step “Make your model perfect” seems to be what the change from Hyperreal numbers to Real numbers (by taking the standard part) is all about. Or at least that was something that immediately sprung to mind.

    Argh. Somebody mentioned the epsilon-delta definition. It’s not so much the definition itself (that basically relates input error to output error), but most explanations are just so….ugh.


  32. Dear Kalid, thank you very much for your efforts and the time that you put in to write such excellent explanations of basic math concepts. I am a student of psychology. I am using this to learn more about calculus and maths. A note: I believe mathematics pedagogy in schools all over the world has to radically change. It is essential and beneficial to use 3 D animations and other visual techniques to impart mathematical ideas. Until that happens there would always be potential mathematicians who would never go on to do set theory or matrices or calculus. Also with better 3 D visual representations and animations of mathematical ideas, fundamental concepts like calculus, graph theory and even dynamical systems could be imparted to students at an earlier age.

  33. Thanks Robin, I really appreciate the note. I love it when people in other fields are able to take away some insights. I definitely think math pedagogy needs to change, to use other techniques, but really, to just ask “Are we actually learning here?”. I feel there’s a giant emperor’s clothes problem where nobody wants to admit “Hey, this concept we’re supposed to be teaching… it’s not clicking at an intuitive level, and it should.” 3d visualizations and other tools can help get ideas to really sink in. Appreciate the note!

  34. One thing that always bothers me is the chicken-and-egg problem: which comes first, the differential equation or the function? Let me give an example. Take decay laws, which are stated as differential equations. When you carry out practical experiments, you plot a graph and approximate it with a function through curve-fitting techniques. So where does the differential equation fit in? I can make all the predictions through a function. What is the point of representing an event through differential equations if my function can do the whole job?

  35. kishore, the differential equation doesn’t come from a function. It comes from a model that predicts the observed values. The model happens to imply certain relationships between physical measurements, which (when stated mathematically, using known laws of physics, and expressed with the smallest number of independent variables) often winds up having integrals and differentials embedded in it — hence it’s a differential equation.
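
As a rough illustration of that distinction, the sketch below steps the decay model dN/dt = -kN forward in small increments and compares the result with the closed-form function N0 * e^(-kt) that a curve fit would land on. The starting amount, the decay constant, and the function names are made-up values for this example, and Euler stepping is just one simple way to advance the model.

```python
import math


def decay_by_model(n0, k, t, steps=100_000):
    """Advance dN/dt = -k*N from 0 to t with small Euler steps."""
    dt = t / steps
    n = n0
    for _ in range(steps):
        n += -k * n * dt  # the model only says how N changes right now
    return n


def decay_by_formula(n0, k, t):
    """Closed-form solution of the same model."""
    return n0 * math.exp(-k * t)


# Hypothetical numbers: 1000 units, decay constant 0.5 per second, 2 seconds.
print(decay_by_model(1000.0, 0.5, 2.0))
print(decay_by_formula(1000.0, 0.5, 2.0))
```

The two printed values agree closely. The function is what falls out once the rule of change is solved; the differential equation is the rule itself, and it still applies when the conditions change and no tidy formula is available.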

  36. Does not dx ( in our case dx=1) represent acceleration in speed in interval x=2, x=3?

    Between points x=2.8 and x=2.9 is speed

    5.7 = 5.6 + 0.1

  37. Excellent!!!!
    I hope there are more people who can explain maths like this. It removes the fear of maths and puts in the joy of learning it.

  38. I find many pluses and minuses with these types of approaches to topics. It’s good because some people find it more approachable. I feel it’s bad if they are not able to convert it to a logical mathematical understanding using mathematical language.

    This is one of the most harmful aspects of math education today. Everyone is focused on pushing standardized testing and standardized testing destroys the nurturing of problem solving and logical understanding of concepts because there is no time and everyone has to play the ‘rat race’ within education to get the ‘golden ticket’ to a nice expensive college. This is what you get when you turn education into a ‘product’ and students into ‘consumers’. Those students who have excellent problem solving skills never have time to nurture them and even wind up having those skills stunted.

    My rant aside, no, the slope of the tangent line is not a bland description at all. The problem here usually is that students don’t have a solid foundation in algebra first, which is a must and then a very good foundation in pre-calculus.

    To see a whole topic on derivatives without a single graph is doing a disservice.

    Good to see students get something here but calculus needs a unified approach and the understanding of the derivative begins with a strong foundation in algebra (coordinate geometry) and pre-calculus.

  39. I think currently your approach is better than others.
    I don’t know why people want to sandwich the “Aha! moments” with fast track academies.
    Keep it up, Khalid! You have been blessed to reduce complexity to bring in simplicity.

  40. I really dig your articles and I’m going through some of them (mostly because I like math, and (also) because of the fact that I can have some eighth-grade swag at knowing calculus :D).
    Have you ever considered teaching?

    On another note, why don’t you use MathJax for your equations? It’s so much better.

  41. Thanks a ton Khalid. I stumbled on betterexplained while searching for some limits explanations on the web. You are doing an awesome job by reminding people of the importance of intuition & the beauty of maths. As of now, before starting any new topic I go through better explained to know what I am going to do. :-)

  42. @Soham: Thanks for the comment! I haven’t thought about in-person teaching that much, but it might be something in the future. I’d like to integrate MathJax as well, the only problem is it doesn’t work in RSS feeds/email. I might find a way to use MathJax on the website and fall back to the images on the other places.

    @Gaurav: Awesome, glad you’re enjoying the site :).

  43. An aha moment came while reading your fab articles. You must write a high school book soon, I wanna read more and more. Plzzz do write on physics too.

  45. Congratulations!!! it’s really a very intuitive explanation!
    Analogy is the key! “As above so below”

  46. This was absolutely great….well done. what an excellent explanation of calculus… also I would like to add this.

    I was doing a lot of research and thinking and came to the conclusion that, like you mentioned, the integral in some respects is not directly related to differentiation. More precisely, the definite integral is unrelated to differentiation, and anti-differentiation is the imperfect reversal (opposite operation) of differentiation (very intuitive). The reason why is simple: the definite integral computes the signed area under a curve and the change in position of the original function (i.e. (dx/dt) times (dt) equals dx)… which is completely useless if you are trying to find the original function… the antiderivative, however, is useful for that as long as the constant is defined…. the indefinite integral is virtually the same as an anti-derivative except its syntax actually means nothing in nature… have you ever wondered why there is a dx (or the appropriate differential) at the end of the integrand even though there are no bounds of integration…??? dx would represent an infinitesimally small width but since there are no bounds of integration the dx means nothing… it’s a dummy variable as some would say…

    great stuff

  47. Dear Khalid,
    I just wanted to say thank you again. You are very clever, and it is great that your dream lies in helping other people understand.
    I haven’t been listening to my maths classes recently and therefore I need to do a lot of work.
    Your website has made me more confident.

    indeed the idea of a rate of change at a point is very confusing.

    I want to ask whether your passion for maths or any learning stems from your curiosity, and whether you have read up on the history of your maths.

    and i hope you check this out.
    this guy is quite smart and uses analogies too.
    i want to be able to make analogies myself so i can understand and relate ideas so i can apply them to life and make use of them. because all learning is precious.

    because i am a person who needs to understand,

    therefore your ahas help (my mother on the other hand is one of those who remember and don’t question haha, but indeed there are different people, and their way of learning and how their brain functions, their behaviour, their attitude to learning approaches is different)

    i want to thank you again, and how you have left a Question part shows your dedication.
    Yours Sincerely

  48. @mahendra: Thanks!

    @Elisen: Really appreciate the note, thank you. (Scott is a friend and I really like how he breaks down his methods!).

    My passion for math (or learning in general) came when I realized how much simpler an idea could be if we looked at it the right way. Something which was once confusing becomes simple with the right approach (think about how difficult multiplication is with Roman numerals, but how easy it is with decimal numbers). I had this belief that any idea could be made simple, and it’s what keeps me going. If something seems difficult, it’s ok; it just means I haven’t found the simple version of it yet.

    Really glad the site has been helping :).
