The gradient is a fancy word for derivative, or the rate of change of a function. It’s a vector (a direction to move) that
- Points in the direction of greatest increase of a function (intuition on why)
- Is zero at a local maximum or local minimum (because there is no single direction of increase)
The term gradient (grad) typically refers to the derivative of functions of more than one variable (functions that take a vector as input). Yes, you can say a line has a gradient (its slope), but using the term gradient for single-variable functions is unnecessarily confusing. Keep it simple.
“Gradient” can refer to gradual changes of color, but we’ll stick to the math definition if that’s ok with you. You’ll see the meanings are related.
Properties of the Gradient
Now that we know the gradient is the derivative of a multi-variable function, let’s derive some properties.
The regular, plain-old derivative gives us the rate of change of a single variable, usually x. For example, dF/dx tells us how much the function F changes for a change in x. But if a function takes multiple variables, such as x and y, it will have multiple derivatives: the value of the function will change when we “wiggle” x (dF/dx) and when we wiggle y (dF/dy).
We can represent these multiple rates of change in a vector, with one component for each derivative. Thus, a function that takes 3 variables will have a gradient with 3 components:
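$F(x)$ has one variable and a single derivative: $\frac{dF}{dx}$
$F(x, y, z)$ has three variables and three derivatives: $\frac{\partial F}{\partial x}$, $\frac{\partial F}{\partial y}$, $\frac{\partial F}{\partial z}$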
The gradient of a multi-variable function has a component for each direction.
And just like the regular derivative, the gradient points in the direction of greatest increase. However, now that we have multiple directions to consider (x, y and z), the direction of greatest increase is no longer simply “forward” or “backward” along the x-axis, like it is with functions of a single variable.
If we have two variables, then our 2-component gradient can specify any direction on a plane. Likewise, with 3 variables, the gradient can specify any direction in 3D space to move to increase our function.
A Twisted Example
I’m a big fan of examples to help solidify an explanation. Suppose we have a magical microwave oven, with coordinates written on it and a special display screen.

We can type any 3 coordinates (like “3,5,2”) and the display shows us the gradient of the temperature at that point.
The microwave also comes with a convenient clock. Unfortunately, the clock comes at a price — the temperature inside the microwave varies drastically from location to location. But this was well worth it: we really wanted that clock.
With me so far? We type in any coordinate, and the microwave spits out the gradient at that location.
Be careful not to confuse the coordinates and the gradient. The coordinates are the current location, measured along the x, y, and z axes. The gradient is a direction to move from our current location, such as move up, down, left or right.
Now suppose we are in need of psychiatric help and put the Pillsbury Doughboy inside the oven because we think he would taste good. He’s made of cookie dough, right? We place him in a random location inside the oven, and our goal is to cook him as fast as possible. The gradient can help!
The gradient at any location points in the direction of greatest increase of a function. In this case, our function measures temperature. So, the gradient tells us which direction to move the doughboy to get him to a location with a higher temperature, to cook him even faster. Remember that the gradient does not give us the coordinates of where to go; it gives us the direction to move to increase our temperature.
Thus, we would start at a random point like (3,5,2) and check the gradient. In this case, the gradient there is (3,4,5). Now, we wouldn’t actually move an entire 3 units to the right, 4 units back, and 5 units up. The gradient is just a direction, so we’d follow this trajectory for a tiny bit, and then check the gradient again.
We get to a new point, pretty close to our original, which has its own gradient. This new gradient is the new best direction to follow. We’d keep repeating this process: move a bit in the gradient direction, check the gradient, and move a bit in the new gradient direction. Every time we nudge along and follow the gradient, we’d get to a warmer and warmer location.
Eventually, we’d get to the hottest part of the oven and that’s where we’d stay, about to enjoy our fresh cookies.
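Here is a minimal Python sketch of that nudge-and-recheck loop (usually called gradient ascent). The oven's temperature function, its hot spot at (5, 5, 5), and the step size are made-up assumptions for illustration; the article never gives the real temperature function, so this sketch's gradient at (3,5,2) is (4, 0, 6) rather than (3,4,5).

```python
import math

# Hypothetical oven: temperature peaks at HOT_SPOT. Not taken from the article.
HOT_SPOT = (5.0, 5.0, 5.0)

def temperature(p):
    """Temperature at p = (x, y, z); highest at HOT_SPOT."""
    return 100.0 - sum((c - h) ** 2 for c, h in zip(p, HOT_SPOT))

def gradient(p):
    """What the oven's display would show: the partial derivatives of temperature at p."""
    return tuple(2.0 * (h - c) for c, h in zip(p, HOT_SPOT))

def follow_gradient(start, step=0.05, tol=1e-6, max_iters=10_000):
    """Move a little along the gradient, re-check it, and repeat until it is ~zero."""
    p = start
    for _ in range(max_iters):
        g = gradient(p)
        if math.sqrt(sum(c * c for c in g)) < tol:
            break  # zero gradient: no warmer direction left
        # Take a small step in the gradient direction, not the full gradient length.
        p = tuple(c + step * gc for c, gc in zip(p, g))
    return p

doughboy = follow_gradient((3.0, 5.0, 2.0))
print([round(c, 3) for c in doughboy])  # [5.0, 5.0, 5.0]
```

The small fixed step is the simplest choice; too large a step could overshoot the hot spot, which is why the loop keeps re-reading the gradient instead of jumping straight along it.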
Don’t eat that cookie!
But before you eat those cookies, let’s make some observations about the gradient. That’s more fun, right?
First, when we reach the hottest point in the oven, what is the gradient there?
Zero. Nada. Zilch. Why? Well, once you are at the maximum location, there is no direction of greatest increase. Any direction you follow will lead to a decrease in temperature. It’s like being at the top of a mountain: any direction you move is downhill. A zero gradient tells you to stay put – you are at the max of the function, and can’t do better.
But what if there are two nearby maximums, like two mountains next to each other? You could be at the top of one mountain, but have a bigger peak next to you. In order to get to the highest point, you have to go downhill first.
Ah, now we are venturing into the not-so-pretty underbelly of the gradient. Finding the maximum in regular (single variable) functions means we find all the places where the derivative is zero: there is no direction of greatest increase. If you recall, these zero-derivative points are only candidates (local minimums and maximums), and the absolute max/min must be found by testing these candidate locations.
The same principle applies to the gradient, a generalization of the derivative. You may find multiple locations where the gradient is zero, and you’ll have to test these points to see which one is the global maximum. Again, the top of each hill has a zero gradient, so you need to compare the height at each to see which one is higher. Now that we have cleared that up, go enjoy your cookie.
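To see the two-mountain problem in action, here is a small Python sketch. It assumes a hypothetical landscape with a short hill near (1, 0) and a taller hill near (4, 0); the hill shapes, starting points, and step size are illustrative choices. Climbing from different starting points lands on different peaks, and only comparing the heights of those candidates tells us which one is the global maximum.

```python
import math

# Hypothetical landscape: (height, center_x, center_y) for two Gaussian hills.
HILLS = [(3.0, 1.0, 0.0), (5.0, 4.0, 0.0)]

def height(x, y):
    return sum(a * math.exp(-((x - cx) ** 2 + (y - cy) ** 2)) for a, cx, cy in HILLS)

def grad(x, y):
    """Analytic gradient of height(x, y)."""
    gx = gy = 0.0
    for a, cx, cy in HILLS:
        e = a * math.exp(-((x - cx) ** 2 + (y - cy) ** 2))
        gx += -2.0 * (x - cx) * e
        gy += -2.0 * (y - cy) * e
    return gx, gy

def climb(x, y, step=0.05, tol=1e-8, max_iters=10_000):
    """Follow the gradient uphill until it is (nearly) zero: a local maximum."""
    for _ in range(max_iters):
        gx, gy = grad(x, y)
        if math.hypot(gx, gy) < tol:
            break
        x, y = x + step * gx, y + step * gy
    return x, y

starts = [(0.0, 0.5), (1.5, -0.3), (4.5, 0.5)]
peaks = [climb(x, y) for x, y in starts]
for start, peak in zip(starts, peaks):
    print(f"start {start} -> peak at x = {peak[0]:.2f}, height {height(*peak):.3f}")

# Every peak has a zero gradient; only comparing heights finds the global maximum.
best = max(peaks, key=lambda p: height(*p))
print(f"global max candidate: x = {best[0]:.2f}, height {height(*best):.3f}")
```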
Mathematics
We know the definition of the gradient: a derivative for each variable of a function. The gradient symbol is usually an upside-down delta, called “del” (this makes a bit of sense: delta indicates change in one variable, and the gradient is the change for all variables). Taking our group of 3 derivatives above:
$$\nabla F = \left(\frac{\partial F}{\partial x},\ \frac{\partial F}{\partial y},\ \frac{\partial F}{\partial z}\right)$$
Notice how the x-component of the gradient is the partial derivative with respect to x (similar for y and z). For a one variable function, there is no y-component at all, so the gradient reduces to the derivative.
Also, notice how the gradient can itself be a function! Take, for example, $F(x, y, z) = x + y^2 + z^3$. Its gradient is
$$\nabla F = (1,\ 2y,\ 3z^2)$$
If we want to find the direction to move to increase our function the fastest, we plug our current coordinates (such as (3,4,5)) into the gradient and get:
$$\nabla F(3, 4, 5) = (1,\ 8,\ 75)$$
So, this new vector (1, 8, 75) would be the direction we’d move in to increase the value of our function. In this case, our x-component doesn’t add much to the value of the function: the partial derivative is always 1.
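One way to double-check a gradient is to "wiggle" each variable by a tiny amount while holding the others fixed. The Python sketch below assumes the example function is F(x, y, z) = x + y^2 + z^3 (which is consistent with the (1, 8, 75) result at (3,4,5)) and compares its analytic gradient with a central-difference estimate; the step size h is an arbitrary small choice.

```python
def F(x, y, z):
    # Assumed example function: F(x, y, z) = x + y^2 + z^3
    return x + y**2 + z**3

def grad_F(x, y, z):
    # Analytic gradient: (dF/dx, dF/dy, dF/dz) = (1, 2y, 3z^2)
    return (1.0, 2.0 * y, 3.0 * z**2)

def numerical_gradient(f, point, h=1e-5):
    """Central differences: wiggle one variable at a time, holding the others fixed."""
    grads = []
    for i in range(len(point)):
        up = list(point)
        up[i] += h
        down = list(point)
        down[i] -= h
        grads.append((f(*up) - f(*down)) / (2 * h))
    return tuple(grads)

p = (3.0, 4.0, 5.0)
print(grad_F(*p))                                            # (1.0, 8.0, 75.0)
print(tuple(round(g, 4) for g in numerical_gradient(F, p)))  # (1.0, 8.0, 75.0)
```

The x-component is 1 everywhere, matching the observation that x contributes the same amount no matter where we are.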
Obvious applications of the gradient are finding the max/min of multivariable functions. Another less obvious but related application is finding the maximum of a constrained function: a function whose x and y values have to lie in a certain domain, e.g. find the maximum of all points constrained to lie along a circle. Solving this calls for my boy Lagrange, but all in due time, all in due time: enjoy the gradient for now.
The key insight is to recognize the gradient as the generalization of the derivative. The gradient points in the direction of greatest increase; keep following it, and you will reach a local maximum.
Questions
Why is the gradient perpendicular to lines of equal potential?
Lines of equal potential (“equipotential”) are the points with the same energy (or value for f(x,y,z)). In the simplest case, a circle represents all points the same distance from the center.
The gradient represents the direction of greatest change. If it had any component along the line of equipotential, that part of the movement would be wasted, since moving along that line doesn’t change the value at all. When the gradient is perpendicular to the equipotential points, it is moving as far from them as possible (this article explains why the gradient is the direction of greatest increase: it’s the direction that maximizes the varying tradeoffs inside a circle).
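A quick numerical check of the perpendicularity claim, assuming the simplest case above: f(x, y) = x^2 + y^2, whose equipotential lines are circles centered at the origin. At several points on the circle f = 4, the dot product of the gradient (2x, 2y) with the circle's tangent direction should come out as zero, up to floating-point rounding. The function and sample angles are illustrative choices.

```python
import math

def f(x, y):
    """Assumed example: the equipotential lines of f are circles centered at the origin."""
    return x**2 + y**2

def grad_f(x, y):
    return (2 * x, 2 * y)

radius = 2.0  # points on the equipotential line f(x, y) = 4
dots = []
for t in (0.3, 1.2, 2.5, 4.0):
    x, y = radius * math.cos(t), radius * math.sin(t)  # a point on the circle
    tx, ty = -math.sin(t), math.cos(t)                 # unit tangent to the circle there
    gx, gy = grad_f(x, y)
    dots.append(gx * tx + gy * ty)                     # zero means perpendicular
    print(f"f = {f(x, y):.3f}, grad = ({gx:.3f}, {gy:.3f}), grad . tangent = {dots[-1]:.1e}")
```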
121 thoughts on “Vector Calculus: Understanding the Gradient”
i like it… well explained.
Super!!!
You are the man! Nice work!
Thanks, glad it was helpful for you.
i was always looking for conceptual and practical examples and yes i finally got.
Awesome!
well you made a good explanation, that even a not-so-smart guy gets it, but i think you missed the obvious -> WHY does gradient show the direction of the greatest increase.
I think that the principle of the gradient is quite easy, but understanding why does it work the way it does is a bit tricky and you should have focused on it more.
It would be interesting if you would somehow add it to this good article. Inspiration http://mathforum.org/library/drmath/view/68326.html
good luck !
Hi Palo, that’s a great point! I’ve been feeling a bit guilty, if you can imagine it, because I’ve lacked that explanation.
I’m probably going to do a separate article on the reason *why* the gradient points in the direction of greatest increase — I have another explanation that it works well with. Thanks for the link and feedback!
Your introduction is not quite correct:
You claim: “Points in the direction of greatest increase of a function”.
Why? It can also point in the direction of greatest decrease of a function.
A gradient is one or more directional derivatives. These derivatives are considered in a particular direction. In the case of single variable calculus, we generally talk about a directional derivative when we consider multiples of the x unit vector, i.e. k*(1,0). To consider the y unit vector, we deal with the partial derivatives with respect to y in a given direction. In three dimensions, the 3 partial derivatives form what we now call a ‘gradient’.
So in fact it is incorrect to call this a slope or anything else except to say that it describes the partial derivatives of a point in the direction of a given vector in space.
Does this make sense? Please visit my blog for some more interesting reading.
http://mathphile.blogspot.com/
Hi John, thanks for writing. You’re right, the formal definition of a gradient is a set of directional derivatives.
But when thinking about the intuitive meaning, I think it’s ok to consider the gradient as a vector that “points” in the direction of greatest increase (i.e. if you follow that direction your function will tend towards a local maximum).
Unless I’m mistaken, the gradient vector always points in the direction of greatest increase (greatest decrease would be in the opposite direction).
What I was saying is that it points either one way or the other, it is not restricted to the direction of greatest increase. As a simple example, consider what happens when you differentiate a parabola: You set the derivative equal to 0 and then you determine that it has either a maximum or a minimum at its turning point. It is not always a maximum just as it is not always a minimum. Think I have explained this correctly now.
good john you have done a great job.
Hi John, thanks for the clarification. I’d still politely disagree and say that in general, the gradient points in the direction of greatest increase.
In the case of 2 dimensions, the gradient/slope only gives a forward or backward direction. A positive slope means travel “forward” and a negative slope means travel “backwards”.
Consider f(x) = x^2, a regular parabola. The gradient is zero at the minimum (x=0), and there is no *single* direction to go. At x = -1, the slope is negative, which means travel “backwards” (to x = -2) to increase your value. Similarly, at x = 1, you travel forward (to x = 2) to increase your value.
But, as you mention, strange things can happen when the derivative = 0. It can mean you are at a local maximum (no way to improve), or at a local minimum (no single direction to improve your position — forward or back will help). I consider the corner case of zero an exception to the general rule / intuition that the gradient is “the direction to follow” if you want to improve your function.
Wonderful explanation!
Thanks Vidhya, glad you liked it.
hi john keep it up you done a great job
Thanks a bunch! I didn’t think it could be this simple to find the maximum increase at a point, so I thought I’d look it up. Thanks to your great explanation, it turned out it was as easy as it seemed it should be. Great job! Thanks!
Travis
Awesome, glad it worked for you
thanks!!!!
Hi Caitlyn, you’re welcome.
Thanks! The sadistic microwave example helped a lot.
Awesome, glad it was useful.
Hello Kalid,
I did not read your reply for some time. I am sorry you do not agree.
Let me give you an example: Suppose we are dealing with pressure and height in a certain ‘cubic’ area. Suppose that the middle of the cube is at height 0 meters. Also suppose that we have a whirlpool generated in the cube such that the pressure rate increases as we go below the middle of the cube. Anything below is negative height and anything above is positive height. Now, as one rises higher in the cube, the pressure decreases. If we find the gradient, then according to your definition (and many others’), the gradient vector for the rate of greatest increase will point below the middle of the cube, not above. But above the middle we find the greatest ‘decrease’ in rate of pressure. In this example, greatest increase points downwards and greatest decrease upwards.
It would probably be better to define the gradient as a vector that points in a direction of greatest increase or decrease. Its additive inverse will point in the direction of greatest decrease or increase respectively. For most physical phenomena, your definition would generally be true. But what happens when you have an anomaly?
Make sense?
I do not believe I have the best answer to this question but like yourself, I am a believer in trying to find the best possible explanation. Once again, I like your website. Keep up the good work Kalid!
Okay, I think I have the best answer. If f is a real-valued function, then del(f) or gradient of f points to the greatest increase, whereas -del(f) points to the greatest decrease.
For once planet math has some decent information on this since I last checked:
http://planetmath.org/encyclopedia/Gradient.html
I do not endorse everything Planet Math publishes but this particular information appears to be correct. In any event, it clears up the previous confusion I think.
Hi John, thanks for the comment! Yes, that’s an important distinction to make: the positive gradient is the greatest increase, and the negative gradient is the greatest decrease. Thanks for helping clarify.
Thank you!
This actually makes sense to me. Thanks!
@Jared, Bigmouth: Cool, glad it was helpful!
did not grasp the idea
Be more specific. The gradient is the direction to move that gives you the biggest increase.
It helps me a lot. But I have some doubt still now. Is it the same concept for gradient of each vertex in a triangle mesh?
Thanks so much.
Kalid
Thanks for the great explanations! I thought I was math-retarded for some time; however your writings actually make sense to me!
Take care!
Johnny T
@Shaheen: Thanks, glad you enjoyed it. I’m not sure I understand the question: in a triangle mesh, you could measure the gradient at each vertex to find the “best” direction to move. Again, not sure if this is your question.
@Johnny T: Thank you for the comment! Yes, when a subject seems difficult (as vector calculus was for me) sometimes it’s just because the explanation wasn’t clicking properly. Thanks for dropping by.
well done,excellent explaination with solid examples
Thanks Wali, glad you enjoyed it.
thanks
but i have some doubts. how does differentiation give the maximum space rate of change? as per my understanding, differentiation is only the difference between two points in the region, say p1 and p2. can u clarify?
Thanks a lot for explaining the concept.
i was having so much trouble understanding this and now its all clear thank you so much!
@lon, sophie: Thanks, glad you enjoyed it!
Jesus. This was a lot better explained than in my text book and by my professor. I thought we were using the gradient as the normal vector but I really doubted that it could be that.
@Ryan: Thanks! I struggled with this concept for a while also.
thanks! this explanation made me clear how to find the direction of smallest change. It is just the 90 degree rotation of the gradient (the direction of largest change).
Thanks very much for your effort
Um — in your microwave example, aren’t you pushing the doughboy out the back of the microwave? (Just wanted to understand the concept). I love these essays, btw, keep them coming!
I loved the microwave analogy. Also thanks for clarifying the upside-down delta, now everything makes more sense
still im confused between scalar field and vector field….
how can such a mathematical expression denote the max change? pls i didnt understand the relation of this with mathematics. pls reply sir.
thank you soo much!!
its a big help for our project…
Can we have your number?hehe
@Rahul: A scalar field returns a single value (x), but a vector field returns multiple values (x,y,z). Usually the multiple values (x,y,z) are taken as a “direction” to follow.
@aradhita: Hi, that’s a question I need to get into in a later post.
@nat2_bam2: Thanks!
Hi kalid! i read your explanation. oh this is very helpful! by the way can you give an example on how to apply this on a situation of the classic “mountain and mountain climber” problem? hope you will reply. thanks again your explanations were clear
@Migs: Great question. The classic “mountain climber” problem is when a function gives the height of the mountain (z) at a certain position (x,y), so z = f(x,y).
The gradient at any position x,y will give you the direction of the _greatest increase_ in z. That is, the gradient will point in the “most uphill” direction. Following the gradient will take you along the steepest path to the top of the mountain (technically, the top of the nearest local maximum). Hope this helps!
beautiful…well said
thanks a lot for the wonderful explanation!!!
@akansha: You’re welcome!
Very nice! Keep up. Thanks a lot
Very nice article!!
Hope to see how to find the maximum of a constrained function soon!!
Thanks a lot!!
@Florencia: Glad you liked it! Thanks for the suggestion.
Very good explanation by the way. So if you are on a landscape given by z=cosy-cosx and u want to get from (0,0,0) to (4pi,0,0) by moving in the direction of the gradient in the positive x-direction how would u explain that? What would that path look like?
Thanks for the great explanation. Another topic that would be very interesting for you to cover is the Jacobian, which causes pain for many, many students (including myself).
@P-F: Thanks for the note. I think the Jacobian, and linear algebra in general, would be great to cover. I’ve forgotten a lot of it and am looking forward to relearning it.
Just wondering something. In the case of f(x,y) = x^2 + y^2, a paraboloid, how can the gradient be perpendicular to the tangent plane at all points and only have components in x and y…
grad f(x,y) = (2x, 2y)
How can it point in any other direction other than parallel to the xy plane?
I’m lost here.
thank you kalid. wonderful explanation.
@prabu: Glad it helped!
It was a great explanation! But I have a specific problem with gradients. Is there any functions that cant be expressed as gradient of any parameter? What could be the properties of that function?
May I could be more specific about my previous problem. If a function is constant in all direction, is it possible to express the function as gradient?
I’m not sure if I understand the question — the gradient of a constant function would be a 0 vector [perhaps technically (0,0)], that is, there is no direction of greatest increase. If it helps, think of the gradient in terms of a derivative (the derivative of a constant function is 0).
Math professional!
Thank you for getting to the heart of why del is required and how to intuitively understand it. Its the first time I understand it so well despite reading so much about it before!
damn! i got it now
math is so beautiful
WOW! great explanation…. thanks dude.:D
@bob: Thanks!
@Anonymous: Agreed.
Great explanation helped me explain my brother! Nice job! Gonna bookmark it for further needs I might have with it.
great explanation and example
@js: Thanks!
hey, explained really well. But still you didn’t provide any sign of why the gradient would always point in the direction of maximum increase…
I don’t usually comment on blogs, but this is a great explanation. Way better than my text book. A+++++++++
@shreedhar: Thanks, I’d like to cover that in a follow-up article. I need to get a nice, intuitive explanation for it first.
@Nick: Thanks, glad it helped.
Man! I just love this kind of explanation. It’s so clear and concise, and it shows me that the author really understands the concept himself.
All mathematics should be taught this way. Go from the specific to the general (abstract). Not the other way around, which is the path usually followed by the type who wants to show off his prowess with math symbols and equations.
nice explanation
don’t eat that cookie!
@Al: Thank you! I think one of the big problems in math teaching (especially) is just trying to get things explained without the professor’s “prowess” getting in the way, as you say.
@Anonymous: How could you eat cookies when there are gradients to be studied?
Nice work!! Thanks man:)
@Pandia: No prob!
Thanks!
Thanks a lot, I loved your way of explaining this, very helpful indeed.
Keep it up.
@Marwa: Thanks, glad it helped!
Great !! Congrats
Awesome! I was cracking my head trying to figure out HW, only to realize how basic it was after reading through ur page. Thanks!
goodluck with my exam tom. ^^
It wasn’t a bad explanation but I wish you had explained ‘why’ the gradient is the perpendicular vector of the function its derivatives were derived from. This still bothers me a little.
Also, if we have a function with three variables, shouldn’t the independent variable be considered? By considered I mean, if I have a function F(x, y, z), then I am saying that w = F(x, y, z), and this function can not be graphed since it has 4 dimensions. A normal F(x, y) can be graphed since you considered the Z, X, and Y of the graph.
From the book I read, I interpreted that the original function has a constant value for ‘w’, hence producing a graph with a new function F2(x, y). However I still didn’t see the math that proves that the gradient of the function F(x, y, z) is actually the vector that is perpendicular to the surface of the graph from which its derivatives were derived from. If you could prove this, it would be really helpful.
thx man very much
I understood it totally from u
my regards
Thank you very much! This made perfect sense and it really helped me out.
@Burton: Thanks!
Cool! Thanks
good explain, it solved my problem
kalid , are u professor
such wonderful explanation……..wow
@shrikant: Thanks!
so very easy method
I love you!
@jayakumar.g: Glad it helped
@Mrigeh:
i want to ask, once knowing the maximum rate of change of temperature in your microwave example, how we can attain that particular place without moving our coordinates positions as mentioned by microwave for example when we choose coordinates (3,5,2) we obtain gradient as (3,4,5). now from where we get the information that which coordinates should be selected next time that gives us maximum gradient? should we choose (3,4,5) coordinates?
@hyaa: I’m sorry, I don’t think I understand the question. The gradient gives you the direction (not coordinates) of the greatest increase in your current value. You have to follow the gradient for a bit, get to a new point, get the gradient there, follow it for a bit… and so on to maximize your value.
Think of the gradient as a compass which points towards your greatest increase. A compass doesn’t give you the coordinates of North, but tells you how to get there from your current position. Hope that helps.
Hi, I still have a question. If there is a function h(x,y) to denote the height of a mountain at position (x,y), can I use the knowledge of gradient to locate the top of the mountain, and how?
@Zita: Yep — you start at any point, and keep following the gradient of h to find the top.
great, really good thank you,it would be comprehensive if you explain that ‘why’ the gradient is the perpendicular vector of the function its derivatives were derived from
@max: Great question. Going to add it as a Q & A at the end of the article.
Excellent explanation, I think if you provide your ebook free of cost it would really be helpful for the poorer students to strengthen their grass-roots.
GOOD JOB, KEEP IT UP………….
Consider the directional derivative, $f_{\mathbf{u}}$, where $\mathbf{u} = (u_1, u_2)$ is a unit vector:
$$f_{\mathbf{u}} = f_x u_1 + f_y u_2 = \nabla f \cdot \mathbf{u} = |\nabla f| \cos\theta$$
(it takes some effort to see this definition of $f_{\mathbf{u}}$), where $\theta$ is the angle between $\nabla f$ and $\mathbf{u}$.
Thus, it is clear that the directional derivative $f_{\mathbf{u}}$ is maximized when $\cos\theta = 1$.
It follows that $\theta = 0$, and the maximum of $f_{\mathbf{u}}$ is attained when $\mathbf{u}$ is in the direction of the gradient. Therefore, the gradient does indeed give the direction of greatest increase.
Note that $f_{\mathbf{u}}$ is minimized when $\cos\theta = -1$. Thus, $\theta = \pi$, and $\mathbf{u}$ is in the opposite direction of the gradient. QED
ps
I am a nerdy math professor who likes demonstrating mathematical prowess. Thanks for the microwave intuition builder. My students are going to like that.
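The directional-derivative argument in the comment above can be checked numerically. This Python sketch assumes a made-up function f(x, y) = x^2 + 3y evaluated at (2, 1), where the gradient is (4, 3) with length 5, and computes f_u = grad(f) · u for unit vectors u spaced every 0.1°. The largest value should appear in the gradient's own direction (about 36.9°) with size |grad(f)| = 5, and the smallest in the opposite direction (about 216.9°).

```python
import math

def grad_f(x, y):
    # Hypothetical example: f(x, y) = x**2 + 3*y, so grad f = (2x, 3)
    return (2 * x, 3.0)

gx, gy = grad_f(2.0, 1.0)                               # (4.0, 3.0), |grad f| = 5
angles = [2 * math.pi * k / 3600 for k in range(3600)]  # unit vectors every 0.1 degree
f_u = [gx * math.cos(a) + gy * math.sin(a) for a in angles]  # f_u = grad f . u

best = max(range(len(angles)), key=lambda i: f_u[i])
worst = min(range(len(angles)), key=lambda i: f_u[i])
print(f"max f_u = {f_u[best]:.3f} at {math.degrees(angles[best]):.1f} deg")
print(f"min f_u = {f_u[worst]:.3f} at {math.degrees(angles[worst]):.1f} deg")
print(f"gradient: {math.degrees(math.atan2(gy, gx)):.1f} deg, |grad f| = {math.hypot(gx, gy):.3f}")
```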
I already knew this but you gave me a better intuition of it and I like your style of writing! Thank you!
@Chico: Awesome, thanks for sharing! I like that a lot — lining up with the gradient (out of all possible directional derivatives) will give you the best return (cosine = 1). That clicks for me.
Glad you enjoyed the microwave intuition, I love searching for little analogies.
@Deniz: Thanks! And you’re welcome.