<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>BetterExplained &#187; Math</title>
	<atom:link href="http://betterexplained.com/articles/category/math/feed/" rel="self" type="application/rss+xml" />
	<link>http://betterexplained.com</link>
	<description>Learn Right, Not Rote.</description>
	<lastBuildDate>Wed, 16 May 2012 00:45:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>How To Understand Derivatives: The Product, Power &amp; Chain Rules</title>
		<link>http://betterexplained.com/articles/derivatives-product-power-chain/</link>
		<comments>http://betterexplained.com/articles/derivatives-product-power-chain/#comments</comments>
		<pubDate>Fri, 11 May 2012 17:53:07 +0000</pubDate>
		<dc:creator>kalid</dc:creator>
				<category><![CDATA[Calculus]]></category>
		<category><![CDATA[Math]]></category>

		<guid isPermaLink="false">http://betterexplained.com/?p=2218</guid>
		<description><![CDATA[The jumble of rules for taking derivatives never truly clicked for me. The addition rule, product rule, quotient rule &#8212; how do they fit together? What are we even trying to <em>do</em>?

Here&#8217;s my take on derivatives:

<ul>
<li>We have </li>&#8230; <a href="http://betterexplained.com/articles/derivatives-product-power-chain/" class="read_more">Read article</a></ul>]]></description>
			<content:encoded><![CDATA[<p>The jumble of rules for taking derivatives never truly clicked for me. The addition rule, product rule, quotient rule &#8212; how do they fit together? What are we even trying to <em>do</em>?</p>

<p>Here&#8217;s my take on derivatives:</p>

<ul>
<li>We have a system to analyze, our function f</li>
<li>The derivative f&#8217; (df/dx) is the <a href="http://betterexplained.com/articles/calculus-building-intuition-for-the-derivative/">moment-by-moment behavior</a></li>
<li>It turns out f is part of a bigger system (h = f + g)</li>
<li>Using the behavior of the parts, can we figure out the behavior of the whole?</li>
</ul>

<p>Yes. <strong>Every part has a &#8220;point of view&#8221; about how much change it added. Combine every point of view to get the overall behavior.</strong> Each derivative rule is an example of merging various points of view.</p>

<p>And why don&#8217;t we analyze the entire system at once? For the same reason you don&#8217;t eat a hamburger in one bite: small parts are easier to wrap your head around.</p>

<p>Instead of memorizing separate rules, let&#8217;s see how they fit together:</p>

<p><img src="http://betterexplained.com/wp-content/uploads/derivatives/table.part1.png" alt="table" /></p>

<p>The goal is to really grok the notion of &#8220;combining perspectives&#8221;. This installment covers addition, multiplication, powers and the chain rule. Onward!</p>

<h2>Functions: Anything, Anything But Graphs</h2>

<p>The default calculus explanation writes &#8220;f(x) = x^2&#8221; and shoves a graph in your face. Does this really help our intuition?</p>

<p>Not for me. Graphs squash input and output into a single curve, and hide the machinery that turns one into the other. But the derivative rules are <em>about</em> the machinery, so let&#8217;s see it!</p>

<p>I visualize a function as the process &#8220;input(x) => f => output(y)&#8221;.</p>

<p><img src="http://betterexplained.com/wp-content/uploads/derivatives/simplefunction.png" alt="simple function" /></p>

<p>It&#8217;s not just me. Check out this incredible, mechanical targeting computer (<a href="http://www.youtube.com/watch?v=mpkTHyfr0pM">beginning of YouTube series</a>).</p>

<iframe title="YouTube video player" width="480" height="390" src="http://www.youtube.com/embed/-F7m02XDfvE" frameborder="0" allowfullscreen></iframe>

<p>The machine computes functions like addition and multiplication with gears &#8212; you can <em>see the mechanics</em> unfolding!</p>

<p><img src="http://betterexplained.com/wp-content/uploads/derivatives/mechanicalfunctions.png" alt="mechanical functions" /></p>

<p>Think of function f as a machine with an input lever &#8220;x&#8221; and an output lever &#8220;y&#8221;. As we adjust x, f sets the height for y. Another analogy: x is the input signal, f receives it, does some magic, and spits out signal y. Use whatever <a href="http://betterexplained.com/articles/learning-to-learn-embrace-analogies/">analogy</a> helps it click.</p>

<h2>Wiggle Wiggle Wiggle</h2>

<p>The derivative is the &#8220;moment-by-moment&#8221; behavior of the function. What does that mean? (And don&#8217;t mindlessly mumble &#8220;The derivative is the slope&#8221;. <em>See any graphs around these parts, fella?</em>)</p>

<p>The derivative is how much we wiggle. The lever is at x, we &#8220;wiggle&#8221; it, and see how y changes. &#8220;Oh, we moved the input lever 1mm, and the output moved 5mm. Interesting.&#8221;</p>

<p>The result can be written &#8220;output wiggle per input wiggle&#8221; or &#8220;dy/dx&#8221; (5mm / 1mm = 5, in our case). This is usually a formula, not a static value, because it can depend on your current input setting.</p>

<p>For example, when f(x) = x^2, the derivative is 2x. Yep, you&#8217;ve memorized that. What does it mean?</p>

<p>If our input lever is at x = 10 and we wiggle it slightly (moving it by dx=0.1 to 10.1), the output should change by dy. How much, exactly?</p>

<ul>
<li>We know f&#8217;(x) = dy/dx = 2 * x</li>
<li>At x = 10 the &#8220;output wiggle per input wiggle&#8221; is = 2 * 10 = 20. The output moves 20 units for every unit of input movement.</li>
<li>If dx = 0.1, then dy = 20 * dx = 20 * .1 = 2</li>
</ul>

<p>And indeed, the difference between 10^2 and (10.1)^2 is about 2. The derivative estimated how far the output lever would move (a perfect, infinitely small wiggle would move 2 units; we moved 2.01).</p>
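
<p>To check those numbers yourself, here is a minimal Python sketch of the wiggle at x = 10. The names <em>f</em> and <em>f_prime</em> are just labels picked for this illustration.</p>

```python
def f(x):
    """The squaring machine from the text: f(x) = x^2."""
    return x ** 2

def f_prime(x):
    """Output wiggle per input wiggle for x^2, i.e. 2x."""
    return 2 * x

x, dx = 10.0, 0.1

predicted_dy = f_prime(x) * dx   # what the derivative estimates: 20 * 0.1
actual_dy = f(x + dx) - f(x)     # what the lever really did: 10.1^2 - 10^2

print(round(predicted_dy, 4), round(actual_dy, 4))
```

<p>The estimate comes out at about 2 and the measured change at about 2.01. Shrink dx and the two get closer still.</p>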

<p>The key to understanding the derivative rules:</p>

<ul>
<li>Set up your system</li>
<li>Wiggle each part of the system separately, see how far the output moves</li>
<li>Combine the results</li>
</ul>

<p>The total wiggle is the sum of wiggles from each part.</p>

<h2>Addition and Subtraction</h2>

<p>Time for our first system:</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/46391a746f43c2c61627e76c85fd5246.png' title='\displaystyle{h(x) = f(x) + g(x) }' alt='\displaystyle{h(x) = f(x) + g(x) }' align=absmiddle class='tex'></p>

<p><img src="http://betterexplained.com/wp-content/uploads/derivatives/addition.png" alt="derivative addition" /></p>

<p>What happens when the input (x) changes?</p>

<p>In my head, I think &#8220;Function h takes a single input. It feeds the same input to f and g and adds the output levers. f and g wiggle independently, and don&#8217;t even know about each other!&#8221;</p>

<p>Function f knows it will contribute some wiggle (df), g knows it will contribute some wiggle (dg), and we, the prowling overseers that we are, know their individual moment-by-moment behaviors are added:</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/cf660f80d53e1d0bd120a5b43a7c9016.png' title='\displaystyle{dh = df + dg}' alt='\displaystyle{dh = df + dg}' align=absmiddle class='tex'>
<img src="http://betterexplained.com/wp-content/uploads/derivatives/addition_deriv.png" alt="derivative addition" /></p>

<p>Again, let&#8217;s describe each &#8220;point of view&#8221;:</p>

<ul>
<li>The overall system has behavior dh</li>
<li>From f&#8217;s perspective, it contributes df to the whole [it doesn't know about g]</li>
<li>From g&#8217;s perspective, it contributes dg to the whole [it doesn't know about f]</li>
</ul>

<p>Every change to a system is due to some part changing (f and g). If we add the contributions from each possible variable, we&#8217;ve described the entire system.</p>

<h2>df vs df/dx</h2>

<p>Sometimes we use df, other times df/dx &#8212; what gives? (This confused me for a while.)</p>

<ul>
<li><strong>df</strong> is a general notion of &#8220;however much f changed&#8221;</li>
<li><strong>df/dx</strong> is a specific notion of &#8220;however much f changed, in terms of how much x changed&#8221;</li>
</ul>

<p>The generic &#8220;df&#8221; helps us see the overall behavior.</p>

<p>An analogy: Imagine you&#8217;re driving cross-country and want to measure the fuel efficiency of your car. You&#8217;d measure the distance traveled, check your tank to see how much gas you used, and finally do the division to compute &#8220;miles per gallon&#8221;. You measured distance and gasoline separately &#8212; you didn&#8217;t jump into the gas tank to get the rate on the go!</p>

<p>In calculus, sometimes we want to think about the actual change, not the ratio. Working at the &#8220;df&#8221; level gives us room to think about how the function wiggles overall. We can <em>eventually</em> scale it down in terms of a specific input.</p>

<p>And we&#8217;ll do that now. The addition rule above can be written, on a &#8220;per dx&#8221; basis, as:</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/5f8d7df0240a4f7c2d1711c310fab896.png' title='\displaystyle{\frac{dh}{dx} = \frac{df}{dx} + \frac{dg}{dx}}' alt='\displaystyle{\frac{dh}{dx} = \frac{df}{dx} + \frac{dg}{dx}}' align=absmiddle class='tex'></p>
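
<p>A quick numeric sanity check of the addition rule, as a sketch. The two parts here (f squares its input, g triples it) are made-up examples, and <em>wiggle</em> is a hypothetical helper name.</p>

```python
def f(x):
    return x ** 2          # hypothetical part f

def g(x):
    return 3 * x           # hypothetical part g

def h(x):
    return f(x) + g(x)     # the whole system adds the two output levers

def wiggle(func, x, dx):
    """How far func's output moves when the input goes from x to x + dx."""
    return func(x + dx) - func(x)

x, dx = 2.0, 0.001
df = wiggle(f, x, dx)
dg = wiggle(g, x, dx)
dh = wiggle(h, x, dx)

# dh should match df + dg, and the same holds on a "per dx" basis.
print(dh, df + dg)
print(dh / dx, df / dx + dg / dx)
```

<p>Each part is wiggled on its own, and the totals agree up to floating-point rounding.</p>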

<h2>Multiplication (Product Rule)</h2>

<p>Next puzzle: suppose our system multiplies parts &#8220;f&#8221; and &#8220;g&#8221;. How does it behave?</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/3acde657b80c28a13e03c48992f7933b.png' title='\displaystyle{h(x) = f(x) \cdot g(x)}' alt='\displaystyle{h(x) = f(x) \cdot g(x)}' align=absmiddle class='tex'></p>

<p>Hrm, tricky &#8212; the parts are interacting more closely. But the strategy is the same: see how each part contributes from its own point of view, and combine them:</p>

<ul>
<li>total change in h = f&#8217;s contribution (from f&#8217;s point of view) + g&#8217;s contribution (from g&#8217;s point of view)</li>
</ul>

<p>Check out this diagram:</p>

<p><img src="http://betterexplained.com/wp-content/uploads/derivatives/productrule.png" alt="derivative product rule" /></p>

<p>What&#8217;s going on?</p>

<ul>
<li>We have our system: f and g are multiplied, giving h (the area of the rectangle)</li>
<li>Input &#8220;x&#8221; changes by dx off in the distance. f changes by some amount df (think absolute change, not the rate!). Similarly, g changes by its own amount dg. Because f and g changed, the area of the rectangle changes too.</li>
<li>What&#8217;s the area change from f&#8217;s point of view? Well, f knows he changed by df, but has <em>no idea</em> what happened to g. From f&#8217;s perspective, he&#8217;s the only one who moved and will add a slice of area = df * g</li>
<li>Similarly, g doesn&#8217;t know how f changed, but knows he&#8217;ll add a slice of area &#8220;dg * f&#8221;</li>
</ul>

<p>The overall change in the system (dh) is the two slices of area:</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/9bebf672a514f2a7cb341420e9e49d64.png' title='\displaystyle{dh = f \cdot dg + g \cdot df}' alt='\displaystyle{dh = f \cdot dg + g \cdot df}' align=absmiddle class='tex'></p>

<p>Now, like our miles per gallon example, we &#8220;divide by dx&#8221; to write this in terms of how much x changed:</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/38b37da23aadbbb1269302092eea3ee3.png' title='\displaystyle{\frac{dh}{dx} = f \cdot \frac{dg}{dx} + g \cdot \frac{df}{dx}}' alt='\displaystyle{\frac{dh}{dx} = f \cdot \frac{dg}{dx} + g \cdot \frac{df}{dx}}' align=absmiddle class='tex'></p>

<p>(Aside: Divide by dx? Engineers will nod, mathematicians will frown. Technically, df/dx is not a fraction: it&#8217;s the entire operation of taking the derivative (with the limit and all that). But infinitesimal-wise, intuition-wise, we are &#8220;scaling by dx&#8221;. I&#8217;m a smiler.)</p>

<p>The key to the product rule: add two &#8220;slivers of area&#8221;, one from each point of view.</p>

<p><strong>Gotcha:</strong> But isn&#8217;t there some effect from both f and g changing simultaneously (df * dg)?</p>

<p>Yep. However, this area is an infinitesimal * infinitesimal (a &#8220;2nd-order infinitesimal&#8221;) and invisible at the current level. It&#8217;s a tricky concept, but (df * dg) / dx vanishes compared to normal derivatives like df/dx. We vary f and g independently and combine the results, and ignore results from them moving together.</p>
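
<p>Here is a small Python sketch of that vanishing corner piece. The sides f and g are made-up examples and <em>area_change</em> is a hypothetical helper. It splits the true change in area into the two slivers plus the corner, then puts each on a per-dx basis.</p>

```python
def f(x):
    return x ** 2      # hypothetical side f of the rectangle

def g(x):
    return x + 1       # hypothetical side g of the rectangle

def area_change(x, dx):
    """Return (true change in area, the two slivers, the corner piece)."""
    df = f(x + dx) - f(x)
    dg = g(x + dx) - g(x)
    dh = f(x + dx) * g(x + dx) - f(x) * g(x)   # actual change in f * g
    slivers = f(x) * dg + g(x) * df            # one slice per point of view
    corner = df * dg                           # both moving at once
    return dh, slivers, corner

for dx in (0.1, 0.01, 0.001):
    dh, slivers, corner = area_change(3.0, dx)
    # Everything is divided by dx, as in the product rule above.
    print(dx, dh / dx, slivers / dx, corner / dx)
```

<p>As dx shrinks, the sliver column settles toward a fixed rate while the corner column keeps shrinking toward zero, which is why it gets ignored.</p>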

<h2>The Chain Rule: It&#8217;s Not So Bad</h2>

<p>Let&#8217;s say g depends on f, which depends on x:</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/6ff504a4d681ef65f06ae88750973e4d.png' title='\displaystyle{y = g(f(x))}' alt='\displaystyle{y = g(f(x))}' align=absmiddle class='tex'>
<img src="http://betterexplained.com/wp-content/uploads/derivatives/chainrulelink.png" alt="derivative chain rule" /></p>

<p>The chain rule lets us &#8220;zoom into&#8221; a function and see how an initial change (x) can affect the final result down the line (g).</p>

<p><strong>Interpretation 1: Convert the rates</strong> </p>

<p>A common interpretation is to multiply the rates:</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/405dd30666ed835241fc0b283659bf6f.png' title='\displaystyle{\frac{dg}{dx} = \frac{dg}{df} \cdot \frac{df}{dx}}' alt='\displaystyle{\frac{dg}{dx} = \frac{dg}{df} \cdot \frac{df}{dx}}' align=absmiddle class='tex'></p>

<p>x wiggles f. This creates a rate of change of df/dx, which wiggles g by dg/df. The entire wiggle is then:</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/5955bc53c0a52ddf7bd36fba37a191c2.png' title='\displaystyle{\frac{dg}{df} \cdot \frac{df}{dx}}' alt='\displaystyle{\frac{dg}{df} \cdot \frac{df}{dx}}' align=absmiddle class='tex'></p>

<p>This is similar to the &#8220;factor-label&#8221; method in chemistry class:</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/952c69e876e176b3a6c3e715e325352d.png' title='\displaystyle{\frac{miles}{second} = \frac{miles}{hour} \cdot \frac{1 \ hour}{60 \ minutes} \cdot \frac{1 \ minute}{60 \ seconds} = \frac{miles}{hour} \cdot \frac{1}{3600}}' alt='\displaystyle{\frac{miles}{second} = \frac{miles}{hour} \cdot \frac{1 \ hour}{60 \ minutes} \cdot \frac{1 \ minute}{60 \ seconds} = \frac{miles}{hour} \cdot \frac{1}{3600}}' align=absmiddle class='tex'></p>

<p>If your &#8220;miles per hour&#8221; rate changes, multiply by the conversion factor to get the new &#8220;miles per second&#8221;. The second doesn&#8217;t know about the hour directly &#8212; it goes through the second => minute conversion.</p>

<p>Similarly, g doesn&#8217;t know about x directly, only f. Function g knows it should scale its input by dg/df to get the output. The initial rate (df/dx) gets modified as it moves up the chain.</p>

<p><strong>Interpretation 2: Convert the wiggle</strong></p>

<p>I prefer to see the chain rule on the &#8220;per-wiggle&#8221; basis:</p>

<ul>
<li>x wiggles by dx, so</li>
<li>f wiggles by df, so</li>
<li>g wiggles by dg</li>
</ul>

<p>Cool. But how are they actually related? Oh yeah, the derivative! (It&#8217;s the output wiggle per input wiggle):</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/b06cbba74b039b96356aa6e510e79de5.png' title='\displaystyle{df = dx \cdot \frac{df}{dx}}' alt='\displaystyle{df = dx \cdot \frac{df}{dx}}' align=absmiddle class='tex'></p>

<p>Remember, the derivative of f (df/dx) is how much to scale the initial wiggle. And the same happens to g:</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/14655de899b6c328b8a914ce368e1a23.png' title='\displaystyle{dg = df \cdot \frac{dg}{df}}' alt='\displaystyle{dg = df \cdot \frac{dg}{df}}' align=absmiddle class='tex'></p>

<p>It will scale whatever wiggle comes along its input lever (f) by dg/df. If we write the df wiggle in terms of dx:</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/218cf6998073d2dcb1cf82b4b58de61c.png' title='\displaystyle{dg = (dx \cdot \frac{df}{dx}) \cdot \frac{dg}{df}}' alt='\displaystyle{dg = (dx \cdot \frac{df}{dx}) \cdot \frac{dg}{df}}' align=absmiddle class='tex'></p>

<p>We have another version of the chain rule: dx starts the chain, which results in some final result dg. If we want the final wiggle in terms of dx, divide both sides by dx:</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/106a06f2e631664f80c0814922261998.png' title='\displaystyle{\frac{dg}{dx} = \frac{df}{dx} \cdot \frac{dg}{df}}' alt='\displaystyle{\frac{dg}{dx} = \frac{df}{dx} \cdot \frac{dg}{df}}' align=absmiddle class='tex'></p>

<p>The chain rule isn&#8217;t just factor-label unit cancellation &#8212; it&#8217;s the propagation of a wiggle, which gets adjusted at each step.</p>

<p>The chain rule works for a chain of several variables (a depends on b, which depends on c): just propagate the wiggle as you go.</p>

<p>Try to imagine &#8220;zooming into&#8221; each variable&#8217;s point of view. Starting from dx and looking up, you see the entire chain of transformations needed before the impulse reaches g.</p>

<h2>Chain Rule: Example Time</h2>

<p>Let&#8217;s say we put a &#8220;squaring machine&#8221; in front of a &#8220;cubing machine&#8221;:</p>

<p>input(x) => f:x^2 => g:f^3 => output(y)</p>

<p>f:x^2 means f squares its input. g:f^3 means g cubes its input, the value of f. For example:</p>

<p>input(2) => f(2) => g(4) => output:64</p>

<p>Start with 2, f squares it (2^2 = 4), and g cubes this (4^3 = 64). It&#8217;s a 6th power machine:</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/69e1910c25b53d041aeb62ab5b6812d1.png' title='\displaystyle{g(f(x)) = (x^2)^3}' alt='\displaystyle{g(f(x)) = (x^2)^3}' align=absmiddle class='tex'></p>

<p>And what&#8217;s the derivative?</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/d0bf14e6eb92b8170a4badafd6eb17fc.png' title='\displaystyle{ \frac{dg}{dx} = \frac{dg}{df} \cdot \frac{df}{dx}}' alt='\displaystyle{ \frac{dg}{dx} = \frac{dg}{df} \cdot \frac{df}{dx}}' align=absmiddle class='tex'></p>

<ul>
<li>f changes its input wiggle by df/dx = 2x</li>
<li>g changes its input wiggle by dg/df = 3f^2</li>
</ul>

<p>The final change is:</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/2fdac12b727e02da56fd0a205f0b0240.png' title='\displaystyle{3f^2 \cdot 2x = 3(x^2)^2 \cdot 2x = 3x^4 \cdot 2x = 6x^5}' alt='\displaystyle{3f^2 \cdot 2x = 3(x^2)^2 \cdot 2x = 3x^4 \cdot 2x = 6x^5}' align=absmiddle class='tex'></p>
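
<p>The same example as a Python sketch, passing the wiggle from one machine to the next. The function names (<em>chain_wiggle</em> and friends) are hypothetical, chosen only for this illustration.</p>

```python
def f(x):
    return x ** 2            # squaring machine

def g(f_value):
    return f_value ** 3      # cubing machine: it only sees its own input

def df_dx(x):
    return 2 * x             # how f scales an incoming wiggle

def dg_df(f_value):
    return 3 * f_value ** 2  # how g scales an incoming wiggle

def chain_wiggle(x, dx):
    """Propagate a small input wiggle dx through f, then through g."""
    df = dx * df_dx(x)       # f adjusts the wiggle first
    dg = df * dg_df(f(x))    # g adjusts whatever f hands over
    return dg

x, dx = 2.0, 1e-6
estimated_rate = chain_wiggle(x, dx) / dx          # chain rule, per dx
formula_rate = 6 * x ** 5                          # the 6x^5 result
measured_rate = (g(f(x + dx)) - g(f(x))) / dx      # brute-force check

print(estimated_rate, formula_rate, measured_rate)
```

<p>At x = 2 all three land at roughly 192, with the brute-force measurement off by a tiny amount because dx is small but not zero.</p>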

<h2>Chain Rule: Gotchas</h2>

<p><strong>Functions treat their inputs like a blob</strong></p>

<p>In the example, g&#8217;s derivative (&#8220;the derivative of x^3 is 3x^2&#8221;) doesn&#8217;t refer to the original &#8220;x&#8221;, just whatever the input was (the derivative of foo^3 is 3*foo^2). The input was f, and it treats f as a single value. Later on, we scurry in and rewrite f in terms of x. But g has no involvement with that &#8212; it doesn&#8217;t care that f can be rewritten in terms of smaller pieces.</p>

<p><strong>In many examples, the variable &#8220;x&#8221; is the &#8220;end of the line&#8221;.</strong></p>

<p>Questions ask for df/dx, i.e. &#8220;Give me changes from x&#8217;s point of view&#8221;. Now, x could depend on some deeper variable, but that&#8217;s not being asked for. It&#8217;s like saying &#8220;I want miles per hour. I don&#8217;t care about miles per minute or miles per second. Just give me miles per hour&#8221;. df/dx means &#8220;stop looking at inputs once you get to x&#8221;.</p>

<p><strong>How come we multiply derivatives with the chain rule, but add them for the others?</strong></p>

<p>The regular rules are about <em>combining points of view</em> to get an overall picture. What change does f see? What change does g see? Add them up for the total.</p>

<p>The chain rule is about going deeper into a single part (like f) and seeing if it&#8217;s controlled by another variable. It&#8217;s like looking inside a clock and saying &#8220;Hey, the minute hand is controlled by the second hand!&#8221;. We&#8217;re staying inside the same part.</p>

<p>Sure, eventually this &#8220;per-second&#8221; perspective of f could be added to some perspective from g. Great. But the chain rule is about diving deeper into &#8220;f&#8217;s&#8221; root causes.</p>

<h2>Power Rule: Oft Memorized, Seldom Understood</h2>

<p>What&#8217;s the derivative of x^4? 4x^3? Great. You brought down the exponent and subtracted one. Now explain why!</p>

<p>Hrm. There are a few approaches, but here&#8217;s my new favorite: x^4 is really x * x * x * x. It&#8217;s the multiplication of 4 &#8220;independent&#8221; variables. Each x doesn&#8217;t know about the others, it might as well be x * u * v * w.</p>

<p>Now think about the first x&#8217;s point of view:</p>

<ul>
<li>It changes from x to x + dx</li>
<li>The change in the overall function is [(x + dx) - x][u * v * w] = dx[u * v * w]</li>
<li>The change on a &#8220;per dx&#8221; basis is [u * v * w]</li>
</ul>

<p>Similarly,</p>

<ul>
<li>From u&#8217;s point of view, it changes by du. It contributes (du/dx)*[x * v * w] on a &#8220;per dx&#8221; basis</li>
<li>v contributes (dv/dx) * [x * u * w]</li>
<li>w contributes (dw/dx) * [x * u * v]</li>
</ul>

<p>The curtain is unveiled: x, u, v, and w are the same! The &#8220;point of view&#8221; conversion factor is 1 (du/dx = dv/dx = dw/dx = dx/dx = 1), and the total change is</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/eec5f06b91aae12d04ce63ffc55d78ab.png' title='\displaystyle{(x \cdot x \cdot x) + (x \cdot x \cdot x) + (x \cdot x \cdot x) + (x \cdot x \cdot x) = 4 x^3}' alt='\displaystyle{(x \cdot x \cdot x) + (x \cdot x \cdot x) + (x \cdot x \cdot x) + (x \cdot x \cdot x) = 4 x^3}' align=absmiddle class='tex'></p>

<p>In a sentence: the derivative of x^4 is 4x^3 because x^4 has four identical &#8220;points of view&#8221; which are being combined. Booyeah!</p>
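
<p>A sketch of that argument in Python: treat x^n as n separate factors, let each one take its turn as the factor that moves, and add up what each contributes per dx. The function name <em>power_rule_by_views</em> is hypothetical.</p>

```python
import math

def power_rule_by_views(x, n):
    """Add up each factor's "per dx" contribution to x^n.

    Each of the n factors treats the other n - 1 as fixed,
    so its contribution is the product of those others.
    """
    factors = [x] * n
    total = 0.0
    for i in range(n):
        others = factors[:i] + factors[i + 1:]   # everyone except factor i
        total += math.prod(others)
    return total

x = 2.0
print(power_rule_by_views(x, 4), 4 * x ** 3)
```

<p>Both values agree: four identical points of view, each contributing x^3.</p>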

<h2>Take A Breather</h2>

<p>I hope you&#8217;re seeing the derivative in a new light: we have a system of parts, we wiggle our input and see how the whole thing moves.  It&#8217;s about combining perspectives: what does each part add to the whole?</p>

<p>In the follow-up article, we&#8217;ll look at even more powerful rules (exponents, quotients, and friends). Happy math.</p>
]]></content:encoded>
			<wfw:commentRss>http://betterexplained.com/articles/derivatives-product-power-chain/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Calculus: Building Intuition for the Derivative</title>
		<link>http://betterexplained.com/articles/calculus-building-intuition-for-the-derivative/</link>
		<comments>http://betterexplained.com/articles/calculus-building-intuition-for-the-derivative/#comments</comments>
		<pubDate>Thu, 29 Mar 2012 11:09:25 +0000</pubDate>
		<dc:creator>kalid</dc:creator>
				<category><![CDATA[Calculus]]></category>
		<category><![CDATA[Math]]></category>

		<guid isPermaLink="false">http://betterexplained.com/?p=1973</guid>
		<description><![CDATA[How do you wish the derivative was explained to you? Here's my take.

Psst! The derivative is the heart of calculus, buried inside this definition:



But what does it mean?

Let's say I gave you a magic newspaper that listed&#8230; <a href="http://betterexplained.com/articles/calculus-building-intuition-for-the-derivative/" class="read_more">Read article</a>]]></description>
			<content:encoded><![CDATA[<p>How do you wish the derivative was explained to you? Here's my take.</p>

<p>Psst! The derivative is the heart of calculus, buried inside this definition:</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/7a7c3a79212037689439fdf710637b27.png' title='\displaystyle{ f&#039;(x) =\lim_{dx\to 0} \frac{f(x+dx)-f(x)}{dx}}' alt='\displaystyle{ f&#039;(x) =\lim_{dx\to 0} \frac{f(x+dx)-f(x)}{dx}}' align=absmiddle class='tex'></p>

<p>But what does it mean?</p>

<p>Let's say I gave you a magic newspaper that listed the daily stock market changes for the next few years (+1% Monday, -2% Tuesday...). What could you do?</p>

<p>Well, you'd apply the changes one-by-one, plot out future prices, and buy low / sell high to build your empire. You could even hire away the monkeys who currently throw darts at newspapers.</p>

<p>Others call the derivative "the slope of a function" -- it's so bland! Like the stock list, the derivative is a total, predictive understanding of a system. You can plot the past/present/future, find minimums/maximums, and yes, staff your simian workforce.</p>

<p>Step away from the gnarly equation. Equations exist to convey ideas: understand the idea, not the grammar.</p>

<p><strong>Derivatives create a perfect model of change from an imperfect guess.</strong></p>

<p>This result came over thousands of years of thinking, from Archimedes to Newton. Let's look at the analogies behind it.</p>

<h2>We all live in a shiny continuum</h2>

<p>Infinity is a constant source of paradoxes ("headaches"):</p>

<ul>
<li>A line is made up of points? <em>Sure.</em></li>
<li>So there's an infinite number of points on a line? <em>Yep.</em></li>
<li>How do you cross a room when there's an infinite number of points to visit? <em>(Gee, thanks <a href="http://en.wikipedia.org/wiki/Zeno's_paradoxes">Zeno</a>).</em></li>
</ul>

<p>And yet, we move. My intuition is to fight infinity with infinity. Sure, there's infinity points between 0 and 1. But I move <em>two infinities</em> of points per second (somehow!) and I cross the gap in half a second.</p>

<p>Distance has infinite points, motion is possible, therefore motion is in terms of "infinities of points per second".</p>

<p>Instead of thinking of differences ("How far to the next point?") we can compare rates ("How fast are you moving through this continuum?").</p>

<p>It's strange, but you can see 10/5 as "I need to travel 10 'infinities' in 5 segments of time. To do this, I travel 2 'infinities' for each unit of time".</p>

<p><strong>Analogy: See division as a rate of motion through a continuum of points</strong></p>

<h2>What's after zero?</h2>

<p>Another brain-buster: What number comes after zero? .01? .0001?</p>

<p>Hrm. Anything you can name, I can name smaller (I'll just halve your number... nyah!).</p>

<p>Even though we can't <em>calculate</em> the number after zero, it must be there, right? Like demons of yore, it's the "number that cannot be written, lest ye be smitten".</p>

<p>Call the gap to the next number "dx". I don't know exactly how big it is, but it's there!</p>

<p><strong>Analogy: dx is a "jump" to the next number in the continuum.</strong></p>

<h2>Measurements depend on the instrument</h2>

<p>The derivative predicts change. Ok, how do we measure speed (change in distance)?</p>

<blockquote>
  <p>Officer: Do you know how fast you were going?</p>
  
  <p>Driver: I have no idea.</p>
  
  <p>Officer: 95 miles per hour.</p>
  
  <p>Driver: But I haven't been driving for an hour!</p>
</blockquote>

<p>We clearly don't need a "full hour" to measure your speed. We can take a before-and-after measurement (over 1 second, let's say) and get your instantaneous speed. If you moved 140 feet in one second, you're going ~95mph. Simple, right?</p>

<p>Not exactly. Imagine a video camera pointed at Clark Kent (Superman's alter-ego). The camera records 24 pictures/sec (about 42ms per photo) and Clark seems still. On a second-by-second basis, he's not moving, and his speed is 0mph.</p>

<p>Wrong again! Between each photo, within those 42ms, Clark changes to Superman, solves crimes, and returns to his chair for a nice photo. We measured 0mph but he's really moving -- he goes too fast for our instruments!</p>

<p><strong>Analogy: Like a camera watching Superman, the speed we measure depends on the instrument!</strong></p>

<h2>Running the Treadmill</h2>

<p>We're nearing the chewy, slightly tangy center of the derivative. We need before-and-after measurements to detect change, but our measurements could be flawed.</p>

<p>Imagine a shirtless Santa on a treadmill (go on, I'll wait). We're going to measure his heart rate in a stress test: we attach dozens of heavy, cold electrodes and get him jogging.</p>

<p>Santa huffs, he puffs, and his heart rate shoots to 190 beats per minute. That must be his "under stress" heart rate, correct?</p>

<p>Nope. See, the very presence of stern scientists and cold electrodes increased his heart rate! We <em>measured</em> 190bpm, but who knows what we'd see if the electrodes weren't there! Of course, if the electrodes weren't there, we wouldn't have a measurement.</p>

<p>What to do? Well, look at the system:</p>

<ul>
<li>measurement = actual amount + measurement effect</li>
</ul>

<p>Ah. After lots of studies, we may find "Oh, each electrode adds 10bpm to the heart rate". We make the measurement (imperfect guess of 190) and remove the effect of electrodes ("perfect estimate").</p>

<p><strong>Analogy: Remove the "electrode effect" after making your measurement</strong></p>

<p>By the way, the "electrode effect" shows up everywhere. Research studies have the <a href="http://en.wikipedia.org/wiki/Hawthorne_effect">Hawthorne Effect</a> where people change their behavior <em>because they are being studied</em>. Gee, it seems everyone we scrutinize sticks to their diet!</p>

<h2>Understanding the derivative</h2>

<p>Armed with these insights, we can see how the derivative models change:</p>

<p><img src="http://betterexplained.com/wp-content/uploads/calculus/derivative-explanation.png" alt="Derivative explanation" /></p>

<p>Start with some system to study, f(x):</p>

<ol>
<li>Change by the smallest amount possible (dx)</li>
<li>Get the before-and-after difference: f(x + dx) - f(x)</li>
<li>We don't know exactly how small "dx" is, and we don't care: get the <strong>rate of motion</strong> through the continuum: [f(x + dx) - f(x)] / dx</li>
<li>This rate, however small, has some error (our cameras are too slow!). Predict what happens if the measurement were perfect, if dx wasn't there.</li>
</ol>

<p>The magic's in the final step: how do we remove the electrodes? We have <a href="http://betterexplained.com/articles/why-do-we-need-limits-and-infinitesimals/">two approaches</a>:</p>

<ul>
<li>Limits: what happens when dx shrinks to nothingness, beyond any error margin?</li>
<li>Infinitesimals: What if dx is a tiny number, undetectable in our number system?</li>
</ul>

<p>Both are ways to formalize the notion of "How do we throw away dx when it's not needed?".</p>

<p>My pet peeve: Limits are a modern formalism, they didn't exist in Newton's time. They help make dx disappear "cleanly". But teaching them before the derivative is like showing a steering wheel without a car! It's a tool to help the derivative work, not something to be studied in a vacuum.</p>

<h2>An Example: f(x) = x^2</h2>

<p>Let's shake loose the cobwebs with an example. How does the function f(x) = x^2 change as we move through the continuum?</p>

<!--
\begin{align*}
f'(x) &= \lim_{dx\to 0} \frac{f(x+dx)-f(x)}{dx} \\
&= \lim_{dx\to 0} \frac{(x+dx)^2-x^2}{dx} \\
&= \lim_{dx\to 0} \frac{x^2 + 2xdx + dx^2 - x^2}{dx} \\
&= \lim_{dx\to 0} 2x + dx \\
&= 2x
\end{align*}
-->

<p><img src="http://betterexplained.com/wp-content/uploads/calculus/derivative-eqn.gif" alt="Derivative explanation" /></p>

<p>Note the difference in the last 2 equations:</p>

<ul>
<li>One has the error built in (dx)</li>
<li>The other has the "true" change, where dx = 0 (our measurements have no effect on the outcome)</li>
</ul>

<p>Time for real numbers. Here are the values for f(x) = x^2, with intervals of dx = 1:</p>

<ul>
<li>1, 4, 9, 16, 25, 36, 49, 64...</li>
</ul>

<p>The absolute change between each result is:</p>

<ul>
<li>1, 3, 5, 7, 9, 11, 13, 15...</li>
</ul>

<p>(Here, the absolute change is the "speed" between each step, where the interval is 1)</p>

<p>Consider the jump from x=2 to x=3 (3^2 - 2^2 = 5). What is "5" made of?</p>

<ul>
<li>Measured rate = Actual Rate + Error</li>
<li>5 = 2x + dx</li>
<li>5 = 2(2) + 1</li>
</ul>
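<p>Here's a quick way to check that split for every integer step, not just x=2. This is a minimal sketch in Python; the variable names are only for illustration:</p>

```python
# Values of f(x) = x^2 at x = 0..8, and the jump between neighbors (dx = 1).
squares = [x ** 2 for x in range(9)]
jumps = [after - before for before, after in zip(squares, squares[1:])]

for x, jump in enumerate(jumps):
    actual_rate = 2 * x   # the "true" rate at x
    error = 1             # dx, the size of our step
    print(f"x={x}: measured {jump} = {actual_rate} + {error}")
    assert jump == actual_rate + error
```

<p>Each measured jump is the actual rate 2x plus an error of exactly dx = 1.</p>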

<p>Sure, we measured a "5 units moved per second" because we went from 4 to 9 in one interval. But our instruments trick us! 4 units of speed came from the real change, and 1 unit was due to shoddy instruments (1.0 is a large jump, no?).</p>

<p>If we restrict ourselves to integers, 5 is the perfect speed measurement from 4 to 9. There's no "error" in assuming dx = 1 because that's the true interval between neighboring points.</p>

<p>But in the real world, measuring every 1.0 seconds is too slow. What if our dx was 0.1? What speed would we measure at x=2?</p>

<p>Well, we examine the change from x=2 to x=2.1:</p>

<ul>
<li>2.1^2 - 2^2 = 0.41</li>
</ul>

<p>Remember, 0.41 is what we changed in an interval of 0.1. Our speed-per-unit is 0.41 / .1 = 4.1. And again we have:</p>

<ul>
<li>Measured rate = Actual Rate + Error</li>
<li>4.1 = 2x + dx</li>
</ul>

<p>Interesting. With dx=0.1, the measured and actual rates are close (4.1 to 4, 2.5% error). When dx=1, the rates are pretty different (5 to 4, 25% error).</p>

<p>Following the pattern, we see that throwing out the electrodes (letting dx=0) reveals the true rate of 2x.</p>

<p>In plain English: We analyzed how f(x) = x^2 changes, found an "imperfect" measurement of 2x + dx, and deduced a "perfect" model of change as 2x.</p>
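<p>To watch the error fade, here's a small sketch that repeats the measurement at x=2 with smaller and smaller dx. The helper name <code>measured_rate</code> is my own, and floating-point rounding means the printed values are approximate:</p>

```python
def f(x):
    return x ** 2

def measured_rate(x, dx):
    # Before-and-after difference, divided by the size of the step.
    return (f(x + dx) - f(x)) / dx

x = 2
actual = 2 * x   # the "perfect" model, 2x
for dx in (1, 0.1, 0.01, 0.001):
    rate = measured_rate(x, dx)
    error = rate - actual
    print(f"dx={dx}: measured={rate:.4f}, error={error:.4f} ({error / actual:.2%})")
```

<p>The error column tracks dx itself, so shrinking dx toward 0 leaves just 2x.</p>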

<h2>The derivative as "continuous division"</h2>

<p>I see the integral as <a href="http://betterexplained.com/articles/a-calculus-analogy-integrals-as-multiplication/">better multiplication</a>, where you can apply a changing quantity to another.</p>

<p>The derivative is "better division", where you get the speed through the continuum at every instant. Something like 10/5 = 2 says "you have a constant speed of 2 through the continuum".</p>

<p>When your speed changes as you go, you need to describe your speed at each instant. That's the derivative.</p>

<p>If you apply this changing speed to each instant (take the integral of the derivative), you recreate the original behavior, just like applying the daily stock market changes to recreate the full price history. But this is a big topic for another day.</p>
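<p>A discrete version of that idea, as a rough sketch: take the step-by-step changes of x^2 from the earlier example and add them back up. (This uses whole steps of dx = 1, so it's an analogy for the integral, not the real thing.)</p>

```python
from itertools import accumulate

# Step-by-step changes of x^2 with dx = 1, starting from f(0) = 0.
changes = [1, 3, 5, 7, 9, 11, 13, 15]

# Applying each change in turn (a running total) rebuilds the original values.
rebuilt = list(accumulate(changes))
print(rebuilt)
```

<p>The running total gives back the squares 1, 4, 9, 16 and so on.</p>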

<h2>Gotcha: The many meanings of "Derivative"</h2>

<p>You'll see "derivative" in many contexts:</p>

<ul>
<li><p>"The derivative of x^2 is 2x" means "At every point, we are changing by a speed of 2x (twice the current x-position)". (General formula for change)</p></li>
<li><p>"The derivative is 44" means "At our current location, our rate of change is 44." When f(x) = x^2, at x=22 we're changing at 44 (Specific rate of change).</p></li>
<li><p>"The derivative is dx" may refer to the tiny, hypothetical jump to the next position. Technically, dx is the "differential" but the terms get mixed up. Sometimes people will say "derivative of x" and mean dx.</p></li>
</ul>

<h2>Gotcha: Our models may not be perfect</h2>

<p>We found the "perfect" model by making a measurement and improving it. Sometimes, this isn't good enough -- we're predicting what <em>would</em> happen if dx wasn't there, but added dx to get our initial guess!</p>

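<p>Some ill-behaved functions defy the prediction: there's a difference between removing dx with the limit and what actually happens at that instant. These are called "discontinuous" functions, which essentially means "cannot be modeled with limits". As you can guess, the derivative doesn't work at the points where they jump, because we can't actually predict their behavior there. (Continuity alone isn't quite enough, either: |x| is continuous but has a sharp corner at x = 0, so there's no single rate of change at that point.)</p>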
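<p>Here's a sketch of a prediction going wrong, using a hypothetical step function that jumps from 0 to 1 at x = 0 (the function and helper names are made up for illustration):</p>

```python
def step(x):
    # Hypothetical ill-behaved function: jumps from 0 to 1 at x = 0.
    return 1.0 if x >= 0 else 0.0

def measured_rate(f, x, dx):
    # Straddle x so the jump always falls inside the interval we measure.
    return (f(x + dx / 2) - f(x - dx / 2)) / dx

for dx in (0.1, 0.01, 0.001):
    print(dx, measured_rate(step, 0.0, dx))
```

<p>Instead of settling down as dx shrinks, the measured rate keeps growing. There's no "perfect" value to predict at the jump.</p>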

<p>Discontinuous functions are rare in practice, and often exist as "Gotcha!" test questions ("Oh, you tried to take the derivative of a discontinuous function, you fail"). Realize the theoretical limitation of derivatives, and then realize their practical use in measuring natural phenomena. Nearly every function you'll see (sine, cosine, e^x, polynomials, etc.) is continuous.</p>

<h2>Gotcha: Integration doesn't really exist</h2>

<p>The relationship between derivatives, integrals and anti-derivatives is nuanced (and I got it wrong originally). Here's a metaphor. Start with a plate, your function to examine:</p>

<ul>
<li>Differentiation is breaking the plate into shards. There is a specific procedure: take a difference, find a rate of change, then assume dx isn't there.</li>
<li>Integration is weighing the shards: your original function was "this" big. There's a procedure, cumulative addition, but it doesn't tell you <em>what the plate looked like</em>.</li>
<li>Anti-differentiation is figuring out the original shape of the plate from the pile of shards.</li>
</ul>

<p>There's no <em>algorithm</em> to find the anti-derivative; we have to guess. We make a lookup table with a bunch of known derivatives (original plate => pile of shards) and look at our existing pile to see if it's similar. "Let's find the integral of 10x. Well, it looks like 2x is the derivative of x^2. So... scribble scribble... 10x is the derivative of 5x^2.".</p>
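<p>Guessing is easier when you can check the guess. A minimal sketch, assuming a small dx stands in for "dx isn't there" (the helper names are mine): break the guessed plate 5x^2 into shards and see whether they match 10x.</p>

```python
def numeric_derivative(f, x, dx=1e-6):
    # Difference, then rate of change, with a dx small enough to mostly ignore.
    return (f(x + dx) - f(x)) / dx

def guess(x):
    # Our guessed anti-derivative of 10x.
    return 5 * x ** 2

for x in (0.0, 1.0, 2.5, 10.0):
    print(x, round(numeric_derivative(guess, x), 3), 10 * x)
```

<p>The two columns agree (up to the tiny leftover dx), so the guess holds up.</p>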

<p>Finding derivatives is mechanics; finding anti-derivatives is an art. Sometimes we get stuck: we take the changes, apply them piece by piece, and mechanically reconstruct a pattern. It might not be the "real" original plate, but is good enough to work with.</p>

<p>Another subtlety: aren't the integral and anti-derivative the same? (That's what I originally thought)</p>

<p>Yes, but this isn't obvious: it's the fundamental theorem of calculus! (It's like saying "Aren't a^2 + b^2 and c^2 the same? Yes, but this isn't obvious: it's the Pythagorean theorem!"). Thanks to Joshua Zucker for helping sort me out.</p>

<h2>Reading math</h2>

<p>Math is a language, and I want to "read" calculus (not "recite" calculus, the way we can recite medieval German hymns). I need the message behind the definitions.</p>

<p>My biggest aha! was realizing the transient role of dx: it makes a measurement, and is removed to make a perfect model. Limits/infinitesimals are a formalism; we can't get caught up in them. Newton seemed to do ok without them.</p>

<p>Armed with these analogies, other math questions become interesting:</p>

<ul>
<li>How do we measure different sizes of infinity? (In some sense they're all "infinite", in other senses the range (0,1) is smaller than (0,2))</li>
<li>What are the real rules about making "dx go away"? (How do infinitesimals and limits really work?)</li>
<li>How do we describe numbers without writing them down? "The next number after 0" is the beginnings of analysis (which I want to learn).</li>
</ul>

<p>The fundamentals are interesting when you see why they exist. Happy math.</p>
]]></content:encoded>
			<wfw:commentRss>http://betterexplained.com/articles/calculus-building-intuition-for-the-derivative/feed/</wfw:commentRss>
		<slash:comments>21</slash:comments>
		</item>
		<item>
		<title>Vector Calculus: Understanding the Dot Product</title>
		<link>http://betterexplained.com/articles/vector-calculus-understanding-the-dot-product/</link>
		<comments>http://betterexplained.com/articles/vector-calculus-understanding-the-dot-product/#comments</comments>
		<pubDate>Mon, 27 Feb 2012 15:00:31 +0000</pubDate>
		<dc:creator>kalid</dc:creator>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Vector Calculus]]></category>

		<guid isPermaLink="false">http://betterexplained.com/?p=1831</guid>
		<description><![CDATA[I see the dot product as directional multiplication. But multiplication goes beyond <a href="http://betterexplained.com/articles/rethinking-arithmetic-a-visual-guide/">repeated counting&#8230; <a href="http://betterexplained.com/articles/vector-calculus-understanding-the-dot-product/" class="read_more">Read article</a></a>: it&#8217;s applying the essence of one item to another.

Normal multiplication combines growth rates: &#8220;3 x 4&#8221; can mean &#8220;Take your 3x growth and make]]></description>
			<content:encoded><![CDATA[<p>I see the dot product as directional multiplication. But multiplication goes beyond <a href="http://betterexplained.com/articles/rethinking-arithmetic-a-visual-guide/">repeated counting</a>: it&#8217;s applying the essence of one item to another.</p>

<p>Normal multiplication combines growth rates: &#8220;3 x 4&#8221; can mean &#8220;Take your 3x growth and make it 4x larger (i.e., 12x)&#8221;. <a href="http://betterexplained.com/articles/understanding-why-complex-multiplication-works/">Complex multiplication</a> lets us combine rotations. <a href="http://betterexplained.com/articles/a-calculus-analogy-integrals-as-multiplication/">Integrals</a> let us do piece-by-piece multiplication.</p>

<p>A vector is &#8220;growth in a direction&#8221;. The dot product lets us apply the directional growth of one vector to another: the result is how much we went along the original path (positive progress, negative, or zero).</p>

<p>Today let&#8217;s build our intuition for how the dot product works.</p>

<h2>Getting the Formula Out of the Way</h2>

<p>You&#8217;ve seen the dot product equation everywhere:</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/7093a2c642c17b4be8c2765fdc3d223f.png' title='\displaystyle{\vec{a} \cdot \vec{b} = a_x \cdot b_x + a_y \cdot b_y = |\vec{a}||\vec{b}|\cos(\theta) }' alt='\displaystyle{\vec{a} \cdot \vec{b} = a_x \cdot b_x + a_y \cdot b_y = |\vec{a}||\vec{b}|\cos(\theta) }' align=absmiddle class='tex'></p>

<p>And also the justification: &#8220;Well Billy, the Law of Cosines (you remember that, don&#8217;t you?) says the following calculations are the same, so they are.&#8221; Not good enough &#8212; it doesn&#8217;t click! Beyond the computation, what does it mean?</p>

<p>The goal is to apply one vector to another. Each computation examines this from a rectangular perspective (x- and y-coordinates) or a polar one (magnitudes and angles). The &#8220;blah = foo&#8221; equation above really means &#8220;Here are two equivalent ways to &#8216;directionally multiply&#8217; vectors&#8221;.</p>

<p>(Similarly, we can show that <a href="http://betterexplained.com/articles/intuitive-understanding-of-eulers-formula/">Euler&#8217;s formula</a> (e^ix = cos(x) + i*sin(x)) is true because the Taylor series is the same on both sides. Accurate but unsatisfying! Instead, see how both sides can describe the same motion.)</p>

<h2>Seeing Numbers as Vectors</h2>

<p>Let&#8217;s start simple, and see 3 x 4 as a dot product:</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/69d0b6157d3c64105256895647620f90.png' title='\displaystyle{(3, 0) \cdot (4,0)}' alt='\displaystyle{(3, 0) \cdot (4,0)}' align=absmiddle class='tex'></p>

<p>The number 3 is &#8220;directional growth&#8221; in a single dimension (x-axis, let&#8217;s say), and 4 is &#8220;directional growth&#8221; in that same direction. 3 x 4 = 12 means 12x growth in that single dimension. Ok.</p>

<p>Now, suppose each number refers to a different dimension? Suppose 3 means &#8220;triple your bananas&#8221; (sigh&#8230; or &#8220;x-axis&#8221;) and 4 means &#8220;quadruple your oranges&#8221; (y-axis). They&#8217;re not the same &#8220;type&#8221; of number: what happens when we apply growth (take the dot product) in the (bananas, oranges) universe?</p>

<ul>
<li>(3,0) is &#8220;Triple your bananas, destroy your oranges&#8221;</li>
<li>(0,4) is &#8220;Destroy your bananas, quadruple your oranges&#8221;</li>
</ul>

<p>Applying (0,4) to (3,0) means &#8220;Destroy your banana growth, quadruple your orange growth&#8221;. But (3, 0) had no orange growth to begin with, so the net result is 0 (&#8220;Destroy all your fruit, buddy&#8221;).</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/d3c7038a36ee1fffede207579a40d786.png' title='\displaystyle{(3, 0) \cdot (0, 4) = 0}' alt='\displaystyle{(3, 0) \cdot (0, 4) = 0}' align=absmiddle class='tex'></p>

<p>See how we&#8217;re &#8220;applying&#8221; and not adding. With addition, we sort of smush the items together: (3,0) + (0, 4) = (3, 4) [a vector which triples your bananas <em>and</em> quadruples your oranges].</p>

<p>&#8220;Application&#8221; is different. We&#8217;re mutating the original vector according to the rules in the second. And the rules are &#8220;Destroy your banana growth <em>rate</em>, and quadruple your orange growth <em>rate</em>&#8221;. And, sadly, this leaves us with nothing.</p>

<p>The final result of this process can be:</p>

<ul>
<li>zero: we don&#8217;t have any growth in the original direction</li>
<li>positive number: we have some growth in the original direction</li>
<li>negative number: we have negative (reverse) growth in the original direction</li>
</ul>
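<p>The three outcomes can be sketched in a few lines of Python. The <code>dot</code> helper is a plain component-by-component version written for this example, and the (-4, 0) vector is an extra case added to show reverse growth:</p>

```python
def dot(a, b):
    # Multiply matching components, then add the results.
    return sum(x * y for x, y in zip(a, b))

print(dot((3, 0), (4, 0)))    # same direction: positive
print(dot((3, 0), (0, 4)))    # bananas vs. oranges: zero
print(dot((3, 0), (-4, 0)))   # opposite direction: negative
```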

<h2>Understanding the Calculation</h2>

<p>&#8220;Applying vectors&#8221; is still a bit abstract. I think &#8220;How much energy/push is one vector giving to the other?&#8221;. Here&#8217;s how I visualize it:</p>

<p><strong>Rectangular Coordinates: Component-by-component overlap</strong></p>

<p>Like multiplying complex numbers, see how each x- and y-component interacts:</p>

<p><img src="http://betterexplained.com/wp-content/uploads/dotproduct/dot_product_components.png" alt="Dot Product Components" /></p>

<p>We list out all four combinations (x-x, y-x, x-y, y-y). Since the x- and y-coordinates don&#8217;t affect each other (like holding a bucket sideways under a waterfall: nothing falls in), the total energy absorption is absorption(x) + absorption(y):</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/fba1cc0231e9178d6a8fc9db46f5fad5.png' title='\displaystyle{\vec{a} \cdot \vec{b} = a_x \cdot b_x + a_y \cdot b_y}' alt='\displaystyle{\vec{a} \cdot \vec{b} = a_x \cdot b_x + a_y \cdot b_y}' align=absmiddle class='tex'></p>

<p><strong>Polar coordinates: Projection</strong></p>

<p>The word &#8220;projection&#8221; is so sterile: I prefer &#8220;along the path&#8221;. How much energy is actually going in our original direction?</p>

<p>Here&#8217;s one way to see it: </p>

<p><img src="http://betterexplained.com/wp-content/uploads/dotproduct/dot_product_rotation.png" alt="Dot Product Rotation" /></p>

<p>Take two vectors, a and b. Rotate our coordinates so b is horizontal: it becomes (|b|, 0), and everything is on this new x-axis. What&#8217;s the dot product now? (It shouldn&#8217;t change just because we tilted our head).</p>

<p>Well, vector a has new coordinates (a1, a2), and we get:</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/54426ff2bcbd0c0c19cac25e8eb8b5a1.png' title='\displaystyle{a1 \cdot |\vec{b}| + a2 \cdot 0 = a1 \cdot |\vec{b}|}' alt='\displaystyle{a1 \cdot |\vec{b}| + a2 \cdot 0 = a1 \cdot |\vec{b}|}' align=absmiddle class='tex'></p>

<p>a1 is really &#8220;What is the x-coordinate of a, assuming b is the x-axis?&#8221;. That is |a|cos(&#952;), aka the &#8220;projection&#8221;:</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/589231b1e74b52bba5238e289552840c.png' title='\displaystyle{\vec{a} \cdot \vec{b} = |\vec{a}|\cos(\theta)|\vec{b}|}' alt='\displaystyle{\vec{a} \cdot \vec{b} = |\vec{a}|\cos(\theta)|\vec{b}|}' align=absmiddle class='tex'></p>
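<p>A numeric check of the head-tilting argument, as a minimal sketch. The vectors are arbitrary examples and the helper names are my own:</p>

```python
import math

def dot(a, b):
    # Rectangular view: multiply matching components and add.
    return a[0] * b[0] + a[1] * b[1]

def rotate(v, angle):
    # Rotate a 2D vector counterclockwise by `angle` radians.
    c, s = math.cos(angle), math.sin(angle)
    return (v[0] * c - v[1] * s, v[0] * s + v[1] * c)

a, b = (3.0, 4.0), (2.0, 1.0)   # arbitrary example vectors

# Tilt our head so that b lies along the x-axis.
tilt = -math.atan2(b[1], b[0])
a_new, b_new = rotate(a, tilt), rotate(b, tilt)

# Polar view: magnitudes and the angle between the vectors.
theta = math.atan2(a[1], a[0]) - math.atan2(b[1], b[0])
polar = math.hypot(*a) * math.hypot(*b) * math.cos(theta)

print(dot(a, b))             # rectangular
print(a_new[0] * b_new[0])   # a1 * |b| after rotating
print(polar)                 # |a| |b| cos(theta)
```

<p>All three numbers agree, up to floating-point rounding.</p>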

<h2>Analogies for the Dot Product</h2>

<p>The common interpretation is &#8220;geometric projection&#8221;, but it&#8217;s so sterile. Here are some analogies that click for me:</p>

<p><strong>Energy Absorption</strong></p>

<p>One vector is the solar rays, the other is where the solar panel is pointing (yes, yes, the normal vector). Larger numbers mean stronger rays or a larger panel. How much energy is absorbed?</p>

<ul>
<li>Energy =  Overlap in direction * Strength of rays * Size of panel</li>
<li><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/4ee8bf2bcf46916dd0d58c8223707f7a.png' title='\displaystyle{Energy = \cos(\theta) \cdot |a| \cdot |b|}' alt='\displaystyle{Energy = \cos(\theta) \cdot |a| \cdot |b|}' align=absmiddle class='tex'></li>
</ul>

<p>If you hold your panel sideways to the sun, no rays hit (cos(&#952;) = 0).</p>
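<p>As a rough sketch of the analogy (the function name and the sample numbers are made up, and the angle is measured between the rays and the direction the panel faces):</p>

```python
import math

def energy_absorbed(ray_strength, panel_size, angle_degrees):
    # Energy = overlap in direction * strength of rays * size of panel
    overlap = math.cos(math.radians(angle_degrees))
    return overlap * ray_strength * panel_size

for angle in (0, 30, 60, 90):
    print(angle, round(energy_absorbed(2.0, 3.0, angle), 3))
```

<p>The absorbed energy falls from its maximum when the panel faces the rays head-on to (essentially) nothing when the panel is sideways.</p>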

<p><img src="http://betterexplained.com/wp-content/uploads/dotproduct/Solar_Panel.png" alt="Solar Panel Dot Product" />
<a href="http://www.flickr.com/photos/knowmybackyard/2394376192/">Photo credit</a></p>

<p>But&#8230; but&#8230; solar rays are leaving the sun, and the panel is facing the sun, and the dot product is negative when vectors are opposed! Take a deep breath, and remember the goal is to embrace the analogy (besides, physicists lose track of negative signs all the time).</p>

<p><strong>Mario-Kart Speed Boost</strong></p>

<p>In Mario Kart, there are &#8220;boost pads&#8221; on the ground that increase your speed (Never played? I&#8217;m sorry.)</p>

<p><img src="http://betterexplained.com/wp-content/uploads/dotproduct/mario_kart_vector.png" alt="Mario Kart Dot Product" />
<a href="http://www.mariokartwii.com/f72/official-mario-kart-wii-model-hacking-new-39114-409.html">Photo source</a></p>

<p>Imagine the red vector is your speed (x and y direction), and the blue vector is the orientation of the boost pad (x and y direction). Larger numbers are more power.</p>

<p>How much boost will you get? For the analogy, imagine the pad multiplies your speed:</p>

<ul>
<li>If you come in going 0, you&#8217;ll get nothing</li>
<li>If you cross the pad perpendicularly, you&#8217;ll get 0 [just like the banana obliteration, it will give you 0x boost in the perpendicular direction]</li>
</ul>

<p>But, if we have some overlap, our x-speed will get an x-boost, and our y-speed gets a y-boost:</p>

<p><img src='http://74.50.62.72/wp-content/plugins/wp-latexrender/pictures/98dd3e634c7022901a8b70f689859752.png' title='\displaystyle{Total = speed_x \cdot boost_x + speed_y \cdot boost_y}' alt='\displaystyle{Total = speed_x \cdot boost_x + speed_y \cdot boost_y}' align=absmiddle class='tex'></p>

<p>Neat, eh? Another way to see it: your incoming speed is |a|, and the max boost is |b|. The fraction of that boost you actually get (for being lined up with it) is cos(&#952;), for a total of |a||b|cos(&#952;).</p>

<p><strong>Physics Physics Physics</strong></p>

<p>The dot product appears all over physics: some field (electric, gravitational) is pulling on some particle. We&#8217;d love to multiply, and we could if everything were lined up. But that&#8217;s never the case, so we take the dot product to account for potential differences in direction.</p>

<p>It&#8217;s all a useful generalization: Integrals are &#8220;multiplication, taking changes into account&#8221; and the dot product is &#8220;multiplication, taking direction into account&#8221;.</p>

<p>And what if your direction is changing? Why, take the <a href="http://en.wikipedia.org/wiki/Line_integral">integral of the dot product</a>, of course!</p>

<h2>Onward and Upward</h2>

<p>Don&#8217;t settle for &#8220;Dot product is the geometric projection, justified by the law of cosines&#8221;. Find the analogies that click for you! Happy math.</p>
]]></content:encoded>
			<wfw:commentRss>http://betterexplained.com/articles/vector-calculus-understanding-the-dot-product/feed/</wfw:commentRss>
		<slash:comments>22</slash:comments>
		</item>
		<item>
		<title>Using Logarithms in the Real World</title>
		<link>http://betterexplained.com/articles/using-logs-in-the-real-world/</link>
		<comments>http://betterexplained.com/articles/using-logs-in-the-real-world/#comments</comments>
		<pubDate>Tue, 31 Jan 2012 14:00:41 +0000</pubDate>
		<dc:creator>kalid</dc:creator>
				<category><![CDATA[Math]]></category>

		<guid isPermaLink="false">http://betterexplained.com/?p=1782</guid>
		<description><![CDATA[Logarithms are everywhere. Ever use any of the following phrases?

<ul>
<li>6 figures</li>
<li>Double digits</li>
<li>Order of magnitude</li>
&#8230; <a href="http://betterexplained.com/articles/using-logs-in-the-real-world/" class="read_more">Read article</a></ul>

You&#8217;re describing numbers in terms of their powers of 10 &#8212; a logarithm. Ever mention an interest rate or rate of return? It&#8217;s]]></description>
			<content:encoded><![CDATA[<p>Logarithms are everywhere. Ever use any of the following phrases?</p>

<ul>
<li>6 figures</li>
<li>Double digits</li>
<li>Order of magnitude</li>
</ul>

<p>You&#8217;re describing numbers in terms of their powers of 10 &#8212; a logarithm. Ever mention an interest rate or rate of return? It&#8217;s the logarithm of your growth.</p>

<p>Surprised that logarithms are so common? Me too. Many attempts at Math In the Real World are attempts to point out logarithms in some arcane formula or pretending we&#8217;re geologists fascinated by the Richter Scale. &#8220;Scientists care about logs, and you should too. Also, can you imagine a <a href="http://www.snpp.com/episodes/8F16.html">world without zinc</a>?&#8221;</p>

<p>No, no, no, no no, no no! (Mama mia!)</p>

<p>Math expresses concepts with notation like &#8220;ln&#8221; or &#8220;log&#8221;. Finding &#8220;math in the real world&#8221; means encountering ideas in life and seeing how they <em>could</em> be written with notation. Don&#8217;t look for the literal symbols! When was the last time you wrote a division sign? When was the last time you chopped up some food?</p>

<h2>Ok, ok, we get it: what are logarithms about?</h2>

<p><strong>Logarithms find the cause for an effect, i.e. the input for some output</strong></p>

<p>A common &#8220;effect&#8221; is seeing something grow, like going from $100 to $150 in 5 years. How did this happen? We&#8217;re not sure, but the logarithm finds a possible cause: A continuous return of ln(150/100) / 5 = 8.1% would account for that change. It might not be the actual cause (did all the growth happen in the final year?), but it&#8217;s a smooth average we can compare to other changes.</p>
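<p>The calculation above, as a minimal Python sketch (the function name is just for illustration):</p>

```python
import math

def continuous_rate(start, end, years):
    # The smooth, continuously compounded rate that turns `start` into `end`.
    return math.log(end / start) / years

rate = continuous_rate(100, 150, 5)
print(f"{rate:.1%}")

# Sanity check: applying that cause should recreate the effect.
print(100 * math.exp(rate * 5))
```

<p>The first line prints the 8.1% rate, and the second recovers the final $150 (up to rounding).</p>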

<p>By the way, the notion of &#8220;cause and effect&#8221; is nuanced. Why is 1000 bigger than 100?</p>

<ul>
<li>100 is 10 which grew by itself for 2 time periods (10 * 10)</li>
<li>1000 is 10 which grew by itself for 3 time periods (10 * 10 * 10)</li>
</ul>

<p>We can think of numbers as outputs (1000 is &#8220;1000 outputs&#8221;) and inputs (&#8220;How many times does 10 need to grow to make those outputs?&#8221;). So,</p>

<p>1000 outputs > 100 outputs</p>

<p>because</p>

<p>3 inputs > 2 inputs [i.e., because log(1000) > log(100)]</p>

<p>Why is this useful?</p>

<p><strong>Logarithms put numbers on a human-friendly scale.</strong></p>

<p>Large numbers break our brains. Millions and trillions are &#8220;really big&#8221; even though a million seconds is 12 days and a trillion seconds is 30,000 years. It&#8217;s the difference between an American vacation year and the entirety of human civilization.</p>

<p>The trick to overcoming &#8220;huge number blindness&#8221; is to write numbers in terms of &#8220;inputs&#8221; (i.e. their power base 10). This smaller scale (0 to 100) is much easier to grasp:</p>

<ul>
<li>power of 0 = 10^0 = 1 (single item)</li>
<li>power of 1 = 10^1 = 10</li>
<li>power of 3 = 10^3 = thousand</li>
<li>power of 6 = 10^6 = million</li>
<li>power of 9 = 10^9 = billion</li>
<li>power of 12 = 10^12 = trillion</li>
<li>power of 23 = 10^23 = roughly the number of atoms in a dozen grams of carbon</li>
<li>power of 80 = 10^80 = roughly the number of atoms in the observable universe</li>
</ul>

<p>A 0 to 80 scale took us from a single item to the number of things in the universe. Not too shabby.</p>

<p><strong>Logarithms count multiplication as steps</strong></p>

<p>Logarithms describe changes in terms of multiplication: in the examples above, each step is 10x bigger. With the natural log, each step is &#8220;e&#8221; (2.71828&#8230;) times more.</p>

<p>When dealing with a series of multiplications, logarithms help &#8220;count&#8221; them, just like addition counts for us when effects are added.</p>

<h2>Show me the math</h2>

<p>Time for the meat: let&#8217;s see where logarithms show up!</p>

<p><strong>Six-figure salary or 2-digit expense</strong></p>

<p>We&#8217;re describing numbers in terms of their digits, i.e. how many powers of 10 they have (are they in the tens, hundreds, thousands, ten-thousands, etc.). Adding a digit means &#8220;multiplying by 10&#8221;, i.e.</p>

<p>1 [1 digit] * 10 * 10 * 10 * 10 * 10 [5 more digits] = 10^5 = 100,000</p>

<p>Logarithms count the number of multiplications <em>added on</em>, so starting with 1 (a single digit) we add 5 more digits (10^5) and get 100,000, a 6-figure result. Talking about &#8220;6&#8221; instead of &#8220;One hundred thousand&#8221; is the essence of logarithms. It gives a rough sense of scale without jumping into details.</p>

<p>Bonus question: How would you describe 500,000? Saying &#8220;6 figure&#8221; is misleading because 6-figures often implies something closer to 100,000. Would &#8220;6.5 figure&#8221; work?</p>

<p>Not really. In our heads, 6.5 means &#8220;halfway&#8221; between 6 and 7 figures, but that&#8217;s an adder&#8217;s mindset. With logarithms a &#8220;.5&#8221; means halfway in terms of multiplication, i.e. the square root (9^.5 means the square root of 9; 3 is halfway in terms of multiplication because it&#8217;s a 3x step from 1 to 3 and another 3x step from 3 to 9).</p>

<p>Taking log(500,000) we get 5.7, add 1 for the extra digit, and we can say &#8220;500,000 is a 6.7 figure number&#8221;. Try it out here:</p>

<iframe src="http://new.instacalc.com/895/embed" frameborder="0" marginwidth="0" marginheight="0" width="450" height="250"></iframe>
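<p>The same calculation as a small Python sketch. The <code>figures</code> helper is a made-up name for &#8220;log base 10, plus 1 for the first digit&#8221;:</p>

```python
import math

def figures(n):
    # Number of figures as a continuous quantity: log base 10, plus 1.
    return math.log10(n) + 1

print(round(figures(100_000), 1))   # a plain 6-figure number
print(round(figures(500_000), 1))   # the bonus question

# What a true "6.5 figure" number would be: halfway in terms of multiplication.
print(round(10 ** 5.5))
```

<p>So 500,000 comes out as a 6.7 figure number, while 6.5 figures lands near 316,000, well short of 500,000.</p>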

<p><strong>Order of magnitude</strong></p>

<p>We geeks love this phrase. It means roughly &#8220;10x difference&#8221; but just sounds cooler than &#8220;1 digit larger&#8221;.</p>

<p>In computers, where everything is counted with bits (1 or 0), each bit has a doubling effect (not 10x). So going from 8 to 16 bits is &#8220;8 orders of magnitude&#8221; in base 2, or 2^8 = 256 times larger. (These bit sizes refer to the amount of memory available, not the processor speed). Going from 16 to 32 bits means 16 orders of magnitude, or 2^16 = 65,536 times larger.</p>

<p>Isn&#8217;t &#8220;16 extra bits of memory&#8221; better than &#8220;65,536 times more memory&#8221;?</p>
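<p>A quick sketch of the bit arithmetic (the helper name is hypothetical): extra bits are the base-2 logarithm of how many times larger the range gets.</p>

```python
import math

def times_larger(old_bits, new_bits):
    # Each extra bit doubles the number of values we can count.
    return 2 ** (new_bits - old_bits)

print(times_larger(8, 16))    # 8 extra bits
print(times_larger(16, 32))   # 16 extra bits

# Going the other way: the log recovers the number of extra bits.
print(math.log2(65536))
```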

<p><strong>Interest Rates</strong></p>

<p>How do we figure out growth rates? A country doesn&#8217;t intend to grow at 8.56% per year. You look at the GDP one year and the GDP the next, and take the logarithm to find the <em>implicit</em> growth rate.</p>

<p>My two favorite interpretations of the natural logarithm (ln(x)), using the natural log of 1.5 as an example:</p>

<ul>
<li>Assuming 100% growth, how long do you need to grow to get to 1.5? (.405, less than half the time period)</li>
<li>Assuming 1 unit of time, how fast do you need to grow to get to 1.5? (40.5% per year, continuously compounded)</li>
</ul>
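<p>Both readings can be checked with one small sketch. The <code>grow</code> helper is my own shorthand for continuous growth starting from 1:</p>

```python
import math

def grow(rate, time):
    # Continuous growth: start at 1 and compound at `rate` for `time` periods.
    return math.exp(rate * time)

x = math.log(1.5)
print(round(x, 3))

print(grow(1.0, x))   # 100% growth, for .405 of a time period
print(grow(x, 1.0))   # 40.5% growth, for 1 full time period
```

<p>Either way the result is 1.5 (up to rounding): the logarithm is the product of rate and time, and we can split that product however we like.</p>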

<p>Logarithms are how we figure out how fast we&#8217;re growing.</p>

<p><strong>Measurement Scale: Google PageRank</strong></p>

<p>Google gives every page on the web a score (PageRank) which is a rough measure of authority / importance. This is a logarithmic scale, which in my head means &#8220;PageRank counts the number of digits in your score&#8221;.</p>

<p>So, a site with PageRank 2 (&#8220;2 digits&#8221;) is 10x more popular than a PageRank 1 site. My site is PageRank 5 and CNN has PageRank 9, so there&#8217;s a difference of 4 orders of magnitude (10^4 = 10,000).</p>

<p>Roughly speaking, I get about 7000 visits / day. Using my envelope math, I can guess CNN gets about 7000 * 10,000 = 70 million visits / day. (How&#8217;d I do that? In my head, I think 7k * 10k = 70 * k * k = 70 * M). They might have a few times more than that (100M, 200M) but probably not up to 700M.</p>
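<p>The envelope math as a sketch. It assumes each PageRank point means 10x more traffic, which is a rough guess and not anything Google publishes; the function name is made up:</p>

```python
def estimated_visits(known_visits, known_rank, other_rank, base=10):
    # Assumption: each PageRank point multiplies traffic by `base`.
    return known_visits * base ** (other_rank - known_rank)

print(estimated_visits(7000, 5, 9))
```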

<p>Google conveys a lot of information with a very rough scale (1-10).</p>

<p><strong>Measurement Scale: Richter, Decibel, etc.</strong></p>

<p>Sigh. We&#8217;re at the typical &#8220;logarithms in the real world&#8221; example: Richter scale and Decibel. The idea is to put events which can vary drastically (earthquakes) on a single 1 &#8211; 10 scale. Just like PageRank, each 1-point increase is a 10x jump in measured shaking (and the energy released grows even faster).</p>

<p>Decibels are similar, though they can be negative: every 10 decibels is a 10x change in power. Sounds can go from intensely quiet (pindrop) to extremely loud (airplane) and our brains can process it all. In reality, the sound of an airplane&#8217;s engine is millions (billions, trillions) of times more powerful than a pindrop, and it&#8217;s inconvenient to have a scale that goes from 1 to a gazillion. Logs keep everything on a reasonable scale.</p>
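<p>A minimal sketch of the decibel idea, using the standard convention that decibels are 10 times the base-10 log of a power ratio. The sample ratios are arbitrary, not measurements of real pindrops or airplanes:</p>

```python
import math

def decibels(power_ratio):
    # 10 * log10 of how many times more powerful one sound is than another.
    return 10 * math.log10(power_ratio)

for ratio in (0.5, 1, 10, 1_000_000, 1_000_000_000_000):
    print(ratio, round(decibels(ratio), 1))
```

<p>A trillion-fold difference in power fits in a range of about 120, and a ratio below 1 gives a negative value.</p>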

<p><strong>Logarithmic Graphs</strong></p>

<p>You&#8217;ll often see items plotted on a &#8220;log scale&#8221;. In my head, this means one side is counting &#8220;number of digits&#8221; or &#8220;number of multiplications&#8221;, not the value itself. Again, this helps show wildly varying events on a single scale (going from 1 to 10, not 1 to billions).</p>

<p>Moore&#8217;s law is a great example: we double the number of transistors roughly every 18 months to 2 years (image courtesy <a href="http://en.wikipedia.org/wiki/File:Transistor_Count_and_Moore%27s_Law_-_2011.svg">Wikipedia</a>).</p>

<p><img src="http://betterexplained.com/wp-content/uploads/logs/moores_law.png" alt="Moore's Law" /></p>

<p>The neat thing about log-scale graphs is that exponential changes (like transistor counts) appear as a straight line. Growing 10x per year means you&#8217;re steadily marching up the &#8220;digits&#8221; scale.</p>
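<p>To see why the line comes out straight, here is a sketch with made-up counts that double at every step (hypothetical numbers, not real transistor data):</p>

```python
import math

# Hypothetical counts that double at every step.
counts = [1000 * 2 ** step for step in range(8)]

# Position of each count on the "digits" scale.
digits = [math.log10(c) for c in counts]

# How far we climb on that scale from one step to the next.
climbs = [later - earlier for earlier, later in zip(digits, digits[1:])]
print([round(c, 3) for c in climbs])
```

<p>Every doubling climbs the same distance on the log scale (about 0.3 digits), and equal climbs per step are exactly what a straight line is.</p>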

<h2>Onward and upward</h2>

<p>If a concept is well-known but not well-loved, it means we need to build our intuition. Find the analogies that work, and don&#8217;t settle for the slop a textbook will trot out. In my head:</p>

<ul>
<li>Logarithms find the root cause for an effect (see growth, find interest rate)</li>
<li>They help count multiplications or digits, with the bonus of partial counts (500k is a 6.7 digit number)</li>
</ul>

<p>Happy math.</p>
]]></content:encoded>
			<wfw:commentRss>http://betterexplained.com/articles/using-logs-in-the-real-world/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
	</channel>
</rss>

