In math, we can get misleading intuitions about what can (or can't) be rearranged.

After learning addition, we've memorized facts like 2 + 4 = 6. But this might stray into the idea that "whenever I see 2 and 4, I can simplify to 6".

Although 2 + 4 = 6, "baked(2) + baked(4)" is not "baked(6)". Baking the unmixed ingredients in the exponential oven, we get:

$2^2 + 4^2 \neq 6^2$

We can only confidently say:

$(2 + 4)^2 = 6^2$

We combine the ingredients, *then* bake the result. Exponents, like baking an apple pie, modify the original ingredients so they can't be easily combined later. While we might *recognize* the original 2 and 4, they aren't directly available. Two baked pies can't be smashed together to consolidate the filling.
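To make the ordering concrete, here's a tiny Python sketch (the `baked` helper is my own, hypothetical name for the "exponential oven"):

```python
# "Baking" = squaring. Combining raw ingredients first, then baking,
# is not the same as baking each ingredient and combining the results.
def baked(x):
    return x ** 2  # the "exponential oven"

combined_then_baked = baked(2 + 4)         # (2 + 4)^2 = 36
baked_then_combined = baked(2) + baked(4)  # 2^2 + 4^2 = 20

assert combined_then_baked == 36
assert baked_then_combined == 20
assert combined_then_baked != baked_then_combined
```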

This confusion gummed me up in calculus, when learning derivatives (the bad boy of baking).

In algebra, we internalize rules like $(a \cdot b) \cdot c = a \cdot (b \cdot c)$: multiplied terms can be freely regrouped.

But our intuition leads us astray when we get to the derivative:

$\frac{d}{dx}(f \cdot g) \neq \frac{df}{dx} \cdot \frac{dg}{dx}$

because raw polynomials can be multiplied, but the *derivatives* of multiplied polynomials can't be rearranged so easily. Multiplication makes functions interact in a way that makes taking the derivative more complex:

Working through the Product Rule, we get:

$\frac{d}{dx}(f \cdot g) = \frac{df}{dx} \cdot g + f \cdot \frac{dg}{dx}$

When learning Calculus, I was confused how standard interactions (like multiplication) needed special handling. I thought I was done learning new rules for "arithmetic".

But no: functions, when multiplied, interact in funky ways. See how each side grows its own sliver of area (`df * g` and `dg * f`)? The functions being multiplied are "baked together" and the overall effect depends on them both, simultaneously. We can't examine them in isolation (e.g., `df` or `dg` by itself).
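As a sanity check, here's a quick numeric sketch of the Product Rule (the functions $x^2$ and $\sin(x)$ are my own example, not from the article):

```python
import math

# Numeric check of the Product Rule using finite differences.
def f(x): return x ** 2
def g(x): return math.sin(x)
def fg(x): return f(x) * g(x)

def deriv(func, x, h=1e-6):
    """Central finite-difference approximation of the derivative."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 1.3
product_rule = deriv(f, x) * g(x) + f(x) * deriv(g, x)  # df*g + f*dg
naive_guess = deriv(f, x) * deriv(g, x)                 # df*dg -- wrong!

assert abs(deriv(fg, x) - product_rule) < 1e-5  # the Product Rule matches
assert abs(deriv(fg, x) - naive_guess) > 0.1    # the naive rearrangement doesn't
```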

Now, there are setups when the inputs *can* be processed separately and combined later (linear algebra). The cooking equivalent might be a smoothie: An apple/banana smoothie mixed with a peach/mango smoothie is the same as blending all ingredients in the beginning.

A common assumption is that operations are usually linear, but $\sin(a + b) \not= \sin(a) + \sin(b)$ and $(a + b)^2 \not= a^2 + b^2$. Sorry, we have to carefully cook the ingredients if we want the math to taste right.

When our intuition for a math rule doesn't make sense, ask "Are we making a pie, or a smoothie?"

There's a math analogy here. Take a function, pick a specific point, and dive in. You can pull out enough data from a single point to rebuild the entire function. Whoa. It's like remaking a movie from a single frame.

The Taylor Series discovers the "math DNA" behind a function and lets us rebuild it from a single data point. Let's see how it works.

Given a function like $f(x) = x^2$, what can we discover at a single location?

Normally we'd expect to calculate a single value, like $f(4) = 16$. But there's much more beneath the surface:

- $f(x)$ = Value of function at point $x$
- $f'(x)$ = First derivative, or how fast the function is changing (the velocity)
- $f''(x)$ = Second derivative, or how fast the *changes* are changing (the acceleration)
- $f'''(x)$ = Third derivative, or how fast the *changes in the changes* are changing (acceleration of the acceleration)
- And so on

Investigating a single point reveals multiple, possibly infinite, bits of information about the behavior. (Some functions have an endless amount of data (derivatives) at a single point).

So, given all this information, what should we do? Regrow the organism from a single cell, of course! (*Maniacal cackle here.*)

Our plan is to grow a function from a single starting point. But how can we describe any function in a generic way?

The big aha moment: imagine any function, at its core, is a polynomial (with possibly infinite terms):

$f(x) = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + \cdots$

To rebuild our function, we start at a fixed point ($c_0$) and add in a bunch of other terms based on the value we feed it (like $c_1x$). The "DNA" is the values $c_0, c_1, c_2, c_3$ that describe our function exactly.

Ok, we have a generic "function format". But how do we find the coefficients for a specific function like sin(x) (height of angle x on the unit circle)? How do we pull out its DNA?

Time for the magic of 0.

Let's start by plugging in the function value at $x=0$. Doing this, we get:

$f(0) = c_0 + c_1(0) + c_2(0)^2 + c_3(0)^3 + \cdots = c_0$

Every term vanishes except $c_0$, which makes sense: the starting point of our blueprint should be $f(0)$. For $f(x) = \sin(x)$, we can work out $c_0 = \sin(0) = 0$. We have our first bit of DNA!

Now that we know $c_0$, how do we isolate $c_1$ in this equation?

Hrm. A few ideas:

Can we set $x = 1$? That gives $f(1) = c_0 + c_1(1) + c_2(1^2) + c_3(1^3) + \cdots$ . Although we know $c_0$, the other constants are summed together. We can't pull out $c_1$ by itself.

What if we divide by $x$? This gives:

$\frac{f(x)}{x} = \frac{c_0}{x} + c_1 + c_2 x + c_3 x^2 + \cdots$

Then we can set $x=0$ to make the other terms disappear... right? It's a nice idea, except we're now dividing by zero.

Hrm. This approach is really close. How can we *almost* divide by zero? Using the derivative!

If we take the derivative of the blueprint of $f(x)$, we get:

$f'(x) = c_1 + 2c_2 x + 3c_3 x^2 + \cdots$

Every power gets reduced by 1 and the $c_0$, a constant value, becomes zero. It's almost too convenient.

Now we can isolate $c_1$ using our $x=0$ trick:

$f'(0) = c_1 + 2c_2(0) + 3c_3(0)^2 + \cdots = c_1$

For our example, $\sin'(x) = \cos(x)$, so $f'(0) = \sin'(0) = \cos(0) = 1 = c_1$.

Yay, one more bit of DNA! This is the magic of the Taylor series: by repeatedly applying the derivative and setting $x = 0$, we can pull out the polynomial DNA.
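That "take the derivative, then set $x = 0$" loop can be sketched numerically (finite differences are approximate; symbolic math would be exact):

```python
import math

# Extract the first few "DNA" coefficients c_n = f^(n)(0) / n! for sin(x).
def f(x):
    return math.sin(x)

h = 1e-4
c0 = f(0)                                   # f(0)
c1 = (f(h) - f(-h)) / (2 * h)               # f'(0), central difference
c2 = (f(h) - 2 * f(0) + f(-h)) / h**2 / 2   # f''(0) / 2!

assert abs(c0 - 0) < 1e-9   # sin(0) = 0
assert abs(c1 - 1) < 1e-6   # cos(0) = 1
assert abs(c2 - 0) < 1e-6   # -sin(0) / 2 = 0
```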

Let's try another round:

$f''(x) = 2c_2 + 6c_3 x + \cdots$

After taking the second derivative, the powers are reduced again. The first two terms ($c_0$ and $c_1x$) disappear, and we can again isolate $c_2$ by setting $x=0$:

$f''(0) = 2c_2 \quad \Rightarrow \quad c_2 = \frac{f''(0)}{2}$

For our sine example, $\sin'' = -\sin$, so:

$f''(0) = -\sin(0) = 0 = 2c_2$

or $c_2 = 0$.

As we keep taking derivatives, we're performing more multiplications and growing a factorial in front of each term (1!, 2!, 3!).

**The Taylor Series for a function around point x=0 is:**

$f(x) = f(0) + \frac{f'(0)}{1!}x + \frac{f''(0)}{2!}x^2 + \frac{f'''(0)}{3!}x^3 + \cdots$

(Technically, the Taylor series around the point $x=0$ is called the Maclaurin series.)

**The generalized Taylor series, extracted from any point a, is:**

$f(x) = f(a) + \frac{f'(a)}{1!}(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots$

The idea is the same. Instead of our regular blueprint, we use:

$f(x) = c_0 + c_1(x - a) + c_2(x - a)^2 + c_3(x - a)^3 + \cdots$

Since we're growing from $f(a)$, we can see that $f(a) = c_0 + 0 + 0 + \dots = c_0$. The other coefficients can be extracted by taking derivatives and setting $x = a$ (instead of $x = 0$).

Plugging the derivatives into the formula above, here's the Taylor series of $\sin(x)$ around $x = 0$:

$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$

And here's what that looks like:
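We can also check the partial sums numerically (a quick sketch):

```python
import math

# Partial sums of sin's Taylor series: x - x^3/3! + x^5/5! - ...
def sin_approx(x, terms):
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1)
               for k in range(terms))

x = 1.0
assert abs(sin_approx(x, 1) - math.sin(x)) < 0.2   # one term: sin(x) ~ x
assert abs(sin_approx(x, 5) - math.sin(x)) < 1e-7  # five terms: very close
```

Each extra term hugs the true sine curve a little longer.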

A few notes:

**1) Sine has infinite terms**

Sine is an infinite wave, and as you can guess, needs an infinite number of terms to keep it going. Simpler functions (like $f(x) = x^2 + 3$) are already in their "polynomial format" and don't have infinite derivatives to keep the DNA going.

**2) Sine is missing every other term**

If we repeatedly take the derivative of sine at $x = 0$, we get:

$\sin(x), \quad \cos(x), \quad -\sin(x), \quad -\cos(x), \quad \sin(x), \ldots$

with values:

$\sin(0) = 0, \quad \cos(0) = 1, \quad -\sin(0) = 0, \quad -\cos(0) = -1, \ldots$

Ignoring the division by the factorial, we get the pattern:

$0, 1, 0, -1, 0, 1, 0, -1, \ldots$

So the DNA of sine is something like [0, 1, 0, -1] repeating.

**3) Different starting positions have different DNA**

For fun, here's the Taylor series of $\sin(x)$ starting at $x = \pi$:

$\sin(x) = -(x - \pi) + \frac{(x - \pi)^3}{3!} - \frac{(x - \pi)^5}{5!} + \frac{(x - \pi)^7}{7!} - \cdots$

A few notes:

The DNA is now something like [0, -1, 0, 1]. The cycle is similar, but the starting value has changed since we're starting at $x=\pi$.

Written as calculated numbers, the denominators 1, 6, 120, 5040 look strange. But they're just every other factorial: 1! = 1, 3! = 6, 5! =120, 7! = 5040. In general, the Taylor series can have gnarly denominators.

- The $O(x^{12})$ term in a computed expansion means there are other components of order (power) $x^{12}$ and higher. Because $\sin(x)$ has infinite derivatives, we have infinite terms and the computer has to cut us off somewhere. (*You've had enough Tayloring for today, buddy.*)

A popular use of Taylor series is getting a quick approximation for a function. If you want a tadpole, do you need the DNA for the entire frog?

The Taylor series has a bunch of terms, typically ordered by importance:

- $c_0 = f(0)$, the constant term, is the exact value at the point
- $c_1 = f'(0)x$, the linear term, tells us what speed to move from our point
- $c_2= \frac{f''(0)}{2!}x^2 $, the quadratic term, tells us how much to accelerate away from our point
- and so on

If we only need a prediction for a few instants around our point, the initial position & velocity may be good enough:

$f(x) \approx f(0) + f'(0)x$

If we're tracking for longer, then acceleration becomes important:

$f(x) \approx f(0) + f'(0)x + \frac{f''(0)}{2!}x^2$

As we get further from our starting point, we need more terms to keep our prediction accurate. For example, the linear model $\sin(x) \approx x$ is a good prediction around $x=0$. As we get further out, we need to account for more terms.

Similarly, $e^x \sim 1 + x$ works well for small interest rates: 1% discrete interest is 1.01 after one time period, 1% continuous interest is a tad higher than 1.01. As time goes on, the linear model falls behind because it ignores the compounding effects.
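A quick numeric sketch of the tadpole-vs-frog tradeoff:

```python
import math

# The linear "tadpole" e^x ~ 1 + x vs the full function.
r = 0.01
assert abs(math.exp(r) - (1 + r)) < 1e-4  # 1% rate: nearly identical

# At 100% interest, the ignored compounding terms matter:
assert math.exp(1.0) - (1 + 1.0) > 0.7    # e - 2 is about 0.718
```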

What's a common application of DNA? Paternity tests.

If we have a few functions, we can compare their Taylor series to see if they're related.

Here are the expansions of $\sin(x)$, $\cos(x)$, and $e^x$:

$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots$

$\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots$

$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$

There's a family resemblance in the sequences, right? Clean powers of $x$ divided by a factorial?

One problem is the sequence for $e^x$ has positive terms, while sine and cosine alternate signs. How can we link these together?

Euler's great insight was realizing an imaginary number could swap the sign from positive to negative:

$e^{ix} = \left(1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots\right) + i\left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots\right) = \cos(x) + i\sin(x)$

Whoa. Using an imaginary exponent and separating into odd/even powers reveals that sine and cosine are hiding inside the exponential function. Amazing.

Although this proof of Euler's Formula doesn't show *why* the imaginary number makes sense, it reveals the baby daddy hiding backstage.
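The formula is easy to verify numerically (the test point $x = 0.7$ is arbitrary):

```python
import cmath
import math

# Euler's Formula: e^(ix) = cos(x) + i*sin(x)
x = 0.7
lhs = cmath.exp(1j * x)
rhs = complex(math.cos(x), math.sin(x))
assert abs(lhs - rhs) < 1e-12
```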

**Relationship to Fourier Series**

The Taylor Series extracts the "polynomial DNA" and the Fourier Series/Transform extracts the "circular DNA" of a function. Both see functions as built from smaller parts (polynomials or exponential paths).

**Does the Taylor Series always work?**

This gets into mathematical analysis beyond my depth, but certain functions aren't easily (or ever) approximated with polynomials.

Notice that powers like $x^2, x^3$ explode as $x$ grows. In order to have a slow, gradual curve, you need an army of polynomial terms fighting it out, with one winner barely emerging. If you stop the train too early, the approximation explodes again.

For example, here's the Taylor Series for $\ln(1 + x)$:

$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots$

The black line is the curve we want, and adding more terms, even dozens, barely gets us accuracy beyond $x=1.0$. It's just too hard to maintain a gentle slope with terms that want to run hog wild.

In this case, we only have a radius of convergence where the approximation stays accurate (such as around $|x| < 1$).

**Turning geometric to algebraic definitions**

Sine is often defined geometrically: the height of a line on a circular figure.

Turning this into an equation seems really hard. The Taylor Series gives us a process: If we know a single value and how it changes (the derivative), we can reverse-engineer the DNA.

Similarly, the description of $e^x$ as "the function with its derivative equal to the current value" yields the DNA [1, 1, 1, 1], and polynomial $f(x) = 1 + \frac{1}{1!}x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + \dots $. We went from a verbal description to an equation.

Phew! A few items to ponder.

Happy math.

Like the word "run", the meaning depends on context:

- crawl / walk / run (movement)
- run a company (general operation)
- a run of good luck (sequence)
- and a dozen more definitions

Sticking with a single interpretation of "run" leads to confusion, and the same happens in math. Let's clarify how exponents are used.

We first learn that exponents like $3^2$ or $a^n$ are repeated multiplication: multiply $a$, $n$ times.

Like counting on your fingers, this breaks down beyond the positive integers. What does a fractional exponent mean? A negative one? Zero? (Since $a^0 = 1$, we multiply zero times and get 1?)

*Common usage of $a^n$*: Counting problems. If you flip a coin $n$ times, you have $2^n$ possible outcomes.

Let's say I have an exponent like $3^{4.5}$. I mentally convert it to $1.0 * 3^{4.5}$, and then $1.0 * g^{t}$.

With the "growth microwave" analogy, an exponent grows our starting amount (1.0) by $g$ for $t$ units of time. (In this example, 3x growth applied for for 4.5 seconds.)

What values can $t$ have?

- If t is positive, we go forward in time and get larger (assuming $g > 1$). Fractional time is ok -- I can run a microwave for 3.5 minutes, and get some effect between 3 minutes and 4 minutes.
- If t is negative, we go backwards in time and get smaller. If a regular microwave allowed negative time, it would cool down your food, right?
- If t is zero, we didn't use the machine at all! We're left with 1.0, our original amount.

The growth microwave interpretation helps with fractional powers (and resolves the t=0 issue), but it's not *flexible*. Doubling the rate and halving the time doesn't have the result we expect:

$3^2 \neq 6^1$

2 seconds of 3x growth isn't the same as 1 second of 6x growth. Ugh. I'm not a caveman, we need to mix rate and time! (Hold onto that thought.)

*Common usage of $g^t$*: Man-made systems. If I agree to pay you 15% at a certain discrete interval (yearly), we can model the outcome as $(1.15)^t$. If I decide to cut the payments short (2.5 years) we can exponentially interpolate between the two intervals ($r^{0.5}$ is the square root). We often set $g = (1 + r)$, so we could write $(1 + .15)^t$.

Aside: Let's prove $g^t$ isn't flexible. Suppose doubling the rate and halving the time gave the same result:

$(2g)^{t/2} = g^t$

Squaring both sides gives $(2g)^t = g^{2t}$, so $2^t g^t = g^{2t}$, which forces $2^t = g^t$, i.e., $g = 2$. The trade only works for that one base. However, this shows the special case of $2^2 = 4^1$ does work.

Regular readers know I think of e as a continuous growth engine:

Instead of waiting to grow at discrete intervals, we apply interest immediately and compound as fast as we can. A pleasant consequence of e's definition is that we merge rate and time into a single, interchangeable quantity:

Conveniently, 2 years of 50% growth is the same as 1 year of 100% growth. We doubled our rate, halved our time, and got the same result. (Practically, we may prefer the shorter time period but the final quantity is the same.)

The input $x$ is the "growth fuel" that can be separated into "rate * time". The base, e, is a machine that just cares how much fuel you gave it. Drip the fuel over 50 time periods, or firehose it into a single one. Either way, the same total input $x$ gets the same final result.
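A one-liner makes the "fuel" idea concrete:

```python
import math

# e only cares about total fuel x = rate * time:
assert abs(math.exp(0.50 * 2) - math.exp(1.00 * 1)) < 1e-12

# The discrete version can't make this trade: 2 periods of 3x != 1 period of 6x.
assert 3**2 != 6**1
```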

*Common usage of $e^{rt}$*: Natural systems. Most laws of physics have continuous growth patterns (no delay between earning interest and using it). We may occasionally use the man-made version for our convenience, e.g., describing a radioactive half life of 20 years, even though the atoms are decaying on an instant-by-instant basis.

(Aside: Use the natural log to convert one exponent format to another. $g^t = e^{\ln(g)t}$)

We can treat $e^x$ as a fancy mathematical function:

$\exp(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$

You may see $e^x$ written $\exp(x)$, treated like any other function $f(x)$. Here, $x$ is just a numerical input to an intricate power series. Concepts like repeated counting, growth rate, and time fall into the background (though we can see them if we look).

Curiously, we're left with integer powers ($x^0, x^1, x^2, x^3$) and our "repeated multiplication" interpretation shows up again! The power of the exponent, $x$, switches from the number of multiplications to the base being multiplied. (The *ciiiircle* of life.)

*Common usage of $\exp(x)$*: When we see $e^x$ as just another function, a few properties emerge:

- Using calculus with exponents gets way easier, since we can take the derivative / integral of each term (and realize $\frac{d}{dx} e^x = e^x$).
- Exponential approximations become easy: $e^x \sim 1 + x$ for small values of $x$, since the higher-order powers become negligible.
- Other math patterns click. Sine and cosine have expansions similar to $e^x$, hinting that trig functions and exponents are connected (Euler's Formula).
- $e^x$ looks like a polynomial of infinite degree, and will eventually surpass any finite polynomial. (While $x^2 + 100 > e^x$ in the beginning, $e^x$ will eventually exceed it.)

So which interpretation of exponents is "correct"? You probably guessed it: it depends, though the interpretations are listed from most to least common for a general audience.

If a formula doesn't make sense, try switching versions. Life's too short to have only a single interpretation of exponents.

Happy math.

Using $f(ax + b) + c$ instead of $f(x)$ has a few effects:

- a seems to squash the function
- b slides us left/right
- c lifts us up/down
- Interactive example on Desmos

What's going on? Well, we're describing the visual result on the graph, but we aren't describing the underlying process that made the change. Let's take a look at the root cause.

The "a" in f(ax) is a fast-forward factor. Normally, we experience time as "1 minute per minute" -- for every minute we wait, the world advances one minute into the future.

What if we saw life on fast forward?

Here, 1 minute passing to us (in our "x" timeline) means 2 minutes passed in the real world. Or 10 minutes, an hour, or a year.

On our timeline (x), time passes as normal. But our function, which determines the results we see, is being fed a modified timeline. While we leisurely stroll from x=1 to x=2, f has to jump from f(1) to f(2) to f(3) ... up to f(20). Here, f needs to graph 10 minutes of events while we casually waited one minute. Cramming more data points into the same time period is a squashed, sped-up graph.

Intuitively, "Squashing the graph" really means "running time faster".

A simpler version of time travel is a basic shift. If things happen *ahead* of schedule, what does that mean?

Imagine a German/Japanese city where the trains run an hour ahead of schedule.

The 4pm train arrives at 3pm. The 11am flight takes off at 10am. In other words, a dystopian hellscape.

If I managed not to burst into flames upon arriving, I'd describe the situation like this:

- Actual time: 10am
- Flight that leaves: 11am
- Flight = actual + 1

In other words, f(x + 1) means things run ahead of schedule. We think it's 3pm (x = 3), but the 4pm events are happening [f(4) is happening].

Again, this can be tricky: doesn't it seem like we remove time to make things happen earlier? This is our visual intuition fooling us: if it's 3pm but f is running the 4pm events, it's going ahead of schedule. (Note that we *add* time to our watch in order to arrive early.)

See "slide to the left" as "ahead of schedule".

Adding a value to a function moves it vertically.

What's happening? Unbeknownst to f(x), we take the final result and make it larger.

Intuitively, we have a "bias". When f(x) = 0, it's telling us "don't change, stay at 0". Except our default value isn't zero, it's `c`. When f(x) says "don't change", that means "use your default value, c".

See "sliding up the function" as "changing the default value".

(In neural networks, you might have a default value if there's zero input. A "default bias" is a nice way to describe this, vs. "vertically sliding the function".)

In Calculus, the chain rule lets us compose functions. (Fancy phrase for cramming one function inside another.)

When we cram 2x inside of sin and take the derivative, we get:

$\frac{d}{dx}\sin(2x) = 2\cos(2x)$

The chain rule tells us to take the derivative of the outer function (sin(2x) => cos(2x)) and multiply by the derivative of the inner function (2x => 2).

What's going on? Using the "derivative = slope" interpretation (not my favorite but good for graphing), we see this:

If we pick a point on the cycle (such as x = 1 radian), we find the slope there as:

$\frac{d}{dx}\sin(x)\Big|_{x=1} = \cos(1) \approx 0.54$

In other words, at x = 1, sin(x) has a nice upward slope of about 54%. Ok, great.

Now, what happens if we run sin(x) at twice the speed? Eat, eat, eat!

Well... the derivative (slope we see) should double! Compared to the original, sin(2x) runs through changes twice as fast as we do. 1 minute in our world means sin(2x) has chomped through 2 minutes of changes.

At the *corresponding point* in the cycle, we should expect double the slope. Let's make sure.

Instead of asking for x = 1, we know that same point on sin(2x)'s timeline is now x = 0.5. So let's ask for the derivative there:

$\frac{d}{dx}\sin(2x)\Big|_{x=0.5} = 2\cos(2 \cdot 0.5) = 2\cos(1) \approx 1.08$

Yay, the math worked!
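We can double-check the doubled slope numerically (a finite-difference sketch):

```python
import math

# Slope of sin(x) at x = 1 vs sin(2x) at the corresponding point x = 0.5.
def deriv(func, x, h=1e-6):
    """Central finite-difference approximation of the derivative."""
    return (func(x + h) - func(x - h)) / (2 * h)

slope_regular = deriv(math.sin, 1.0)                # cos(1), about 0.54
slope_fast = deriv(lambda x: math.sin(2 * x), 0.5)  # 2*cos(1), about 1.08

assert abs(slope_regular - math.cos(1)) < 1e-6
assert abs(slope_fast - 2 * slope_regular) < 1e-5   # fast-forward doubles the slope
```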

We can mechanically describe the chain rule as a way to compose functions. But intuitively, we've strapped a fast-forward device inside our function, speeding up the changes we experience.

Regular sin(x) keeps its derivative between -1.0 and +1.0, the limits imposed by cos(x). But now we have a fast-forward trick to make sin(x) move as fast as we want.

Extra: Imagine we weren't fast-forwarding at a constant rate of 2; what if we sped up more, the further along we went? That function could be sin(x * x), where one x is our regular location in the cycle, and the other x is our fast-forward rate. For fun (yes!), you can take the derivative of sin(x * x): the chain rule gives $2x\cos(x^2)$.

We can run the exponential function ($e^x$) ahead of schedule with $e^{x + b}$.

But we can rewrite this to:

$e^{x + b} = e^b \cdot e^x$

In other words, running the exponential function "ahead of schedule" can be seen as the regular exponential function with a bigger starting point.

Normal exponentials start at 1.0 and begin compounding continuously. Instead, we can see this one as starting from a bigger value than 1.0. (For example, $e^{x + 2}$ starts compounding from $e^2 = 7.389$.)

Depending on the function, interpretations other than "ahead of schedule" might make more sense.

Happy math.
