The standard quadratic formula is a lot to remember:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

It's a maze of numbers, letters, and square roots. It's derived from "completing the square" on a general quadratic equation ($ax^2 + bx + c = 0$). There are several good explanations for the standard formula; here's my intuition for a variation.

Here's our typical starting equation:

$$ax^2 + bx + c = 0$$

First off, why leave $a$ hanging around? Divide that fella out, and get:

$$x^2 + \frac{b}{a}x + \frac{c}{a} = 0$$

In fact, pretend $a$ was never there. You'd combine similar terms ($3x + 4x = 7x$) before doing any other work, right? In a similar way, demand that $x^2$ appear by itself (with no coefficients) before you begin solving.

After dividing out any coefficient attached to $x^2$, our equation is in the format:

$$x^2 + bx + c = 0$$

Ah, that's a better starting point.

(Note: $b$ and $c$ are what we label the coefficients after all simplifications. For example, when starting with $3x^2 + 6x + 9 = 0$ and simplifying to $x^2 + 2x + 3 = 0$, we'd assign $b=2$ and $c=3$.)

Let's put on our geometry goggles and assume our quadratic equation refers to area:

An ongoing insight is that math doesn't have dimensions or units -- just raw quantities. We can *decide* that in this scenario, every quantity refers to the area of a 2d shape:

- $x^2$ is our square ($x * x$)
- $bx$ is a rectangular overhang ($b * x$)
- $c$ is an offset independent of $x$ ($c * 1$)

Solving the equation means: what length $x$ makes the square, overhang, and offset cancel to zero?

Without an offset ($x^2 + bx = 0$), canceling the total area is easy: just set $x = 0$ or $x = -b$, which collapses one side of the rectangle or the other. (Note that $x$ can have *negative* length, to cancel the width of the overhang.)

But that offset makes us do extra work.

The trick to canceling the offset is completing the square. First, we move half the overhang to the top of the square:

Next, we borrow from the Bank of Zero to fill in the corner:

This part is magic. We can conjure up any quantity if we promise to cancel it later (0 = 1 - 1). So, we borrow material to complete the square, and subtract it again:

Then we can move the extra pieces to the other side:

Let's fill in some specifics. How big is the corner? Half the overhang ($\frac{b}{2}$), squared. Time for some algebra:
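Writing the geometric steps as algebra:

$$x^2 + bx + c = 0$$

$$\left(x + \frac{b}{2}\right)^2 - \left(\frac{b}{2}\right)^2 + c = 0$$

$$x = -\frac{b}{2} \pm \sqrt{\left(\frac{b}{2}\right)^2 - c}$$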

Tada! It's a... slightly less complex quadratic formula.

This equation is simpler than the quadratic formula, but it's still gnarly.

$b$ is the width of the full overhang, and $\frac{b}{2}$ is the piece we move. Since that's the plan, why not write things in terms of the part we want? Let's make $b$ half the overhang:

This means our starting equation can be written:

$x^2 + 2bx + c = 0$

Where $b$ is now the "radius" (not full diameter) of the overhang. Completing the square and solving gives us:

$$x = -b \pm \sqrt{b^2 - c}$$

Pretty clean!

Let's solve this equation:

$$3x^2 + 6x + 24 = 0$$

My thought process: first, divide everything by 3. No need to leave things sitting around.

$$x^2 + 2x + 8 = 0$$

Next, let's find the radius of the overhang. The entire linear coefficient is 2, so the radius is 1. Using the radius formula, we get:

$$x = -1 \pm \sqrt{1^2 - 8} = -1 \pm \sqrt{-7}$$

Pretty fast, right?

And to factor the equation (writing it as a set of multiplications) we do:

$$3x^2 + 6x + 24 = 3\left(x + 1 - \sqrt{-7}\right)\left(x + 1 + \sqrt{-7}\right)$$

(Verify with Wolfram Alpha: roots of $3x^2 + 6x + 24$)
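To double-check, here's a quick Python sketch (the `radius_roots` helper is just an illustrative name):

```python
import cmath

# The "radius" formula: solve x^2 + 2bx + c = 0 via x = -b ± sqrt(b^2 - c),
# where b is half the linear coefficient.
def radius_roots(b, c):
    disc = cmath.sqrt(b**2 - c)
    return -b + disc, -b - disc

# 3x^2 + 6x + 24 = 0  →  divide by 3  →  x^2 + 2x + 8 = 0, so b = 1, c = 8
r1, r2 = radius_roots(1, 8)
print(r1, r2)

# Plug the roots back into the original equation: both should vanish
for r in (r1, r2):
    assert abs(3 * r**2 + 6 * r + 24) < 1e-9
```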

Which version of the formula should you use? I'd rather use a simple formula on a simple equation, vs. a complicated formula on a complicated equation.

**Don't be afraid to rewrite equations**

The standard quadratic formula is fine, but I found it hard to memorize. Who says we can't modify equations to fit our thinking? Ideas like "remove $a$ from the equation" and "use the radius, not diameter" simplify things, and nicknames like "square, overhang, offset" make the parts memorable.

Practically, we often memorize the equations we're given, but that doesn't mean you can't try a version that makes sense for you.

**Why are the roots negative?**

It seems strange to have formulas that begin with a negative sign:

$$x = -b \pm \sqrt{b^2 - c}$$

Typically, we need negative lengths to fight the area added by the overhang and make the area collapse to zero. Depending on the values of $a$, $b$ and $c$, the solutions can be positive, negative, zero, or complex.

**What is negative area?**

This seems to be overlooked in discussions, but when completing the square we can have "negative" area. Negative area is created by sides of imaginary length.

Instead of positive and negative area, I think of colors (green/red). Green area is positive, with real sides (healthy land that grows crops). Red area is negative, with imaginary sides (poisoned land?). The math works out ($3i \cdot 3i = 9 i^2 = 9(-1) = -9$) , but our geometric concepts might need some upgrades.

**Moving from 2d to 1d**

Another aha! moment is realizing what happens when we take a square root. We're changing our interpretation from 2 dimensions back down to 1:

Taking the square root is like looking at our shapes *edgewise* and comparing the resulting lengths. The equations have no fixed dimensions -- just interpretations of quantities -- but I like this perspective shift. The unimaginative among us can see completing the square as pure symbol manipulation.

Happy math.

We have a bunch of coefficients ($c_0, c_1...$) multiplied by various powers of $x$.

Why are we forced to learn about them? Here are the insights I wish I had.

How many lines are here?

|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Why, it's one hundred twenty-three, of course. Oh, you'd probably have preferred the number in its polynomial form:

$$1 \cdot 10^2 + 2 \cdot 10^1 + 3 \cdot 10^0 = 123$$

The digits "123" are simply the coefficients for various powers of 10, written largest to smallest. The "simple" idea of counting by adding powers of 10 is pretty game-changing, right?

We can even switch bases. The general interpretation of "123", for any base $x$, is:

$$1 \cdot x^2 + 2 \cdot x + 3 \cdot x^0$$

We're already familiar with decimals, so turn the coefficients "123" into a quantity we can understand. For hexadecimal numbers, we plug in $x = 16$ and work out $(1)16^2 +(2)16 + (3)16^0 = 291$.
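Here's the same idea as a Python sketch, evaluating digit-polynomials in any base (the `digits_value` helper is just an illustrative name):

```python
# Evaluate the digit-polynomial 1*x^2 + 2*x + 3 at x = base,
# using Horner's method (same polynomial, fewer multiplications).
def digits_value(digits, base):
    total = 0
    for d in digits:
        total = total * base + int(d)
    return total

print(digits_value("123", 10))  # 123
print(digits_value("123", 16))  # 291, same as int("123", 16)
```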

Strictly speaking, 123.4 involves negative powers, since $0.1 = 10^{-1}$. Still, the original notion of polynomials gave us a good start, better than counting with lines in the sand.

Imagine you're taking notes for someone who's too lazy to use a calculator. They might say:

"Take the number 15. Actually, add 3. Multiply by 7. Now multiply by the original number again."

As arithmetic, you might write:

$$(15 + 3) \cdot 7 \cdot 15$$

This is an exact expression, with specific numbers. But maybe your buddy is indecisive.

Making this more general -- with any starting number -- we'd write:

$$(x + 3) \cdot 7 \cdot x = 7x^2 + 21x$$

Notice that all the intermediate steps have been reduced to 2 terms. A polynomial!

Instead of thinking "When will I see this equation", think "This equation is the simplified version of a bunch of arithmetic". The steps don't look similar (where's the "add 3" portion?) but the polynomial boils everything to the essentials.

So, another big insight: we don't always need to *solve* a polynomial, we can use it to simplify many stages of arithmetic.
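A quick Python sketch of this equivalence (helper names are mine):

```python
# The step-by-step arithmetic and the boiled-down polynomial 7x^2 + 21x
# agree for any starting number.
def step_by_step(x):
    n = x + 3   # "add 3"
    n = n * 7   # "multiply by 7"
    n = n * x   # "multiply by the original number again"
    return n

def polynomial(x):
    return 7 * x**2 + 21 * x

for x in [15, -2, 0.5]:
    assert step_by_step(x) == polynomial(x)

print(step_by_step(15))  # 1890
```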

But... now that it's in its equation form, we *could* solve it. If we wanted to. Because it seems like a mathy thing to do.

Let's say we want to know when this process hits 100. What input number would that be?

$$7x^2 + 21x = 100$$

Solving this gives $x = -5.56$ and $x=2.56$. (Yes, in the real world we can use tools to solve equations; grinding through the quadratic formula is an intuition for another day.)
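If you'd like to verify with code instead, here's a Python sketch using the standard formula:

```python
import math

# Solve 7x^2 + 21x = 100, i.e. 7x^2 + 21x - 100 = 0.
a, b, c = 7, 21, -100
disc = math.sqrt(b**2 - 4 * a * c)
roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
print(roots)

# Plugging either root back in should recover 100
for r in roots:
    assert abs(7 * r**2 + 21 * r - 100) < 1e-9
```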

The big aha is that polynomials are the boiled down version of arithmetic, assuming you can only:

- Add/subtract/multiply/divide by a constant, or:
- Multiply by your original number, x

Strictly speaking, polynomials don't allow for square roots ($\sqrt{x}$) or variable exponents ($2^x$), just the simple operations above.

Why limit ourselves this way? Well, we can see how far we can get.

Linear algebra is limited to linear functions, but even then we can model complex behavior by chaining enough of them together.

Similarly, polynomials are simple, but with *enough terms* we can model pretty complex behavior. The simple structure gives us several nice properties:

- Adding/multiplying polynomials gives us a polynomial
- Divide a polynomial by a root factor, $(x - r)$, and get a polynomial (like dividing a composite number by one of its factors)
- Feed polynomials into each other $f(g(x))$
- The derivative / integral of a polynomial is a polynomial, making them easy to optimize
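These closure properties are easy to poke at in Python. A sketch with polynomials as coefficient lists, lowest power first (the helper names are mine):

```python
# Polynomials are closed under addition, multiplication, and differentiation:
# each operation returns another coefficient list.
def poly_add(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_deriv(p):
    # Power Rule term by term: c*x^n → n*c*x^(n-1)
    return [i * c for i, c in enumerate(p)][1:]

p = [0, 21, 7]                 # 21x + 7x^2
print(poly_add(p, [3]))        # [3, 21, 7]
print(poly_mul(p, [-3, 1]))    # (21x + 7x^2)(x - 3) = [0, -63, 0, 7]
print(poly_deriv(p))           # [21, 14]
```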

Polynomials are like the integers -- simpler than real numbers, and still useful.

What does $x$ mean? Well, it's a number, a quantity, something we want to represent.

What does $x \cdot x = x^2$ mean? It's an *interaction*: a quantity interacting with itself. In the case of length interacting with length, the result is area: length * length = square units. Or, we could have speed interacting with time to make distance: 30 mph for 30 hours is 900 miles.

A big aha: math doesn't care about units. A polynomial does the bookkeeping to show *that* an interaction like $x \cdot x$ happened, along with plain old $x$ and $x^{17}$.

In physics, you can't sensibly write $3cm + 5cm^2$ because the powers of the units don't match. You can't write $3 cm + 5kg$ because the types of units don't match.

But math doesn't care. Numbers are numbers, and "1 + 2*2 = 5" works, even though "2 times 2" represents an interaction.

A polynomial is a general-purpose accounting system where we can throw *anything* in the pot of soup. Will it taste good? Not the polynomial's problem. You'll get a formula and decide what, if anything, to do with it.

Given an input $x$, polynomials show what happens when we perform an ungodly amount of arithmetic to that input.

It seems, if we twiddle enough levers, we can get that polynomial to behave like almost any pattern we choose.

You got it: that's the Taylor series.

With enough terms, our humble polynomial -- composed of *basic arithmetic* -- can model undulating sine waves and other things that give students nightmares.

Each term in the polynomial gets us a better approximation of the original function:

We can analyze the coefficients in the polynomial and extract the "DNA" inside a function:

If the pattern in the coefficients is similar, the functions are likely related.

Back to the decimal analogy: 12 and 120 are similar, but that's only obvious when I write them as "decimal polynomials". If I spilled 12 and 120 skittles on the floor, you wouldn't notice any obvious connection between the two piles. Well, one might look slightly more appetizing.

One of the big insights in math -- called the Fundamental Theorem of Algebra -- is that a polynomial can be rewritten as a sequence of *multiplications*.

That is, adding powers of $x$ can *also* be seen as multiplying $x$ with various offsets:

$$c_3x^3 + c_2x^2 + c_1x + c_0 = c_3(x - a_1)(x - a_2)(x - a_3)$$

There are as many roots ($a_1, a_2, a_3$) as the highest power of $x$ (a root can appear multiple times).

Why is this important? Well, besides being incredibly surprising (*added* powers can be converted to *multiplications*), it makes solving equations much easier (see: why do we factor equations?).

In short, it's not instantly obvious how to satisfy this:

$$x^2 + 4x - 12 = 0$$

Want to guess answers? Plug in a value for $x$ and it shows up in several places, in terms which need to be balanced. But what about this:

$$(x + 6)(x - 2) = 0$$

It's the same scenario, factored into multiplications. If *any* term goes to zero, the entire product becomes zero and we're done. Just pick $(x + 6)$ or $(x - 2)$, make it zero ($x = -6$ or $x = 2$), and we've solved the puzzle.
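A quick Python sketch of why the factored form is easier to solve:

```python
# The factored form makes roots obvious: if any factor is zero,
# the whole product is zero.
def expanded(x):
    return x**2 + 4 * x - 12

def factored(x):
    return (x + 6) * (x - 2)

for x in (-6, 2):
    assert expanded(x) == 0
    assert factored(x) == 0

print(expanded(-6), factored(2))  # 0 0
```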

Remember how "123" was secretly a polynomial? For fun, using the accursed quadratic formula, we can factor it:

$$x^2 + 2x + 3 = (x + 1 - i\sqrt{2})(x + 1 + i\sqrt{2})$$

What does this mean? We can get the digits "123" to equal zero if we use a complex base! Not that you'd *want* to... but it's possible.

A polynomial is simple: just a bunch of powers of x. That means it's easy to use the Power Rule in Calculus to find the derivative, and find the min/max.
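Here's a Python sketch of the Power Rule on a coefficient list, lowest power first (the `deriv` helper is illustrative):

```python
# Power Rule on a coefficient list: the derivative of c*x^n is n*c*x^(n-1).
def deriv(coeffs):
    return [i * c for i, c in enumerate(coeffs)][1:]

# f(x) = x^2 + 4x - 12  →  f'(x) = 2x + 4
f = [-12, 4, 1]
fp = deriv(f)
print(fp)  # [4, 2]

# f' = 0 at x = -2: the parabola's minimum
x_min = -fp[0] / fp[1]
print(x_min)  # -2.0
```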

The reverse works as well: we can take a bunch of data points, fit a polynomial to them, and estimate the min/max of the sequence. The more terms, the more accurate (but beware overfitting).
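As a sketch of the fitting idea, assuming NumPy is available:

```python
import numpy as np

# Fit a quadratic to sampled data, then estimate the minimum from the
# fitted coefficients (vertex at x = -b / 2a).
xs = np.array([-4.0, -2.0, 0.0, 1.0, 3.0])
ys = xs**2 + 4 * xs - 12         # pretend these are measured data points
a, b, c = np.polyfit(xs, ys, 2)  # coefficients, highest power first
print(-b / (2 * a))              # estimated minimum, near -2.0
```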

A simple structure has its advantages.

Polynomials usually evoke memories of the quadratic equation, or equations force-fit into word problems. Ugh, let's skip past that.

Polynomials are a simple, powerful model used from arithmetic to algebra to calculus. They're as applicable in the "real world" as a 2-digit number.

Happy math.

In math, we can get misleading intuitions about what can (or can't) be rearranged.

After learning addition, we've memorized facts like 2 + 4 = 6. But this might stray into the idea that "whenever I see 2 and 4, I can simplify to 6".

Although 2 + 4 = 6, "baked(2) + baked(4)" is not "baked(6)". Baking unmixed ingredients in the exponential oven, we get:

$2^2 + 4^2 \neq 6^2$

We can only confidently say:

$(2 + 4)^2 = 6^2$

We combine the ingredients, *then* bake the result. Exponents, like baking an apple pie, modify the original ingredients so they can't be easily combined later. While we might *recognize* the original 2 and 4, they aren't directly available. Two baked pies can't be smashed together to consolidate the filling.

This confusion gummed me up in calculus, when learning derivatives (the bad boy of baking).

In algebra, we internalize rules like:

$$(a \cdot b)^2 = a^2 \cdot b^2$$

But our intuition leads us astray when we get to the derivative:

$$(f \cdot g)' \neq f' \cdot g'$$

Raw polynomials can be multiplied, but the *derivatives* of multiplied polynomials can't be rearranged so easily. Multiplication makes functions interact in a way that makes taking the derivative more complex:

Working through the Product Rule we get:

$$(f \cdot g)' = f' \cdot g + f \cdot g'$$

When learning Calculus, I was confused how standard interactions (like multiplication) needed special handling. I thought I was done learning new rules for "arithmetic".

But no: functions, when multiplied, interact in funky ways. See how each side grows its own sliver of area (`df * g` and `dg * f`)? The functions being multiplied are "baked together" and the overall effect depends on them both, simultaneously. We can't examine them in isolation (e.g., `df` or `dg` by itself).
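A numeric Python sketch makes the point, using a symmetric difference quotient as a stand-in for the derivative:

```python
# Check numerically: (f*g)' matches f'*g + f*g', but NOT the "obvious" f'*g'.
def d(fn, x, h=1e-6):
    # symmetric difference quotient, an approximation of the derivative
    return (fn(x + h) - fn(x - h)) / (2 * h)

f = lambda x: x**2 + 1
g = lambda x: 3 * x
fg = lambda x: f(x) * g(x)

x = 2.0
product_rule = d(f, x) * g(x) + f(x) * d(g, x)
naive = d(f, x) * d(g, x)

assert abs(d(fg, x) - product_rule) < 1e-3   # Product Rule holds
assert abs(d(fg, x) - naive) > 1             # the rearranged version fails
```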

Now, there are setups when the inputs *can* be processed separately and combined later (linear algebra). The cooking equivalent might be a smoothie: An apple/banana smoothie mixed with a peach/mango smoothie is the same as blending all ingredients in the beginning.

A common assumption is that operations are usually linear, but $\sin(a + b) \not= \sin(a) + \sin(b)$ and $(a + b)^2 \not= a^2 + b^2$. Sorry, we have to carefully cook the ingredients if we want the math to taste right.

When our intuition for a math rule doesn't make sense, ask "Are we making a pie, or a smoothie?"

There's a math analogy here. Take a function, pick a specific point, and dive in. You can pull out enough data from a single point to rebuild the entire function. Whoa. It's like remaking a movie from a single frame.

The Taylor Series discovers the "math DNA" behind a function and lets us rebuild it from a single data point. Let's see how it works.

Given a function like $f(x) = x^2$, what can we discover at a single location?

Normally we'd expect to calculate a single value, like $f(4) = 16$. But there's much more beneath the surface:

- $f(x)$ = Value of function at point $x$
- $f'(x)$ = First derivative, or how fast the function is changing (the velocity)
- $f''(x)$ = Second derivative, or how fast the *changes* are changing (the acceleration)
- $f'''(x)$ = Third derivative, or how fast the *changes in the changes* are changing (the acceleration of the acceleration)
- And so on

Investigating a single point reveals multiple, possibly infinite, bits of information about the function's behavior. (Some functions have an endless amount of data, i.e. derivatives, at a single point.)

So, given all this information, what should we do? Regrow the organism from a single cell, of course! (*Maniacal cackle here.*)

Our plan is to grow a function from a single starting point. But how can we describe any function in a generic way?

The big aha moment: imagine any function, at its core, is a polynomial (with possibly infinite terms):

$$f(x) = c_0 + c_1x + c_2x^2 + c_3x^3 + \cdots$$

To rebuild our function, we start at a fixed point ($c_0$) and add in a bunch of other terms based on the value we feed it (like $c_1x$). The "DNA" is the values $c_0, c_1, c_2, c_3$ that describe our function exactly.

Ok, we have a generic "function format". But how do we find the coefficients for a specific function like $\sin(x)$ (the height of angle $x$ on the unit circle)? How do we pull out its DNA?

Time for the magic of 0.

Let's start by plugging in the function value at $x=0$. Doing this, we get:

$$f(0) = c_0 + c_1(0) + c_2(0)^2 + c_3(0)^3 + \cdots = c_0$$

Every term vanishes except $c_0$, which makes sense: the starting point of our blueprint should be $f(0)$. For $f(x) = \sin(x)$, we can work out $c_0 = \sin(0) = 0$. We have our first bit of DNA!

Now that we know $c_0$, how do we isolate $c_1$ in this equation?

Hrm. A few ideas:

Can we set $x = 1$? That gives $f(1) = c_0 + c_1(1) + c_2(1^2) + c_3(1^3) + \cdots$ . Although we know $c_0$, the other constants are summed together. We can't pull out $c_1$ by itself.

What if we divide by $x$? This gives:

$$\frac{f(x)}{x} = \frac{c_0}{x} + c_1 + c_2x + c_3x^2 + \cdots$$

Then we can set $x=0$ to make the other terms disappear... right? It's a nice idea, except we're now dividing by zero.

Hrm. This approach is really close. How can we *almost* divide by zero? Using the derivative!

If we take the derivative of the blueprint of $f(x)$, we get:

$$f'(x) = c_1 + 2c_2x + 3c_3x^2 + 4c_4x^3 + \cdots$$

Every power gets reduced by 1 and the $c_0$, a constant value, becomes zero. It's almost too convenient.

Now we can isolate $c_1$ using our $x=0$ trick:

$$f'(0) = c_1 + 2c_2(0) + 3c_3(0)^2 + \cdots = c_1$$

In our example, $\sin'(x) = \cos(x)$, so: $f'(0) = \sin'(0) = \cos(0) = 1 = c_1$

Yay, one more bit of DNA! This is the magic of the Taylor series: by repeatedly applying the derivative and setting $x = 0$, we can pull out the polynomial DNA.

Let's try another round:

$$f''(x) = 2c_2 + 6c_3x + 12c_4x^2 + \cdots$$

After taking the second derivative, the powers are reduced again. The first two terms ($c_0$ and $c_1x$) disappear, and we can again isolate $c_2$ by setting $x=0$:

$$f''(0) = 2c_2 \qquad \Rightarrow \qquad c_2 = \frac{f''(0)}{2}$$

For our sine example, $\sin'' = -\sin$, so:

$$2c_2 = f''(0) = -\sin(0) = 0$$

or $c_2 = 0$.

As we keep taking derivatives, we're performing more multiplications and growing a factorial in front of each term (1!, 2!, 3!).

**The Taylor Series for a function around point x=0 is:**

$$f(x) = f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \frac{f'''(0)}{3!}x^3 + \cdots$$

(Technically, the Taylor series around the point $x=0$ is called the Maclaurin series.)

**The generalized Taylor series, extracted from any point $a$, is:**

$$f(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \frac{f'''(a)}{3!}(x - a)^3 + \cdots$$

The idea is the same. Instead of our regular blueprint, we use:

$$f(x) = c_0 + c_1(x - a) + c_2(x - a)^2 + c_3(x - a)^3 + \cdots$$

Since we're growing from $f(a)$, we can see that $f(a) = c_0 + 0 + 0 + \dots = c_0$. The other coefficients can be extracted by taking derivatives and setting $x = a$ (instead of $x =0$).

Plugging the derivatives into the formula above, here's the Taylor series of $\sin(x)$ around $x = 0$:

$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$$

And here's what that looks like:

A few notes:

**1) Sine has infinite terms**

Sine is an infinite wave, and as you can guess, needs an infinite number of terms to keep it going. Simpler functions (like $f(x) = x^2 + 3$) are already in their "polynomial format" and don't have infinite derivatives to keep the DNA going.

**2) Sine is missing every other term**

If we repeatedly take the derivative of sine at $x = 0$ we get:

$$\sin(x), \; \cos(x), \; -\sin(x), \; -\cos(x), \; \sin(x), \ldots$$

with values:

$$0, \; 1, \; 0, \; -1, \; 0, \; 1, \ldots$$

Ignoring the division by the factorial, we get the pattern:

So the DNA of sine is something like [0, 1, 0, -1] repeating.
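We can rebuild sine from that repeating DNA in a Python sketch:

```python
import math

# Rebuild sin(x) from its "DNA": the derivatives at 0 cycle through [0, 1, 0, -1].
dna = [0, 1, 0, -1]

def taylor_sin(x, terms=20):
    # each coefficient is (derivative value at 0) / n!
    return sum(dna[n % 4] * x**n / math.factorial(n) for n in range(terms))

print(taylor_sin(1.0))  # ≈ 0.8414709848, matching math.sin(1.0)
assert abs(taylor_sin(1.0) - math.sin(1.0)) < 1e-9
```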

**3) Different starting positions have different DNA**

For fun, here's the Taylor series of $\sin(x)$ starting at $x = \pi$:

$$\sin(x) = -(x - \pi) + \frac{(x - \pi)^3}{3!} - \frac{(x - \pi)^5}{5!} + \cdots$$

A few notes:

The DNA is now something like [0, -1, 0, 1]. The cycle is similar, but the starting value has changed since we're starting at $x=\pi$.

Written as calculated numbers, the denominators 1, 6, 120, 5040 look strange. But they're just every other factorial: 1! = 1, 3! = 6, 5! =120, 7! = 5040. In general, the Taylor series can have gnarly denominators.

- The $O(x^{12})$ term means there are other components of order (power) $x^{12}$ and higher. Because $\sin(x)$ has infinite derivatives, we have infinite terms and the computer has to cut us off somewhere. (*You've had enough Tayloring for today, buddy.*)

A popular use of Taylor series is getting a quick approximation for a function. If you want a tadpole, do you need the DNA for the entire frog?

The Taylor series has a bunch of terms, typically ordered by importance:

- $c_0 = f(0)$, the constant term, is the exact value at the point
- $c_1x = f'(0)x$, the linear term, tells us what speed to move from our point
- $c_2x^2 = \frac{f''(0)}{2!}x^2$, the quadratic term, tells us how much to accelerate away from our point
- and so on

If we only need a prediction for a few instants around our point, the initial position & velocity may be good enough:

$$f(x) \approx f(0) + f'(0)x$$

If we're tracking for longer, then acceleration becomes important:

$$f(x) \approx f(0) + f'(0)x + \frac{f''(0)}{2!}x^2$$

As we get further from our starting point, we need more terms to keep our prediction accurate. For example, the linear model $\sin(x) \approx x$ is a good prediction around $x=0$. As we get further out, we need to account for more terms.
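A Python sketch of how the truncations drift as we move away from 0:

```python
import math

# Compare sin(x) against its linear and cubic Taylor truncations.
def sin_linear(x):
    return x

def sin_cubic(x):
    return x - x**3 / 6

for x in [0.1, 0.5, 1.0, 2.0]:
    print(x, round(math.sin(x) - sin_linear(x), 4), round(math.sin(x) - sin_cubic(x), 4))
```

Near 0 both errors are tiny; by $x = 2$ the linear model is off by about 1 while the cubic is still within a few tenths.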

Similarly, $e^x \sim 1 + x$ works well for small interest rates: 1% discrete interest is 1.01 after one time period, 1% continuous interest is a tad higher than 1.01. As time goes on, the linear model falls behind because it ignores the compounding effects.
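A quick check of that in Python:

```python
import math

# Linear model e^x ≈ 1 + x for small interest rates: 1% continuous interest
# is only a tad higher than 1% discrete interest.
rate = 0.01
discrete = 1 + rate
continuous = math.exp(rate)
print(discrete, continuous)
```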

What's a common application of DNA? Paternity tests.

If we have a few functions, we can compare their Taylor series to see if they're related.

Here are the expansions of $\sin(x)$, $\cos(x)$, and $e^x$:

$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$$

$$\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots$$

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots$$

There's a family resemblance in the sequences, right? Clean powers of $x$ divided by a factorial?

One problem is the sequence for $e^x$ has positive terms, while sine and cosine alternate signs. How can we link these together?

Euler's great insight was realizing an imaginary number could swap the sign from positive to negative:

$$e^{ix} = 1 + ix - \frac{x^2}{2!} - \frac{ix^3}{3!} + \frac{x^4}{4!} + \cdots = \left(1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots\right) + i\left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots\right) = \cos(x) + i\sin(x)$$

Whoa. Using an imaginary exponent and separating into odd/even powers reveals that sine and cosine are hiding inside the exponential function. Amazing.

Although this proof of Euler's Formula doesn't show *why* the imaginary number makes sense, it reveals the baby daddy hiding backstage.
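A numeric spot-check in Python:

```python
import cmath
import math

# Euler's formula: e^(ix) = cos(x) + i*sin(x)
x = 0.7
lhs = cmath.exp(1j * x)
rhs = complex(math.cos(x), math.sin(x))
print(lhs, rhs)
assert abs(lhs - rhs) < 1e-12
```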

**Relationship to Fourier Series**

The Taylor Series extracts the "polynomial DNA" and the Fourier Series/Transform extracts the "circular DNA" of a function. Both see functions as built from smaller parts (polynomials or exponential paths).

**Does the Taylor Series always work?**

This gets into mathematical analysis beyond my depth, but certain functions aren't easily (or ever) approximated with polynomials.

Notice that powers like $x^2, x^3$ explode as $x$ grows. In order to have a slow, gradual curve, you need an army of polynomial terms fighting it out, with one winner barely emerging. If you stop the train too early, the approximation explodes again.

For example, here's the Taylor Series for $\ln(1 + x)$. The black line is the curve we want, and adding more terms, even dozens, barely gets us accuracy beyond $x=1.0$. It's just too hard to maintain a gentle slope with terms that want to run hog wild.

In this case, we only have a radius of convergence where the approximation stays accurate (such as around $|x| < 1$).
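A Python sketch of those partial sums (the `ln_taylor` helper is illustrative):

```python
import math

# Partial sums of ln(1+x) = x - x^2/2 + x^3/3 - ...
# They converge for -1 < x <= 1 and blow up outside that radius.
def ln_taylor(x, terms):
    return sum((-1) ** (n + 1) * x**n / n for n in range(1, terms + 1))

for x in [0.5, 1.5]:
    errors = [abs(ln_taylor(x, t) - math.log(1 + x)) for t in (5, 15, 30)]
    print(x, [round(e, 3) for e in errors])
```

At $x = 0.5$ the error shrinks with each extra term; at $x = 1.5$ the terms themselves grow, so more terms make the "approximation" worse.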

**Turning geometric to algebraic definitions**

Sine is often defined geometrically: the height of a line on a circular figure.

Turning this into an equation seems really hard. The Taylor Series gives us a process: If we know a single value and how it changes (the derivative), we can reverse-engineer the DNA.

Similarly, the description of $e^x$ as "the function whose derivative equals its current value" yields the DNA [1, 1, 1, 1, ...] and the polynomial $f(x) = 1 + \frac{1}{1!}x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + \dots$. We went from a verbal description to an equation.

Phew! A few items to ponder.

Happy math.
