- Build a **lasting intuition** for the key ideas.
- During the course, understand it enough to solve problems.
- After the course, enjoy it enough to revisit.

That's why I learn things. Non-goals are transcribing what a teacher says, or cramming only to forget everything. (Yeah, it's a game we play, but we're stepping off the treadmill and only cheating ourselves. Most subjects have useful insights buried somewhere.)

So, here's my strategy when studying:

- If an idea clicks, write down the *Aha!* moment in language you'd use yourself.
- If it doesn't, write down the *Huh?* moment. Move on and try again later (such as with the ADEPT method).

Keep it simple, like the KonMari method of organizing: *Look at everything in your house.* *Does it spark joy? Keep what does, thank and donate what doesn't.*

**A simple study plan: Go through the material. Did it click? Write down what helped, otherwise look for a better explanation.**

My current learning project is the Machine Learning Class on Coursera. I've read a smattering of blog posts, the subject is growing, and after my friend asked me to join the class, I had to sign up. (It's great.)

Here's where I'm keeping my notes, Aha, and Huh moments:

Machine Learning Notes on Google Docs

This is one of the best learning experiences I can remember. A few examples:

For the major concepts the course depends on, I keep a 5-second summary in mind. Why does this underlying concept exist? In plain English, what does it mean?

- Linear Algebra: spreadsheets for your equations. We "pour" data through various operations.
- Natural log: time needed to grow. Helps normalize widely varying numbers.
- e^x: models continuous growth, has a simple derivative.
- Gradient: direction of greatest change, helps optimize.
- Calculus: the art of breaking a system into steps. With the gradient, we can move in the best direction.
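Two of these snippets are easy to check numerically. This is a minimal sketch (the helper `slope` is a hypothetical name, not something from the course): a central finite difference shows the slope of e^x matching e^x itself, and ln(2) comes out as the time needed to double at 100% continuous growth.

```python
import math

def slope(f, x, h=1e-6):
    """Estimate f'(x) with a central finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

# e^x "has a simple derivative": its slope at each point equals its value there.
for x in [0.0, 1.0, 2.0]:
    print(f"x={x}: e^x={math.exp(x):.6f}  slope={slope(math.exp, x):.6f}")

# ln as "time needed to grow": at 100% continuous growth, e^t reaches 2 when t = ln(2).
t_double = math.log(2)
print(f"time to double: {t_double:.4f}  e^t={math.exp(t_double):.4f}")
```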

I reference these snippets as I encounter new formulas.

There was a formula that I expected to be positive ("cost" should be positive), yet it had a negative sign out front. What gives?

It turns out I had forgotten a part of the derivation, where we expected the natural log to be negative. (This happens when we take the logarithm of numbers less than 1 — in other words, we are going "back in time" and shrinking.)

I would have preferred the equation written another way, and I made a note of this Huh? moment.
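The formula isn't named above; assuming it was the usual logistic-regression (cross-entropy) cost, here's a minimal sketch of why the minus sign is there. Predictions are probabilities between 0 and 1, so every log term is negative, and the leading minus flips the total into a positive cost. `log_cost` is a hypothetical name.

```python
import math

def log_cost(predictions, labels):
    """Cross-entropy cost: -(1/m) * sum(y*ln(h) + (1-y)*ln(1-h))."""
    m = len(labels)
    total = 0.0
    for h, y in zip(predictions, labels):
        # h must be strictly between 0 and 1, so ln(h) and ln(1 - h) are both negative.
        total += y * math.log(h) + (1 - y) * math.log(1 - h)
    # total is negative; the leading minus turns it into a positive cost.
    return -total / m

print(math.log(0.9))                    # negative: log of a number below 1
print(log_cost([0.9, 0.2], [1, 0]))     # good predictions, small positive cost
print(log_cost([0.6, 0.4], [1, 0]))     # worse predictions, larger positive cost
```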

Early in the course, we define a "cost" function which tracks the difference between our predictions and the real value.

Why not call this difference something normal, like error?

It turns out "cost" is used because later in the course, we have items to minimize (like the number of variables in our model) which are not directly related to the error. The "cost" captures things beyond prediction error, like the model's complexity. (If two models make equally accurate predictions, prefer the simpler one.)

Ah, "cost" can include fuzzier concepts. (I'd still prefer that laid out up-front.)
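Here's a sketch of a cost that includes more than error. As an assumption, I've swapped in a common stand-in for "complexity": a penalty on the squared size of the weights (L2 regularization) rather than a literal count of variables. The names `cost` and `lam` and the toy data are hypothetical. Two models with identical predictions end up with different costs, and the simpler one wins.

```python
def cost(weights, xs, ys, lam):
    """Squared-error cost plus a complexity penalty on the weights.

    weights[0] is an intercept; weights[1:] multiply the features in each row of xs.
    lam (lambda) sets how much we care about complexity versus error.
    """
    m = len(ys)
    error = 0.0
    for row, y in zip(xs, ys):
        prediction = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
        error += (prediction - y) ** 2
    error /= 2 * m
    # Bigger weights count as "more complex"; the intercept is left out of the penalty.
    penalty = lam / (2 * m) * sum(w ** 2 for w in weights[1:])
    return error + penalty

xs = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
ys = [2.0, 4.0, 6.0]
simple = [0.0, 2.0, 0.0]   # relies on the first feature only
busy = [0.0, 2.0, 5.0]     # same predictions (second feature is always 0), but more weight
print(cost(simple, xs, ys, lam=1.0), cost(busy, xs, ys, lam=1.0))
```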

As I go through the course, I have a plain-English definition in mind. What's it all about?

**Machine Learning: Create models with Linear Algebra, then improve them with Calculus.**

- Linear Algebra lets us use many (tens, hundreds, thousands) of variables in a "math spreadsheet".
- Calculus lets us improve our spreadsheet via feedback on how well it's working. Functions like e^x, ln(x), x^2, etc. are smooth, which makes their derivatives easy to take. Absolute value, if/then statements, etc. have corners or jumps, so they aren't easy to work with.
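To make "create with Linear Algebra, improve with Calculus" concrete, here's a minimal sketch in plain Python: a linear model (each row of the "spreadsheet" combined with a row of weights) trained by gradient descent on a squared-error cost. The function names and toy data are hypothetical, chosen for illustration.

```python
def predict(weights, row):
    """Linear model: bias plus each feature in the row times its weight."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))

def gradient_step(weights, xs, ys, rate):
    """One gradient-descent step on the cost (1/2m) * sum((prediction - y)^2)."""
    m = len(ys)
    grads = [0.0] * len(weights)
    for row, y in zip(xs, ys):
        err = predict(weights, row) - y        # feedback: how wrong was this row?
        grads[0] += err / m
        for j, x in enumerate(row, start=1):
            grads[j] += err * x / m
    # Step against the gradient, the direction that lowers the cost fastest.
    return [w - rate * g for w, g in zip(weights, grads)]

xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [1.0, 3.0, 5.0, 7.0]       # generated by y = 1 + 2x
weights = [0.0, 0.0]
for _ in range(2000):
    weights = gradient_step(weights, xs, ys, rate=0.1)
print(weights)
```

With `rate=0.1` this toy data settles on weights very close to [1.0, 2.0], the line that generated it. A learning rate that's too large can overshoot and diverge instead.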

Now my thinking becomes: What types of predictive models can I make? If Linear Algebra can describe it, let's use it.

After the course is done, you're left with a set of notes that make sense to you: the Ahas, Huhs, and other gotchas. (This website is a running collection of mine.)

Future learning gets that much easier. Remember how you were confused about a topic a few years ago? Well, let's read the explanation *you wrote to yourself* on how to overcome it. Over time you build up a massive collection.

Other tips:

Embrace your confusion. The hesitation you feel when you see a formula is ok. Try to break down each part of the equation, ask what it means, make note of what is confusing and return over time. Every positive sign, every variable, why are they there?

It's ok to forget things - I do all the time. I just want a list of intuitions to load up when needed. Often a single phrase or diagram will bring it all back.

These notes are meant for you. Keep them quick. (My notes eventually become articles, but they stay informal and for my own use till then.)

The textbook already exists. Don't simply copy what the teacher/book said; add what *you need* to make it clear.

This course is among the most fun I've had -- this is what learning should feel like, exploration with constant refinement. I'm curious to see if this approach helps you too.

For your next course, try keeping your notes in a single Google doc. Write down your Aha! and Huh? moments. Send me a link and I'll add them to this list:

- Kalid Azad - Coursera Machine Learning
- [you go here]

I'm curious to see what works for you; feedback is always welcome.

Happy math.

However, the numbers follow a grid, with rules nobody told me:

Even numbers go East/West (I-90, I-10), and odd numbers go North/South (I-5, I-95). Think "Even" goes "East".

Numbers increase towards the Northeast. (Hey, NYC thinks it's the center of the world, right?) I-5 is on the West coast, I-95 on the East coast. I-10 must be in Texas, I-90 must be in Massachusetts.

Auxiliary interstates connect to the primary ones, and have 3 digits: 290 connects to 90, 495 connects to 95, etc.

- Odd prefixes (190) connect once into the city from the interstate ("spur").
- Even prefixes (495) typically loop around a city. (Being a man-made system, there are exceptions.)
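The rules are simple enough to write down as a tiny decoder. This is a sketch of the simplified rules above only; `describe_interstate` is a hypothetical name, and the real-world exceptions aren't handled.

```python
def describe_interstate(number):
    """Decode an Interstate number using the simplified rules (real exceptions exist)."""
    if number >= 100:                              # auxiliary route: 3 digits
        prefix, parent = divmod(number, 100)       # e.g. 495 -> prefix 4, parent 95
        kind = "spur into the city" if prefix % 2 == 1 else "loop around a city"
        return f"I-{number}: auxiliary {kind}, connects to I-{parent}"
    direction = "East/West" if number % 2 == 0 else "North/South"
    return f"I-{number}: primary route running {direction}"

for n in [5, 95, 10, 90, 190, 495]:
    print(describe_interstate(n))
```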

Whoa. There's so much information conveyed in a simple numbering scheme! Without looking at a map, I know I can drive from Seattle to Boston on I-90. Maybe I'll take I-95 South when I'm there and make my way to Florida. On the way I'll take I-10 West, over to LA, then drive up I-5 North back to Seattle.

How does this work?

- We have a concept of a number, and all its properties (even/odd, size, number of digits...).
- We noticed a real-world object (a highway) that had various properties (North/South, position, major/minor).
- We associated the properties of the number with the properties of the object.

*This* is thinking mathematically. It's not about doing arithmetic quickly, or memorizing formulas, it's about connecting patterns. Math is an imaginary zoo of made-up objects and relationships that can describe ones in the real world.

Have we used all the interesting properties of a number? How about whether it's a prime number?

Suppose local routes used small prime numbers: Route 2, 3, 5, 7, 11. (Yep, remember that 2 is prime.)

Once the main routes are numbered, smaller roads that *connect* them can follow this rule:

If you connect two routes, use their product. 3 * 11 = 33, so Route 33 connects Routes 3 and 11.

If you loop back to the same route, just square it. 3 * 3 = 9, so Route 9 connects Route 3 to itself.

If you connect three roads, it could be Route 66 (connecting routes 2, 3 and 11).

Will this always work? You bet. Any two primes, when multiplied, give a *unique* number. 33 will never be reached by any other combination of primes. (The fancy math phrase: every number has a unique prime factorization.)
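Here's the numbering scheme as a sketch: multiply to name a connector, factor to recover which routes it joins. The function names are hypothetical, and trial division is just one simple way to factor small numbers.

```python
from math import prod

def connector_route(*routes):
    """Number a connecting road by multiplying the prime routes it joins."""
    return prod(routes)

def routes_joined(number):
    """Recover the joined prime routes by factoring (trial division)."""
    factors = []
    d = 2
    while d * d <= number:
        while number % d == 0:
            factors.append(d)
            number //= d
        d += 1
    if number > 1:              # whatever is left is itself prime
        factors.append(number)
    return factors

print(connector_route(3, 11))   # 33
print(routes_joined(33))        # [3, 11]
print(routes_joined(9))         # [3, 3]: loops back onto Route 3
print(routes_joined(66))        # [2, 3, 11]
```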

See how we're trying to cram a bunch of information into a little number? That's the essence of binary data.

An eight-bit binary number like `01000100` is essentially eight true/false questions:

- Are you East/West? (1 if yes, 0 otherwise)
- Are you a local connection? (1 if yes...)
- Are you a spur road?
- Treating your route number as a set of binary digits:
  - Anything in the ones digit?
  - Anything in the twos digit?
  - Anything in the fours digit?
  - Anything in the eights digit?
  - Anything in the sixteens digit?

Packing a bunch of related questions into a single byte is what makes binary so efficient.
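As a sketch, here's one way to pack those answers into a byte and read them back. The layout is my assumption: the three yes/no flags sit in the top three bits in the order listed, and the route number fills the low five bits as an ordinary binary number. Under that layout, `01000100` reads as a local connection numbered 4. The function names are hypothetical.

```python
def pack_route(east_west, local, spur, route):
    """Pack three yes/no flags and a 5-bit route number (0-31) into one byte."""
    if not 0 <= route < 32:
        raise ValueError("route must fit in five binary digits (0-31)")
    return (east_west << 7) | (local << 6) | (spur << 5) | route

def unpack_route(byte):
    """Answer the eight true/false questions stored in one byte."""
    return {
        "east_west": bool(byte >> 7 & 1),
        "local": bool(byte >> 6 & 1),
        "spur": bool(byte >> 5 & 1),
        "route": byte & 0b11111,    # low five bits: ones through sixteens
    }

b = pack_route(False, True, False, 4)
print(f"{b:08b}")                  # 01000100
print(unpack_route(0b01000100))
```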

Numbers have a bunch of properties, right? Aren't we curious to discover more, like the remainder (modular arithmetic)? Maybe Route 12 (which is one set of 11, remainder 1) has some connection to Route 11.

Happy math.

What would you do? Well, you could work out the exact formula:

$$1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2}$$

and plug in n=100 to get 5050.

But we just want a rough answer. You have a list of numbers that follow a simple pattern, and you want a quick estimate. What to do?

The "easy" way (well, the Calculus way) is to realize 1 + 2 + 3 + 4 is about the same as f(x) = x. The first element is f(1) = 1, the second is f(2) = 2, and so on.

From here, we can take the integral:

$$\int x \, dx = \frac{1}{2}x^{2}$$

We usually see the integral as a formal, elegant operation, which artfully accumulates one function and returns another. Informally, we're squashing everything together in that bad mamma-jamma and seeing how much there is.

The result $\frac{1}{2}x^{2}$ should be pretty close to what we want.

The *exact* total is our staircase-like pattern, which accumulates to 5050.

The *approximate* answer is the area of that triangle: $\frac{1}{2} \cdot \text{base} \cdot \text{height} = \frac{1}{2} \cdot 100 \cdot 100 = 5000$. The difference comes from the corners of the staircase that overhang the triangle. Each of the x steps overhangs by a small corner of area 1/2, so the overhang totals $\frac{x}{2}$ (here, 50), and 5000 + 50 = 5050.
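A quick numeric check of the staircase argument (a sketch; `staircase_estimates` is a hypothetical helper): the exact sum, the triangle from the integral, and the triangle plus the x/2 overhang.

```python
def staircase_estimates(n):
    """Compare the exact sum 1 + 2 + ... + n with its Calculus estimates."""
    exact = sum(range(1, n + 1))    # the jagged staircase
    triangle = n * n / 2            # area under f(x) = x from 0 to n
    overhang = n / 2                # n steps, each overhanging by a corner of area 1/2
    return exact, triangle, triangle + overhang

for n in [10, 100, 1000]:
    print(n, staircase_estimates(n))
```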

The net result is using a smooth, easy-to-measure shape to approximate a jagged, tedious-to-measure one. (This is a bit of Calculus inception, since we usually use rectangles to approximate smooth shapes.)

This tactic works for other sequences:

**What's the sum of the first 10 square numbers? 1 + 4 + 9 + 16 + 25 + ... + 100 = ?**

Hrm. The formula is probably tricky to work out. But with our Calculus-infused Arithmetic, a quick guess would be:

$$\int x^{2} \, dx = \frac{x^{3}}{3}$$

Our first hunch should be "one third of $10^{3}$", or about 333. But as we saw before, there's an "overhang" that we missed. Let's call it 10%, for an estimate of 333 + 10% ~ 370.

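The exact answer is 385. Not bad! The actual formula is:

$$\sum_{k=1}^{n} k^{2} = \frac{n(n+1)(2n+1)}{6} = \frac{n^{3}}{3} + \frac{n^{2}}{2} + \frac{n}{6}$$

I'd say $\frac{x^{3}}{3}$ isn't bad for a few seconds of work.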
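Same check for the squares, as a sketch (`square_sum_estimates` is a hypothetical helper). The 10% bump is the rough guess from the text, not a rule; the closed form confirms the exact answer.

```python
def square_sum_estimates(n):
    """Exact sum of squares, the x^3/3 hunch, a +10% overhang guess, and the closed form."""
    exact = sum(k * k for k in range(1, n + 1))
    hunch = n ** 3 / 3                              # area under x^2 from 0 to n
    with_overhang = hunch * 1.10                    # the text's rough 10% bump
    closed_form = n * (n + 1) * (2 * n + 1) // 6
    return exact, hunch, with_overhang, closed_form

for n in [10, 100]:
    print(n, square_sum_estimates(n))
```

At n = 100 the bare $\frac{x^{3}}{3}$ hunch is already within about 1.5%, while the fixed 10% bump overshoots: the overhang terms grow more slowly than x^3, so they matter less as n grows.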

**Data doubles every year. What does lifetime usage look like?**

The integral (squashed-together total) of an exponential is an exponential. In Calculus terms:

$$\int e^{x} \, dx = e^{x} + C$$

The key insight is that all exponential growth is just a variation of e^{x}. If e^{x} accumulates exponentially, so will 2^{x}.

So the total usage to date will also follow an exponential pattern, doubling every year as well. Contrast this with a usage pattern of "1 + 2 + 3 + 4 ..." -- we grow linearly ($f(x) = x$), but total usage accumulates quadratically ($\frac{1}{2}x^{2}$).
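A toy sketch of the two usage patterns, assuming usage starts at 1 unit in year 0. `itertools.accumulate` gives the running totals: the doubling pattern's total also roughly doubles each year (it's always one less than next year's usage), while the linear pattern's total grows quadratically. The last lines check that 2^x is e^x in disguise: 2^x = e^(x ln 2).

```python
import math
from itertools import accumulate

years = range(8)
doubling = [2 ** y for y in years]     # usage each year: 1, 2, 4, 8, ...
linear = [y + 1 for y in years]        # usage each year: 1, 2, 3, 4, ...

doubling_totals = list(accumulate(doubling))   # 1, 3, 7, 15, ... (= 2^(y+1) - 1)
linear_totals = list(accumulate(linear))       # 1, 3, 6, 10, ... (= (y+1)(y+2)/2)

print(doubling_totals)
print(linear_totals)

# Any exponential is a rescaled e^x: 2^x == e^(x * ln 2).
x = 5.0
print(math.isclose(2 ** x, math.exp(x * math.log(2))))
```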

My goal is to incorporate math thinking into everyday scenarios. We start with an arithmetic question, convert it to a geometry puzzle (how big is the staircase?), and then use calculus to approximate it.

I know a concept is clicking when I can switch between a few styles of thought. Imagine the problem as a script: how would Spielberg, Tarantino, or Scorsese direct it? Each field takes a different look. (To learn how to think with Calculus, check out the Calculus Guide.)

Happy math.

But it's really about empathy and reading your audience. What does the other person know? Are they getting lost? Are they having fun? (Am *I* having fun?)

What looks like a communication obstacle to an alien observer is an *enjoyable experience* for the human participants. Sure, there's an idea to convey, but maybe there's a clever, funny, or astoundingly simple way to convey it. Aha!

Math teaching should be the same: convey ideas with empathy for your audience.

I use the ADEPT method to remind me of what helps me learn: an Analogy, Diagram, Example, Plain English, Technical Definition.

But when *sharing* a math idea, I have a different mental checklist. No convenient acronym, just a list of questions to ponder:

If an idea was debated for centuries before being accepted, shouldn't that be taught?

Sure. Ok. How many of you know that negative numbers were once called *numeri absurdi*, and were only accepted (in the West) in the 1800s?

When we have struggles with new concepts (like imaginary numbers, also considered absurd), reference similar struggles in the past. *Hey, you're confused? Good. So was everyone else, and here's how we resolved it.*

Hey you (yes you, the teacher) -- what struggles did you have when learning?

Did imaginary numbers click instantly, without doubt? Did the Fourier Transform just snap into place on your first reading?

(You'd think so, given the unblinking, matter-of-fact treatment in most lessons. Argh!)

If you, the teacher, struggled with an idea, don't hide it: what tripped you up, how did you resolve it, and what issues do you still have?

I needed simulations before I understood the Fourier Transform: playing around with them made it click. Instead of writing down the definition, share the "behind-the-scenes" of what helped.

Learning is a back-and-forth process. If students don't have questions, they either understood it perfectly, or they are scared/uninterested.

In charades, we can easily see if the other player is confused or having a good time.

Academic writing is a bomb shelter, built to be defended from critics. Stable, rock-solid, but not welcoming.

I'd prefer to make a beach bungalow you look forward to visiting. Yeah, the banana-leaf roof is leaky, and no, Dwight, it cannot withstand an aerial assault from AGM-114 Hellfire missiles. But we'll have a great time all the same.

Lessons barricaded with prefaces and caveats indicate you are protecting yourself, not trying to be helpful. (*If students began Calculus without a month studying limits, they might (gasp) not have a rigorously defensible understanding on Day 1!*)

At some point you can reinforce the bungalow, but don't start there.

Make your students awesome. I want readers to learn things in minutes that took me a decade to untangle. (Kathy Sierra has a great talk about making users awesome.)

Giving impressively rigorous definitions on day 1 doesn't make students awesome. Ignoring historical and personal confusion doesn't make students awesome. Organized chapters of theorem/proof/exercise don't make students awesome.

Share what actually worked, in a way you would have liked to see it.

Happy math.
