Aha! Moments When Learning Git

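Git is a fast, flexible but challenging distributed version control system. Before jumping in, get comfortable with regular and distributed version control (the first two posts in this series cover both).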

Along with a book, tutorial and cheatsheet, here are the insights that helped git click.

There's a staging area!

Git has a staging area. Git has a staging area!!!

Yowza, did this ever confuse me. There's both a repo ("object database") and a staging area (called "index"). Checkins have two steps:

  • git add foo.txt
    • Add foo.txt to the index. It's not checked in yet!
    • You can "git add --update" to stage all tracked, modified files
  • git commit -m "message"
    • Put staged files in the repo; they're now checked in

Why stage? Git's flexible: if a, b and c are changed, you can commit them separately or together.
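Here's a minimal sketch of that rhythm, committing two changed files separately. The throwaway repo, the file names a.txt and b.txt, and the demo identity are all made up for illustration:

```shell
# Hypothetical scratch repo; nothing here touches your real projects.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity for the demo
git config user.name "Demo User"

echo "first" > a.txt
echo "second" > b.txt

git add a.txt                  # stage only a.txt
git commit -q -m "Add a"       # a.txt is checked in; b.txt is still untracked

git add b.txt                  # now stage b.txt
git commit -q -m "Add b"       # a second, separate commit

commit_count=$(git rev-list --count HEAD)
echo "commits: $commit_count"
```

Running "git add a.txt b.txt" before a single commit would have checked both in together instead.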

But now there are two undos:

  • git checkout foo.txt
    • Undo local changes (like svn revert)
  • git reset HEAD foo.txt
    • Remove from staging area (local copy still modified).
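To see the two undos side by side, here's a rough sketch in a scratch repo (the repo, the file contents and the demo identity are assumptions for illustration):

```shell
# Hypothetical scratch repo with one committed file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity for the demo
git config user.name "Demo User"
echo "original" > foo.txt
git add foo.txt
git commit -q -m "Add foo"

echo "edited" > foo.txt
git add foo.txt                  # the edit is now staged

git reset -q HEAD foo.txt        # undo #1: unstage, but keep the local edit
staged=$(git diff --cached --name-only)
after_reset=$(cat foo.txt)

git checkout -- foo.txt          # undo #2: throw away the local edit
after_checkout=$(cat foo.txt)

echo "staged after reset: '$staged'"
echo "file after reset: $after_reset"
echo "file after checkout: $after_checkout"
```

The "--" just tells git that foo.txt is a path, not a branch name. Newer versions of Git (2.23 and later) also offer "git restore --staged foo.txt" and "git restore foo.txt" for the same two jobs.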

Add and commit, add and commit -- Git has a rhythm.

Branching is "Save as..."

Branches are like "Save as..." on a directory. Best of all:

  • Easily merge changes with the original (changes tracked and never applied twice)
  • No wasted space (common files only stored once)

Why branch? Consider the utility of "Save as..." for regular files: you tinker with multiple possibilities while keeping the original safe. Git enables this for directories, with the power to merge. (In practice, svn is like a single shared drive, where you can only revert to one backup).

Imagine virtual directories

I see branches as "virtual directories" in the .git folder. While inside a physical directory (c:\project or ~/project), you traverse virtual directories with a checkout.

  • git checkout master
    • switch to master branch ("cd master")
  • git branch dev
    • create new branch from existing ("cp * dev")
    • you still need to "cd" with "git checkout dev"
  • git merge dev
    • (when in master) pull in changes from dev ("cp dev/* .")
  • git branch
    • list all branches ("ls")

My inner dialogue is "change to dev directory (checkout)... make changes... save changes (add/commit)... change to master directory... copy in changes from dev (merge)".
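That inner dialogue, as a sketch you can paste into a terminal. The scratch repo, the file notes.txt and the demo identity are invented for the example:

```shell
# Hypothetical scratch repo; the first branch is forced to be named master.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "you@example.com"   # placeholder identity for the demo
git config user.name "Demo User"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

git branch dev                   # "cp * dev"
git checkout -q dev              # "cd dev"
echo "v2" > notes.txt            # make changes...
git add notes.txt
git commit -q -m "Update notes on dev"   # ...and save them

git checkout -q master           # "cd master"
on_master_before=$(cat notes.txt)
git merge -q dev                 # "cp dev/* ."
on_master_after=$(cat notes.txt)

echo "master before merge: $on_master_before"
echo "master after merge: $on_master_after"
```

Notice how the physical file notes.txt changes under you as you "cd" between the virtual directories.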

The physical directory is a scratchpad. Virtual directories are affected by git commands:

  • rm foo.txt
    • Remove foo.txt from your sandbox only; the branch still has it ("git checkout foo.txt" brings it back)
  • git rm foo.txt
    • Remove foo.txt from current virtual directory
    • Gotcha: you need to commit that change!
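A small sketch of the difference, again in a made-up scratch repo (file name and identity are placeholders):

```shell
# Hypothetical scratch repo with one committed file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity for the demo
git config user.name "Demo User"
echo "keep me" > foo.txt
git add foo.txt
git commit -q -m "Add foo"

rm foo.txt                       # scratchpad only: the repo still has foo.txt
git checkout -- foo.txt          # so git can hand it back
restored=$(cat foo.txt)

git rm -q foo.txt                # remove from the virtual directory (staged)
git commit -q -m "Remove foo"    # the gotcha: the removal needs a commit
tracked=$(git ls-files)

echo "restored contents: $restored"
echo "tracked files after git rm: '$tracked'"
```

Even after "git rm" and a commit, older commits still contain foo.txt, so it can be recovered from history.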

Know the current branch

Just like seeing your current directory, put the current branch in your prompt!

In my .bash_profile (modified from here):

parse_git_branch() {
    git branch 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/(\1)/'
}

export PS1="\[\033[00m\]\u@\h\[\033[01;34m\] \W \[\033[31m\]\$(parse_git_branch) \[\033[00m\]$\[\033[00m\] "
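If the sed line looks cryptic, here's an alternative sketch that asks git for the branch name directly. The function name current_branch is my own invention, and it assumes a Git new enough to support "symbolic-ref --short":

```shell
# Hypothetical helper: prints "(branch)" inside a repo, nothing elsewhere.
current_branch() {
    local name
    # symbolic-ref fails outside a repo or on a detached HEAD; print nothing then
    name=$(git symbolic-ref --short HEAD 2> /dev/null) || return 0
    printf '(%s)' "$name"
}

# Quick check in a scratch repo whose first branch is forced to be master
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
prompt_piece=$(current_branch)
echo "$prompt_piece"
```

To use it in a prompt, swap it in wherever the PS1 line calls the branch function, keeping the backslash before the "$(" so it runs every time the prompt is drawn.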

Visualize your branch structure

Git leaves branch organization to you. Nvie.com has a great branch strategy:

  • Have a mainline (master). Mentally it's on the far right.
  • Create branches (master -> dev) and subbranches (dev -> featureX). The further from master, the crazier.
  • Only merge with neighbors (master -> dev -> feature X, or featureX -> dev -> master)

Stay sane by choosing a branch layout up front. I have a master tracking a svn project, and dev for my own code. In general, master is clean so I can branch anytime for one-off fixes.
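As a sketch, the "only merge with neighbors" rule looks something like this. The branch names come from the list above; the scratch repo, app.txt and the demo identity are assumptions:

```shell
# Hypothetical scratch repo; the first branch is forced to be named master.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "you@example.com"   # placeholder identity for the demo
git config user.name "Demo User"
echo "stable" > app.txt
git add app.txt
git commit -q -m "Stable baseline"

git checkout -q -b dev           # master -> dev (-b creates and switches)
git checkout -q -b featureX      # dev -> featureX
echo "experiment" >> app.txt
git commit -q -a -m "Try something on featureX"

git checkout -q dev
git merge -q featureX            # featureX -> dev
git checkout -q master
git merge -q dev                 # dev -> master

git branch --merged master       # lists branches fully merged into master
```

The experiment never jumps straight from featureX to master; it passes through dev first, which gives you a place to catch problems.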

Understand local vs. remote

Git has local and remote commands; seeing both confused me ("When do you checkout vs. pull?"). Work locally, syncing remotely as needed.

Local data

  • git init
    • create local repo
    • use git add/commit/branch to work locally

Remote data

  • git remote add name path-to-repo
    • track a remote repo (usually "origin") from an existing repo
    • remote branches are "origin/master", "origin/dev" etc.
  • git branch -a
    • list all branches (local and remote-tracking); this reads local data from the last fetch
  • git clone path-to-repo
    • create a new local git repo copied from a remote one
    • local master tracks remote master
  • git pull
    • fetch, then merge changes from tracked remote branch (if in dev, pull from origin/dev)
  • git push
    • send changes to tracked remote branch (if in dev, push to origin/dev)

Why local and remote? Subversion has central checkins, so you avoid committing unfinished work. With git, local commits are frequent and you only push when ready.
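Here's a rough sketch of the local/remote split using a bare repo on disk as a stand-in for a server. The names hub.git, alice and bob are invented, as is the demo identity:

```shell
# Hypothetical setup: hub.git plays the remote; alice and bob are two clones.
work=$(mktemp -d)
cd "$work"
git init -q --bare hub.git
git --git-dir=hub.git symbolic-ref HEAD refs/heads/master

git clone -q hub.git alice 2> /dev/null   # hides the "empty repository" warning
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.email "alice@example.com" # placeholder identity for the demo
git config user.name "Alice"
echo "hello" > readme.txt
git add readme.txt
git commit -q -m "Local commit"           # nothing has left this clone yet
git push -q -u origin master              # now the remote has it

cd "$work"
git clone -q hub.git bob                  # bob's master tracks origin/master

cd "$work/alice"
echo "more" >> readme.txt
git commit -q -a -m "Second commit"
git push -q                               # send to tracked remote branch

cd "$work/bob"
git pull -q                               # fetch + merge from origin/master
alice_head=$(git -C "$work/alice" rev-parse HEAD)
bob_head=$(git rev-parse HEAD)
echo "alice: $alice_head"
echo "bob:   $bob_head"
```

Alice could have made a dozen local commits before pushing; bob only ever sees what reached hub.git.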

GUIDs are GOOD

Git addresses information by a SHA-1 hash of its contents. I'm loosely calling that a GUID: a globally unique ID, not the hyphenated GUID format. If two branches are the same, they have the same GUID (and vice versa).

Why's this cool? We can create branches independently, merge them, and have a common GUID. No central numbering needed. Usually, we just compare the first few digits: "Are you on a93?".
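You can watch content-addressing happen with "git hash-object", which prints the ID git would give a file. A quick sketch (the file names and contents are made up):

```shell
# Hypothetical scratch directory; hash-object does not need a repo to compute an ID.
dir=$(mktemp -d)
cd "$dir"
printf 'same content\n' > one.txt
printf 'same content\n' > two.txt
printf 'different\n' > three.txt

id_one=$(git hash-object one.txt)
id_two=$(git hash-object two.txt)
id_three=$(git hash-object three.txt)

echo "one:   $id_one"
echo "two:   $id_two"      # identical to one.txt: same bytes, same ID
echo "three: $id_three"    # different bytes, different ID

short=$(printf '%s' "$id_one" | cut -c1-7)
echo "short form: $short"  # the "are you on a93?" style prefix
```

Anyone, on any machine, gets the same ID for the same bytes, which is why no central numbering is needed.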

Tips & Tricks

For your .gitconfig:

[alias]
        ci = commit
        st = status
        co = checkout
        oneline = log --pretty=oneline
        br = branch
        la = log --pretty="format:%ad %h (%an): %s" --date=short
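If you'd rather not edit the file by hand, "git config" can write aliases for you. This sketch sets a few of them on a scratch repo only; adding --global would write them to your ~/.gitconfig instead. The scratch repo is an assumption for the demo:

```shell
# Hypothetical scratch repo so the demo leaves your real config alone.
repo=$(mktemp -d)
cd "$repo"
git init -q

git config alias.st status
git config alias.ci commit
git config alias.oneline "log --pretty=oneline"

git st --short                          # same as "git status --short"
alias_value=$(git config --get alias.oneline)
echo "oneline expands to: $alias_value"
```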

There are some GUI tools for git, but I prefer to learn via the command line. Git is opinionated software (which I like), and analogies help me understand its world view.

Other Posts In This Series

  1. A Visual Guide to Version Control
  2. Intro to Distributed Version Control (Illustrated)
  3. Aha! Moments When Learning Git
Kalid Azad loves those Aha! moments when an idea finally clicks. BetterExplained is dedicated to learning with intuition, not blind memorization, and is honored to serve 250k readers each month.


25 Comments

  1. Nice writeup. I’m stuck using git-svn so my workflow is a little vanilla right now but a few things that would help make it clearer:

    1. Is the staging area per branch, i.e. if you switch branches what happens to changes you had previously staged but not committed?

    2. The change directory metaphor is pretty good, but doesn’t it only apply to files that are staged or committed? If I change a file and then immediately switch to a branch, I lose that change, right?

    You should write about git stash in a follow-up – it’s one of the coolest features.

    Thanks, great blog.

    Brian

  2. Pretty cool article, I’ll use it to convince svn people to switch to git.

    I would add this in the [alias] section of the ~/.gitconfig file:

    cancel = reset --soft HEAD^

    like this you can cancel the last commit with `git cancel`. Often useful when you forgot to add a file in the commit for instance.

  3. @ Brian

    > 1. Is the staging area per branch, i.e. if you switch branches what happens to changes you had previously staged but not committed?

    There are not separate staging areas. There is only one index which is independent of the branch you are on. If you have changes on the index and switch branches then those changes stay there, ready to be committed on the new branch.

  4. Good article but I have to ask what you used to make your swimlanes diagram! Stylish and attention grabbing.

  5. This quick write-up was the missing puzzle piece to help me finally achieve an understanding of how git (and to some extent, how version control in general) works.

    And great timing! I’ve got enough downtime until my next project to start playing with it.

    Thanks!

  6. I like your way of explaining things. Simple/real examples and easy to understand. Thanks.

  7. @Mikhail: Awesome, glad it helped!

    @Brian: There’s only one staging area; git won’t let you switch to another branch if the switch would overwrite your uncommitted changes (you can override this). Great suggestion on git stash (I might add it to this article).

    @p4bl0: Good idea with the alias. There are common operations with options that are hard to remember: --soft, --hard, etc. :)

    @Eric: Thanks for the fast response.

    @Bob: Actually, the diagram is from the nvie.com site — I don’t know what tool was used, but am interested!

    @Terry: Awesome, glad it helped! Learning git definitely involved finding those missing puzzle pieces.

    @Anon: Yeah, I don’t mean GUID in terms of a specific output format, but a unique identifier (but some GUIDs use SHA-1 [http://en.wikipedia.org/wiki/Universally_Unique_Identifier#Version_5_.28SHA-1_hash.29] :) ).

    @Junior: Thanks! I’m always trying to keep things simple; it makes it easier when I forget and have to relearn :).

    @cl: Glad it worked out.

  8. Yes really nice article. Far shorter and better than the one I’ve done for my friends :-).

    But perhaps you could think about how to work on parallel branches that should never be merged again.

    This is clearly a gap in DVCS today. There is no natural way to deal with that. Just an example:

    Say I develop a website for customers. They all have a common part. But imagine one customer (A) needs a really specific feature. And imagine one of my previous customers (B) wants the same feature.

    There is no easy way to just send the “patch” of this feature to customer B, even if I made a specific branch for this feature. Because I should have started the branch after the branches for A and B have diverged.

    [bad ascii art]

    time ---->

    feature --------
    A ------------------
    C -------------------------------
    B ------------------------

    [end of bad ascii art]

    If I merge “feature” into A, I get a lot of stuff from B which shouldn’t be pushed to A. Of course, if I had thought twice and started the feature branch from C instead of B, it wouldn’t be a problem.

    I loved your article about branching. I’m curious to know if you can imagine a way to handle such a case.

  9. “GIT has a staging area” – This was exactly my Aha moment too. Had been digging GIT for a yr, totally confused. Then the Aha staging area moment. GITs a Beauty.

  10. Yogsototh: Yes, there is a way to get what you want: it’s called cherry-picking.

    Git is actually very good at dealing with the problem you describe. You can use “git rebase” to reorder or move changes around to make a feature branch (even ex post facto), and then merge that feature branch into all of the customer branches that want it.

  11. @cypherpunks: Thanks for the note — yes, I’ve realized that almost any workflow is possible with git, it’s a matter of getting your head around it ;).

    @av: Yep, the presence of a staging area confused me for a while.

  12. Nice article, and idea for an article.

    You list “git branch -a” under “remote data”, but it’s a local command using only local data. The data shows the results of the last fetch of remote data, but it’s a strictly local operation.

    For me the most important git “aha” was understanding the difference between, say, “master”, “origin/master”, and the “master” that lived at “origin”. And then understanding the commands that examine and move changes between each of these.

    Understanding those and understanding the staging area comprise almost all of the challenge for a new user coming from the Subversion world, in my experience.

  13. @Jakub: Thanks for the pointers!

    @Pete: Thanks — that’s a great point. Right now, intuitively, I see local master as my changes, origin/master as the last changes I’ve synced, and “master that lived at origin” as the very latest changes on the site. I wonder if I’m correct ;).

    @Matthieu: Yeah, I contemplated putting in the -b shortcut but decided it was too much. Git has shortcuts for everything it seems! Sometimes it’s nice to know the atomic operations vs. the magic switches :).

  14. I can only add to the guys above: Thank you for the ‘essence’ and ‘brevity’ :-) You might like the book ‘Clean Code’ by Robert Martin, if you didn’t get the tip to that one already :-)

  15. @R: Thanks for the suggestion! I haven’t read it, but heard it mentioned before, so now it’s definitely on my list.

  16. my aliases:
    list = diff --name-status
    cam = commit -a -m
    cm = commit -m
    search = grep --color -n
    searchf = grep --name-only
    searchn = grep --files-without-match

  17. I see you’ve responded to this already, but you really shouldn’t confuse SHA-1 hashes with GUIDs. GUIDs are a much more general concept, they are basically random numbers. Even when a SHA-1 hash is used to generate a GUID, it is just a mechanism to generate a random number. Identifiers in Git are not random numbers, they are very specific numbers which give Git special properties. For example, if you have a single SHA-1 Git revision number, you can verify with a strong degree of confidence the entire development history of the code base up to that point. That is because the SHA-1 hash corresponds directly to the content of the commit. This deep relationship between content and the id used for that content is baked into the repository format, so that each file will be saved with the same name if it contains the same content.

    Also, GUIDs always have hyphens in them, Git ids never do.
