Git is a fast, flexible but challenging distributed version control system. Before jumping in:
Along with a book, tutorial and cheatsheet, here are the insights that helped git click.
There’s a staging area!
Git has a staging area. Git has a staging area!!!
Yowza, did this ever confuse me. There’s both a repo (“object database”) and a staging area (called “index”). Checkins have two steps:
git add foo.txt - Add foo.txt to the index. It’s not checked in yet!
git commit -m "message" - Put staged files in the repo; they’re now tracked
- You can “git add --update” to stage all tracked, modified files
Why stage? Git’s flexible: if a, b and c are changed, you can commit them separately or together.
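Here’s a minimal sketch of that flexibility, assuming a throwaway repo in a temp directory (the file names, commit messages and demo identity are made up for illustration):

```shell
set -e
demo=$(mktemp -d)                     # hypothetical scratch directory
cd "$demo"
git init -q
git config user.name "Demo User"      # placeholder identity so commits work anywhere
git config user.email "demo@example.com"

echo "one" > a.txt
echo "two" > b.txt
echo "three" > c.txt

git add a.txt                         # stage only a.txt
git status --short                    # a.txt is staged; b.txt and c.txt are still untracked
git commit -q -m "Add a"              # commit a by itself

git add b.txt c.txt                   # stage the other two together
git commit -q -m "Add b and c"
git log --oneline                     # two separate commits
```

The staging area is what lets one batch of edits become two commits.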
But now there are two undos:
git checkout foo.txt - Undo local changes (like svn revert)
git reset HEAD foo.txt - Remove from staging area (local copy still modified).
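A rough sketch of both undos in sequence, assuming a scratch repo (the file contents and identity are placeholders; the “--” just tells git that foo.txt is a path, not a branch):

```shell
set -e
demo=$(mktemp -d)                     # hypothetical scratch directory
cd "$demo"
git init -q
git config user.name "Demo User"      # placeholder identity
git config user.email "demo@example.com"

echo "original" > foo.txt
git add foo.txt
git commit -q -m "Add foo"

echo "experiment" > foo.txt           # change the local copy
git add foo.txt                       # stage the change

git reset -q HEAD foo.txt             # undo #1: unstage it (local copy still modified)
cat foo.txt                           # still shows the experiment

git checkout -- foo.txt               # undo #2: throw away the local change
cat foo.txt                           # back to the committed version
```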
Add and commit, add and commit — Git has a rhythm.
Branching is “Save as…”
Branches are like “Save as…” on a directory. Best of all:
- Easily merge changes with the original (changes tracked and never applied twice)
- No wasted space (common files only stored once)
Why branch? Consider the utility of “Save as…” for regular files: you tinker with multiple possibilities while keeping the original safe. Git enables this for directories, with the power to merge. (In practice, svn is like a single shared drive, where you can only revert to one backup).
Imagine virtual directories
I see branches as “virtual directories” in the .git folder. While inside a physical directory (c:\project or ~/project), you traverse virtual directories with a checkout.
git checkout master - switch to master branch (“cd master”)
git branch dev - create new branch from existing (“cp * dev”)
- you still need to “cd” with “git checkout dev”
git merge dev - (when in master) pull in changes from dev (“cp dev/* .”)
git branch - list all branches (“ls”)
My inner dialogue is “change to dev directory (checkout)… make changes… save changes (add/commit)… change to master directory… copy in changes from dev (merge)”.
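That inner dialogue, as a sketch in a scratch repo (notes.txt and the identity are made up; the symbolic-ref line only makes sure the first branch is called master whatever your local defaults are):

```shell
set -e
demo=$(mktemp -d)                         # hypothetical scratch directory
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Initial commit on master"

git branch dev                            # "cp * dev"
git checkout -q dev                       # "cd dev"
echo "v2" >> notes.txt                    # make changes
git add notes.txt
git commit -q -m "Work on dev"            # save changes

git checkout -q master                    # "cd master"
git merge -q dev                          # "cp dev/* ."
git branch                                # "ls"
```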
The physical directory is a scratchpad. Virtual directories are affected by git commands:
rm foo.txt - Remove foo.txt from your sandbox only (the branch still has it; restore it with git checkout foo.txt)
git rm foo.txt - Remove foo.txt from current virtual directory
- Gotcha: you need to commit that change!
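To see the difference, here’s a small sketch in a scratch repo (bar.txt is a second, made-up file so each kind of removal gets its own victim):

```shell
set -e
demo=$(mktemp -d)                     # hypothetical scratch directory
cd "$demo"
git init -q
git config user.name "Demo User"      # placeholder identity
git config user.email "demo@example.com"

echo "keep me" > foo.txt
echo "drop me" > bar.txt
git add foo.txt bar.txt
git commit -q -m "Add foo and bar"

rm foo.txt                            # gone from the scratchpad only
git status --short                    # git notices the file is missing
git checkout -- foo.txt               # the branch still has it, so it comes back

git rm -q bar.txt                     # removed from the scratchpad AND staged for removal
git commit -q -m "Remove bar"         # the gotcha: the removal only sticks once committed
git ls-files                          # only foo.txt is tracked now
```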
Know the current branch
Just like seeing your current directory, put the current branch in your prompt!

In my .bash_profile (modified from here):
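# Print the current branch as "(name)"; prints nothing outside a git repo.
parse_git_branch() {
  # Drop every line not starting with "*", then wrap the starred branch name in parentheses.
  git branch 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/(\1)/'
}
# user@host, current directory in blue, branch in red, then "$".
export PS1="\[\033[00m\]\u@\h\[\033[01;34m\] \W \[\033[31m\]\$(parse_git_branch) \[\033[00m\]$\[\033[00m\] "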
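A possible variant, assuming your git is recent enough to support “git symbolic-ref --short”, asks git for the branch name directly instead of parsing the output of “git branch”. This is only a sketch; it keeps the same function name so the PS1 line works unchanged:

```shell
# Print the current branch as "(name)".
# Prints nothing outside a repo or on a detached HEAD, because symbolic-ref fails there.
parse_git_branch() {
  local ref
  ref=$(git symbolic-ref --short HEAD 2> /dev/null) || return 0
  printf '(%s)' "$ref"
}
```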
Visualize your branch structure
Git leaves branch organization to you. Nvie.com has a great branch strategy:

- Have a mainline (master). Mentally it’s on the far right.
- Create branches (master -> dev) and subbranches (dev -> featureX). The further from master, the crazier.
- Only merge with neighbors (master -> dev -> feature X, or featureX -> dev -> master)
Stay sane by choosing a branch layout up front. I have a master tracking a svn project, and dev for my own code. In general, master is clean so I can branch anytime for one-off fixes.
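A sketch of the “only merge with neighbors” rule, assuming the three-level layout above (app.txt, the commit messages and the identity are invented for the demo):

```shell
set -e
demo=$(mktemp -d)                         # hypothetical scratch directory
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Base on master"

git branch dev                            # master -> dev
git checkout -q dev
git branch featureX                       # dev -> featureX
git checkout -q featureX

echo "feature" >> app.txt
git commit -q -a -m "Build featureX"      # the crazy stuff happens far from master

git checkout -q dev
git merge -q featureX                     # featureX -> dev (neighbors)
git checkout -q master
git merge -q dev                          # dev -> master (neighbors)
```

The feature never jumps straight from featureX to master; it always passes through dev.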
Understand local vs. remote
Git has local and remote commands; seeing both confused me (“When do you checkout vs. pull?”). Work locally, syncing remotely as needed.
Local data
git init
- create local repo
- use git add/commit/branch to work locally
Remote data
git remote add name path-to-repo - track a remote repo (usually “origin”) from an existing repo
- remote branches are “origin/master”, “origin/dev” etc.
git branch -a - list all branches (remote and local)
git clone path-to-repo - create a new local git repo copied from a remote one
- local master tracks remote master
git pull - merge changes from tracked remote branch (if in dev, pull from origin/dev)
git push - send changes to tracked remote branch (if in dev, push to origin/dev)
Why local and remote? Subversion has central checkins, so you avoid committing unfinished work. With git, local commits are frequent and you only push when ready.
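Here’s a rough end-to-end sketch, assuming a bare repo on the local disk stands in for the remote (the directory names “mine” and “theirs”, the file and the identities are all hypothetical):

```shell
set -e
work=$(mktemp -d)                                  # hypothetical scratch area
git init -q --bare "$work/origin.git"              # stand-in for the remote repo
git --git-dir="$work/origin.git" symbolic-ref HEAD refs/heads/master

# Local data: init, then add/commit as often as you like
mkdir "$work/mine"
cd "$work/mine"
git init -q
git symbolic-ref HEAD refs/heads/master            # name the first branch "master"
git config user.name "Demo User"                   # placeholder identity
git config user.email "demo@example.com"
echo "hello" > readme.txt
git add readme.txt
git commit -q -m "First local commit"

# Remote data: hook up the remote and push when ready
git remote add origin "$work/origin.git"
git push -q -u origin master                       # -u makes local master track origin/master

# Someone else clones, commits and pushes
git clone -q "$work/origin.git" "$work/theirs"
cd "$work/theirs"
git config user.name "Other User"                  # placeholder identity
git config user.email "other@example.com"
echo "more" >> readme.txt
git commit -q -a -m "Second commit, from the clone"
git push -q                                        # goes to the tracked branch, origin/master

# Back in the first repo: pull merges in the tracked remote branch
cd "$work/mine"
git pull -q
git branch -a                                      # local master plus remotes/origin/master
```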
GUIDs are GOOD
Git addresses information by a SHA-1 hash of its contents, which I think of as a GUID. If two branches have identical content and history, they have the same GUID (and vice versa).
Why’s this cool? We can create branches independently, merge them, and have a common GUID. No central numbering needed. Usually, we just compare the first few digits: “Are you on a93?”.
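You can watch the content-based naming with “git hash-object”, which prints the id git would give a file. A small sketch (the file names and contents are arbitrary):

```shell
set -e
demo=$(mktemp -d)                     # hypothetical scratch directory
cd "$demo"

printf 'same content\n' > one.txt
printf 'same content\n' > two.txt
printf 'different\n' > three.txt

id_one=$(git hash-object one.txt)     # id computed from the contents, not the file name
id_two=$(git hash-object two.txt)
id_three=$(git hash-object three.txt)

echo "$id_one"
echo "$id_two"                        # identical to the line above
echo "$id_three"                      # a different id
echo "$id_one" | cut -c1-7            # the "first few digits" people actually compare
```

Nobody had to hand out numbers: two people with the same content arrive at the same id independently.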
Tips & Tricks
For your .gitconfig:
[alias]
ci = commit
st = status
co = checkout
oneline = log --pretty=oneline
br = branch
la = log --pretty=\"format:%ad %h (%an): %s\" --date=short
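If you’d rather not edit the file by hand, “git config” can write aliases for you. A sketch, assuming a scratch repo so nothing global gets touched (add --global to write to ~/.gitconfig instead):

```shell
set -e
demo=$(mktemp -d)                                 # hypothetical scratch directory
cd "$demo"
git init -q

git config alias.st status                        # repo-local alias; same as "st = status"
git config alias.oneline "log --pretty=oneline"   # multi-word aliases need quotes
git st                                            # runs "git status"
git config --get alias.oneline                    # shows what the alias expands to
```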
There are some GUI tools for git, but I prefer to learn via the command line. Git is opinionated software (which I like), and analogies help me understand its world view.
Other Posts In This Series
- A Visual Guide to Version Control
- Intro to Distributed Version Control (Illustrated)
- Aha! Moments When Learning Git (This post)
65 thoughts on “Aha! Moments When Learning Git”
Thank you very much, this certainly helps to understand
Exactly what I needed.
Nice writeup. I’m stuck using git-svn so my workflow is a little vanilla right now but a few things that would help make it clearer:
1. Is the staging area per branch, i.e. if you switch branches what happens to changes you had previously staged but not committed?
2. The change directory metaphor is pretty good, but doesn’t it only apply to files that are staged or committed? If I change a file and then immediately switch to a branch, I lose that change, right?
You should write about git stash in a follow-up – it’s one of the coolest features.
Thanks, great blog.
Brian
Pretty cool article, I’ll use it to convince svn people to switch to git.
I would add this in the [alias] section of the ~/.gitconfig file:
cancel = reset --soft HEAD^
Like this you can cancel the last commit with `git cancel`. Often useful when you forgot to add a file in the commit, for instance.
@ Brian
> 1. Is the staging area per branch, i.e. if you switch branches what happens to changes you had previously staged but not committed?
There are not separate staging areas. There is only one index, which is independent of the branch you are on. If you have changes in the index and switch branches, then those changes stay there, ready to be committed on the new branch.
Good article but I have to ask what you used to make your swimlanes diagram! Stylish and attention grabbing.
This quick write-up was the missing puzzle piece to help me finally achieve an understanding of how git (and to some extent, how version control in general) works.
And great timing! I’ve got enough downtime until my next project to start playing with it.
Thanks!
Git names objects using SHA1 hashes, not GUIDs (and GUIDs are *not* hashes).
I like your way of explaining things. Simple/real examples and easy to understand. Thanks.
Nice timing!. Thanks.
@Mikhail: Awesome, glad it helped!
@Brian: There’s only one staging area; git won’t let you switch to another branch if you have unsaved changes (you can override this). Great suggestion on git stash (I might add it to this article).
@p4bl0: Good idea with the alias; there are common operations with options that are hard to remember (--soft, --hard, etc.).
@Eric: Thanks for the fast response.
@Bob: Actually, the diagram is from the nvie.com site — I don’t know what tool was used, but am interested!
@Terry: Awesome, glad it helped! Learning git definitely involved finding those missing puzzle pieces.
@Anon: Yeah, I don’t mean GUID in terms of a specific output format, but a unique identifier (but some GUIDs use SHA-1 [http://en.wikipedia.org/wiki/Universally_Unique_Identifier#Version_5_.28SHA-1_hash.29]).
@Junior: Thanks! I’m always trying to keep things simple; it makes it easier when I forget and have to relearn.
@cl: Glad it worked out.
Yes, really nice article. Far shorter and better than the one I’ve done for my friends.
But perhaps you could think about how to work on parallel branches that should never be merged again.
This is clearly lacking in DVCS today. There is no natural way to deal with it. Just an example:
Say I develop a website for customers. They all have a common part. But imagine one customer (A) needs a really specific feature. And imagine one of my previous customers (B) wants the same feature.
There is no easy way to just send the “patch” of this feature to customer B, even if I made a specific branch for this feature, because I should have started the branch after the branches for A and B had diverged.
[bad ascii art]
time —->
feature ——–
A ——————
C ——————————-
B ————————
[end of bad ascii art]
If I merge “feature” into A, I get a lot of stuff from B which shouldn’t be pushed to A. Of course, if I had thought twice and started the feature branch from C instead of B, it wouldn’t be a problem.
I loved your article about branching. I’m curious to know if you can imagine a way to handle such a case.
“GIT has a staging area” – This was exactly my Aha moment too. Had been digging GIT for a yr, totally confused. Then the Aha staging area moment. GITs a Beauty.
Yogsototh: Yes, there is a way to get what you want: it’s called cherry-picking.
Git is actually very good at dealing with the problem you describe. You can use “git rebase” to reorder or move changes around to make a feature branch (even ex post facto), and then merge that feature branch into all of the customer branches that want it.
@cypherpunks: Thanks for the note. Yes, I’ve realized that almost any workflow is possible with git; it’s a matter of getting your head around it.
@av: Yep, the presence of a staging area confused me for a while.
Do not use ‘git branch’ in scripts. Use ‘git symbolic-ref HEAD’ or ‘git rev-parse --symbolic-full-name HEAD’ together with advanced variable substitution.
If you want to have branch name in bash prompt, use ‘__git_ps1’ from ‘contrib/completion/git-completion.bash’ (http://repo.or.cz/w/git.git/blob_plain/HEAD:/contrib/completion/git-completion.bash)
Nice article, and idea for an article.
You list “git branch -a” under “remote data”, but it’s a local command using only local data. The data shows the results of the last fetch of remote data, but it’s a strictly local operation.
For me the most important git “aha” was understanding the difference between, say, “master”, “origin/master”, and the “master” that lived at “origin”. And then understanding the commands that examine and move changes between each of these.
Understanding those and understanding the staging area comprise almost all of the challenge for a new user coming from the Subversion world, in my experience.
you need a “git checkout” after a “git branch” only because you don’t know “git checkout -b”
@Jakub: Thanks for the pointers!
@Pete: Thanks, that’s a great point. Right now, intuitively, I see local master as my changes, origin/master as the last changes I’ve synced, and “master that lived at origin” as the very latest changes on the site. I wonder if I’m correct.
@Matthieu: Yeah, I contemplated putting in the -b shortcut but decided it was too much. Git has shortcuts for everything it seems! Sometimes it’s nice to know the atomic operations vs. the magic switches.
There’s a nice command line tool called “tig” that you may find interesting:
http://jonas.nitro.dk/tig/
@Michael: Thanks, I’ll have to check it out.
I can only add to the guys above: Thank you for the ‘essence’ and ‘brevity’
You might like the book ‘Clean Code’ by Robert Martin, if you didn’t get the tip to that one already
@R: Thanks for the suggestion! I haven’t read it, but heard it mentioned before, so now it’s definitely on my list.
my aliases:
list = diff --name-status
cam = commit -a -m
cm = commit -m
search = grep --color -n
searchf = grep --name-only
searchn = grep --files-without-match
@vili: Awesome, thanks for sharing!
I see you’ve responded to this already, but you really shouldn’t confuse SHA-1 hashes with GUIDs. GUIDs are a much more general concept, they are basically random numbers. Even when a SHA-1 hash is used to generate a GUID, it is just a mechanism to generate a random number. Identifiers in Git are not random numbers, they are very specific numbers which give Git special properties. For example, if you have a single SHA-1 Git revision number, you can verify with a strong degree of confidence the entire development history of the code base up to that point. That is because the SHA-1 hash corresponds directly to the content of the commit. This deep relationship between content and the id used for that content is baked into the repository format, so that each file will be saved with the same name if it contains the same content.
Also, GUIDs always have hyphens in them, Git ids never do.