
Traditional version control helps you backup, track and synchronize files. Distributed version control makes it easy to share changes. Done right, you can get the best of both worlds: simple merging and centralized releases.
Distributed? What’s wrong with regular version control?
Nothing — read a visual guide to version control if you want a quick refresher. Sure, some people will deride you for using an “ancient” system. But you’re still OK in my book: using any VCS is a positive step forward for a project.
Centralized VCS emerged from the 1970s, when programmers had thin clients and admired “big iron” mainframes (how can you not like a machine with a then-gluttonous 8 bits to a byte?).
Centralized is simple, and what you’d first invent: a single place everyone can check in and check out. It’s like a library where you get to scribble in the books.
This model works for backup, undo and synchronization but isn’t great for merging and branching changes people make. As projects grow, you want to split features into chunks, developing and testing in isolation and slowly merging changes into the main line. In reality, branching is cumbersome, so new features may come as a giant checkin, making changes difficult to manage and untangle if they go awry.
Sure, merging is always “possible” in a centralized system, but it’s not easy: you often need to track the merge yourself to avoid making the same change twice. Distributed systems make branching and merging painless because they rely on it.
A Few Diagrams, Please
Other tutorials have plenty of nitty-gritty text commands; here’s a visual look. To refresh, developers use a central repo in a typical VCS:

Everyone syncs and checks into the main trunk: Sue adds soup, Joe adds juice, and Eve adds eggs.
Sue’s change must go into main before it can be seen by others. Yes, theoretically Sue could make a new branch for other people to try out her changes, but this is a pain in a regular VCS.
Distributed Version Control Systems (DVCS)
In a distributed model, every developer has their own repo. Sue’s changes live in her local repo, which she can share with Joe or Eve:

But will it be a circus with no ringleader? Nope. If desired, everyone can push changes into a common repo, suspiciously like the centralized model above. This franken-repo contains the changes of Sue, Joe and Eve.
I wish distributed version control had a different name, such as “independent”, “federated” or “peer-to-peer.” The term “distributed” evokes thoughts of distributed computing, where work is split among a grid of machines (like searching for signals with SETI@home or doing protein folding).
A DVCS is not like Seti@home: each node is completely independent and sharing is optional (in Seti you must phone back your results).
Key Concepts In 5 Minutes
Here’s the basics; there’s a book on patch theory if you’re interested.
Core Concepts
- Centralized version control focuses on synchronizing, tracking, and backing up files.
- Distributed version control focuses on sharing changes; every change has a guid or unique id.
- Recording/Downloading and applying a change are separate steps (in a centralized system, they happen together).
- Distributed systems have no forced structure. You can create “centrally administered” locations or keep everyone as peers.
New Terminology
- push: send a change to another repository (may require permission)
- pull: grab a change from a repository
Key Advantages
- Everyone has a local sandbox. You can make changes and roll back, all on your local machine. No more giant checkins; your incremental history is in your repo.
- It works offline. You only need to be online to share changes. Otherwise, you can happily stay on your local machine, checking in and undoing, no matter if the “server” is down or you’re on an airplane.
- It’s fast. Diffs, commits and reverts are all done locally. There’s no shaky network or server to ask for old revisions from a year ago.
- It handles changes well. Distributed version control systems were built around sharing changes. Every change has a guid which makes it easy to track.
- Branching and merging is easy. Because every developer “has their own branch”, every shared change is like reverse integration. But the guids make it easy to automatically combine changes and avoid duplicates.
- Less management. Distributed VCSes are easy to get running; there’s no “always-running” server software to install. Also, DVCSes may not require you to “add” new users; you just pick what URLs to pull from. This can avoid political headaches in large projects.
Key Disadvantages
- You still need a backup. Some claim your “backup” is the other machines that have your changes. I don’t buy it — what if they didn’t accept them all? What if they’re offline and you have new changes? With a DVCS, you still want a machine to push changes to “just in case”. (In Subversion, you usually dedicate a machine to store the main repo; do the same for a DVCS).
- There’s not really a “latest version”. If there’s no central location, you don’t immediately know whether to see Sue, Joe or Eve for the latest version. Again, a central location helps clarify what the latest “stable” release is.
- There aren’t really revision numbers. Every repo has its own revision numbers depending on the changes. Instead, people refer to change numbers: Pardon me, do you have change fa33e7b? (Remember, the id is an ugly guid). Thankfully, you can tag releases with meaningful names.
Mercurial Quickstart
Mercurial is a fast, simple DVCS. The nickname is hg, like the element Mercury.
cd project
hg init (create repo here)
hg add list.txt (start tracking file)
hg commit -m "Added file" (check file into local repo)
hg log (see history; notice guid)
changeset: 0:55bbcb7a4c24
user: Kalid@kazad-laptop
date: Sun Oct 14 21:36:18 2007 -0400
summary: Added file
[edit file]
hg revert list.txt (revert to previous version)
hg tag v1.0 (tag this version)
[edit file]
hg update -C v1.0 ("update" to the older tagged version; -C forces overwrite of local copy)
Once Mercurial has initialized a directory, it looks like this:

You have:
- A working copy. The files you are currently editing.
- A repository. A directory (.hg in Mercurial) containing all patches and metadata (comments, guids, dates, etc.). There’s no central server so the data stays with you.
In our distributed example, Sue, Joe and Eve have their own repos, with independent revision histories.
Understanding Updates and Merging
There’s a few items that confused me when learning about DVCS. First, updates happen in several steps:
- Getting the change into a repo (pushing or pulling)
- Applying the change to the files (update or merge)
- Saving the new version (commit)
Second, depending on the change, you can update or merge:
- Updates happen when there’s no ambiguity. For example, I pull changes to a file that only you’ve been editing. The file just jumps to the latest revision, since there’s no overlapping changes.
- Merges are needed when we have conflicting changes. If we both edit a file, we end up with two “branches” (i.e. alternate universes). One world has my changes, the other world has yours. In this case we (probably) want to merge the changes together into a single universe.
I’m still wrapping my head around how easily branches spring up and collapse in a DVCS:

In this case, a merge is needed because (+Soup) and (+Juice) are changes to a common parent: the list with just “Milk”. After Joe merges the files, Sue can do a regular “pull and update” to get the combined file from Joe. She doesn’t have to merge again on her own.
In Mercurial you can run:
hg incoming ../another-dir (see pending changes)
hg pull ../another-dir (download changes)
hg update (actually apply changes...)
hg merge (... or merge if needed)
hg commit (check in merged file; unite branches)
Yep, the “pull-merge-commit” cycle is long. Luckily, Mercurial has shortcuts to combine commands into a single one. Though it seems complex, it’s much easier than handling merges manually in Subversion.
Most merges are automatic. When conflicts come up, they are typically resolved quickly. Mercurial keeps track of the parent/child relationship for every change (our merged list has two parents), as well as the “heads” or latest changes in each branch. Before the merge we have two heads; afterwards, one.
Organizing a Distributed Project
Here’s one way to organize a distributed project:

Sue, Joe and Eve check changes into a common branch. They can trade patches with each other to do simple “buddy builds”: Hey buddy, can you try out these patches? I need to see if it works before I push to the experimental branch.
Later, a maintainer can review and pull changes from the experimental branch into a stable branch, which has the latest release. A distributed VCS helps isolate changes but still provide the “single source” of a centralized system. There are many models of development, from “pull only” (where maintainers decide what to take, and is used when developing Linux) to “shared push” (which acts like a centralized system). A distributed VCS gives you flexibility in how a project is maintained.
Practice And Scathing Ridicule Makes Perfect
I’m a DVCS newbie, but am happy with what I’ve learned so far. I enjoy SVN, but it’s “fun” seeing how easy a merge can be. My suggestion is to start with Subversion, get a grasp for team collaboration, then experiment with a distributed model. With the proper layout a DVCS can do anything a centralized system can, with the added benefit of easy merging.
Online Resources
- Mercurial has an excellent book. On Windows you may need diffing/merging software or TortoiseMerge (if you have TortoiseSVN installed).
- Darcs has a detailed wikibook (has some math theory about changes).
- Git was created by Linus Torvalds. Here’s an interesting lecture on DVCS; prepare to be berated for using a centralized system:
Notable Quotes:
- “How many have done a branch and merged it? How many of you enjoyed it?”
- “When you do a merge, you plan ahead for a week, then set aside a day to do it.”
- “Some people have 5, 10, 15 branches”. One branch is experimental. One branch is maintenance, etc.
- “CVS — you don’t commit. You make changes without committing. You never commit until it passes a giant test suite. People make 1-liner changes, knowing it can’t possibly break.”
So good luck, and watch out for the holy wars. Feel free to share any tips or suggestions below.

Leave a Reply
166 Comments on "Intro to Distributed Version Control (Illustrated)"
[…] In my quest to try and learn new things (more like trying to avoid being bored with stuff I’m doing) I decided to take a look at another version control system I had heard about called Git. According to it’s entry over at Wikipedia it is “..a distributed revision control / software configuration management project created by Linus Torvalds.” Yes, the same Linus Torvalds who created Linux. When I first heard about Git I thought to myself “why should I care that Subversion supposedly sucks and that distributed version control is awesome?”. So I decided to dig a little further. […]
[…] CT wrote an interesting post today onHere’s a quick excerptMy suggestion is to start with Subversion, get a grasp for team collaboration, then experiment with a distributed model. With the proper layout a DVCS can do anything a centralized system can, with the added benefit of easy merging. … […]
[…] Update: the following appeared a few days after my talk, it is very good, aside from slightly bogus listings in the “disadvantages” section: http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/ | […]
Your git link goes to darcs.net instead of git.or.cz
In any case, that was a good article. I’ll give it to my friend who I have thus far been unable to articulate the benefits of dvcs and git to.
Whoops, thanks for the tip! Glad you liked the article :)
[…] I need to read this primer on Distributed Version Control a couple more times to be sure I fully understand it but at first glance it seems to be an extreme very of a developer isolation model. We attempted something like this on a past project and it failed miserably (mostly due to our inability to force timely sharing of changes… developers can be stubborn). […]
[…] Intro to Distributed Version Control (Illustrated) | BetterExplained (tags: p2p programming management) […]
That’s for the article. A more detailed analysis of the advantages of DVCS can be found here: http://ianclatworthy.files.wordpress.com/2007/10/dvcs-why-and-how1.pdf
Hi Ian, thanks for the info. I took a look and the article seems very informative :).
[…] http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/ […]
Thakns for the guide. I’ve been putting off developing something very similar to this for a couple of months and now I don’t need to.
What did you use for your diagrams? They look cool :)
Hi ct, glad I was able to help. I made the diagrams using PowerPoint 2007 — there’s more details in the comments for this article:
http://betterexplained.com/articles/a-visual-guide-to-version-control/
[…] Intro to Distributed Version Control (Illustrated) (tags: scm mercurial programming) […]
But isnt this in clearcase ucm / multisite for years? each developer has his own developer streams on which he can work independantly and there is the integration stream. each project has an integration stream and all streams can deliver or mergecopy to each other. on each stream there can be multiple views.
Hi Cogmios, thanks for the info. I’ve never used Clearcase, but from Wikipedia it appears as both client/server and distributed, which is pretty interesting (along with its view model). Nope, these ideas are not brand new — just wanted to explain them :)
[…] Intro to Distributed Version Control (Illustrated) — все, что вы хотели узнать о распределенных системах управления версиями […]
Clearcase UCM is not a distributed version control system, and neither is Clearcase multi-site.
Clearcase UCM is hand holding of branch creation and merging of changes between streams, ticket tracking integration etc. It’s the Higher-Order Perl of Clearcase. Underneath it’s just a centralised VCS with branches.
Clearcase multi-site is form of VCS replication. Multiple sites can be defined and a replication schedule defined. Each branch is mastered in one site, i.e. writable in one place, and a read-only copy to replicated to the other sites. Mastering can be changed. And one can merge from a read only (remote) branch to local branch with no problems. The merge tracking is good, if slow, but does have some bugs that can surface if you have a slow network.
So ClearCase multi-site is kind of like a distributed VCS, however only the centralised ‘sites’ are the nodes of the web, unlike the contemporary distributed VCS’s in which every machine or even working copy is a node in the web.
Hi David, thanks for the details! From your description, it seems like ClearCase is somewhere between a regular CVS and DVCS, by allowing to merge from multiple sites (even if not every working copy is independent). Appreciate the info.
Awesome. Thanks. You could get a job as a professional communicator.
Explaining the concepts with those nice visual graphics is great. Pretty hard to explain the details of configuring Eclipse that way, though.
Thanks for the comment — maybe I’ll try getting into that eventually :). Yep, I agree you’ll always need the manual for certain things (but nearly anything can be more clear with a simple diagram!).
[…] Visual guide to distributed version control Filed under: code — Ralph Hockens @ 11:16 pm Intro to Distributed Version Control (Illustrated) […]
[…] O paper de Ian Clatworthy, desenvolvedor do Bazaar, sobre sistemas de controle de versão distribuídos procura demonstrar as vantagens de se adotar este modelo e dá sugestões sobre como escolher uma ferramenta, dentre as opções existentes. Há também, uma explicação sobre DVCS no site BetterExplained, versão ilustrada. […]
[…] Intro to Distributed Version Control (Illustrated) at betterexplained.com. […]
[…] Intro to Distributed Version Control (Illustrated) […]
Пример ведения IT блога…
Я редко выделяю какой-то отдельный англоязычный блог, но вот этот, BetterExplained , заслуживает внимания….