- BetterExplained - https://betterexplained.com -

Intro to Distributed Version Control (Illustrated)

Traditional version control helps you backup, track and synchronize files. Distributed version control makes it easy to share changes. Done right, you can get the best of both worlds: simple merging and centralized releases.

Distributed? What’s wrong with regular version control?

Nothing — read a visual guide to version control if you want a quick refresher. Sure, some people will deride you for using an “ancient” system. But you’re still OK in my book: using any VCS is a positive step forward for a project.

Centralized VCS emerged from the 1970s, when programmers had thin clients and admired “big iron” mainframes (how can you not like a machine with a then-gluttonous 8 bits to a byte?).

Centralized is simple, and what you’d first invent: a single place everyone can check in and check out. It’s like a library where you get to scribble in the books.

This model works for backup, undo and synchronization but isn’t great for merging and branching changes people make. As projects grow, you want to split features into chunks, developing and testing in isolation and slowly merging changes into the main line. In reality, branching is cumbersome, so new features may come as a giant checkin, making changes difficult to manage and untangle if they go awry.

Sure, merging is always “possible” in a centralized system, but it’s not easy: you often need to track the merge yourself to avoid making the same change twice. Distributed systems make branching and merging painless because they rely on it.

A Few Diagrams, Please

Other tutorials have plenty of nitty-gritty text commands; here’s a visual look. To refresh, developers use a central repo in a typical VCS:

centralized version control

Everyone syncs and checks into the main trunk: Sue adds soup, Joe adds juice, and Eve adds eggs.

Sue’s change must go into main before it can be seen by others. Yes, theoretically Sue could make a new branch for other people to try out her changes, but this is a pain in a regular VCS.

Distributed Version Control Systems (DVCS)

In a distributed model, every developer has their own repo. Sue’s changes live in her local repo, which she can share with Joe or Eve:

distributed version control example

But will it be a circus with no ringleader? Nope. If desired, everyone can push changes into a common repo, suspiciously like the centralized model above. This franken-repo contains the changes of Sue, Joe and Eve.

I wish distributed version control had a different name, such as “independent”, “federated” or “peer-to-peer.” The term “distributed” evokes thoughts of distributed computing, where work is split among a grid of machines (like searching for signals with SETI@home or doing protein folding).

A DVCS is not like Seti@home: each node is completely independent and sharing is optional (in Seti you must phone back your results).

Key Concepts In 5 Minutes

Here’s the basics; there’s a book on patch theory if you’re interested.

Core Concepts

New Terminology

Key Advantages

Key Disadvantages

Mercurial Quickstart

Mercurial is a fast, simple DVCS. The nickname is hg, like the element Mercury.

cd project
hg init                                (create repo here)
hg add list.txt                        (start tracking file)
hg commit -m "Added file"              (check file into local repo)
hg log                                 (see history; notice guid)

changeset:   0:55bbcb7a4c24
user:        Kalid@kazad-laptop
date:        Sun Oct 14 21:36:18 2007 -0400
summary:     Added file

[edit file]
hg revert list.txt                 (revert to previous version)

hg tag v1.0                        (tag this version)
[edit file]
hg update -C v1.0                  ("update" to the older tagged version; -C forces overwrite of local copy)

Once Mercurial has initialized a directory, it looks like this:

distributed version control repo layout

You have:

In our distributed example, Sue, Joe and Eve have their own repos, with independent revision histories.

Understanding Updates and Merging

There’s a few items that confused me when learning about DVCS. First, updates happen in several steps:

Second, depending on the change, you can update or merge:

I’m still wrapping my head around how easily branches spring up and collapse in a DVCS:

distributed version control merge changes

In this case, a merge is needed because (+Soup) and (+Juice) are changes to a common parent: the list with just “Milk”. After Joe merges the files, Sue can do a regular “pull and update” to get the combined file from Joe. She doesn’t have to merge again on her own.

In Mercurial you can run:

hg incoming ../another-dir  (see pending changes)
hg pull ../another-dir      (download changes)

hg update                   (actually apply changes...)
hg merge                    (... or merge if needed)

hg commit                   (check in merged file; unite branches)

Yep, the “pull-merge-commit” cycle is long. Luckily, Mercurial has shortcuts to combine commands into a single one. Though it seems complex, it’s much easier than handling merges manually in Subversion.

Most merges are automatic. When conflicts come up, they are typically resolved quickly. Mercurial keeps track of the parent/child relationship for every change (our merged list has two parents), as well as the “heads” or latest changes in each branch. Before the merge we have two heads; afterwards, one.

Organizing a Distributed Project

Here’s one way to organize a distributed project:

distributed version control push pull model

Sue, Joe and Eve check changes into a common branch. They can trade patches with each other to do simple “buddy builds”: Hey buddy, can you try out these patches? I need to see if it works before I push to the experimental branch.

Later, a maintainer can review and pull changes from the experimental branch into a stable branch, which has the latest release. A distributed VCS helps isolate changes but still provide the “single source” of a centralized system. There are many models of development, from “pull only” (where maintainers decide what to take, and is used when developing Linux) to “shared push” (which acts like a centralized system). A distributed VCS gives you flexibility in how a project is maintained.

Practice And Scathing Ridicule Makes Perfect

I’m a DVCS newbie, but am happy with what I’ve learned so far. I enjoy SVN, but it’s “fun” seeing how easy a merge can be. My suggestion is to start with Subversion, get a grasp for team collaboration, then experiment with a distributed model. With the proper layout a DVCS can do anything a centralized system can, with the added benefit of easy merging.

Online Resources

Notable Quotes:

So good luck, and watch out for the holy wars. Feel free to share any tips or suggestions below.

Other Posts In This Series

  1. A Visual Guide to Version Control
  2. Intro to Distributed Version Control (Illustrated)
  3. Aha! Moments When Learning Git