<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>BetterExplained &#187; Programming</title>
	<atom:link href="http://betterexplained.com/articles/category/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://betterexplained.com</link>
	<description>Learning shouldn&#039;t hurt. Let&#039;s share the insights that made difficult ideas click.</description>
	<lastBuildDate>Fri, 23 Jul 2010 17:45:43 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Aha! Moments When Learning Git</title>
		<link>http://betterexplained.com/articles/aha-moments-when-learning-git/</link>
		<comments>http://betterexplained.com/articles/aha-moments-when-learning-git/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 16:00:45 +0000</pubDate>
		<dc:creator>Kalid</dc:creator>
				<category><![CDATA[Guides]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[git]]></category>
		<category><![CDATA[version control]]></category>

		<guid isPermaLink="false">http://betterexplained.com/?p=603</guid>
		<description><![CDATA[Git is a fast, flexible but challenging distributed version control system. Before jumping in:

    Understand regular version control
    Understand distributed version control

Along with a book, tutorial and cheatsheet, here are the insights that helped git click.
There&#8217;s a staging area!
Git has a staging area. Git has a staging area!!!
Yowza, did [...]]]></description>
			<content:encoded><![CDATA[<p>Git is a fast, flexible but challenging distributed version control system. Before jumping in:</p>
<ul>
    <li><a class=" external" title="http://betterexplained.com/articles/a-visual-guide-to-version-control/" href="http://betterexplained.com/articles/a-visual-guide-to-version-control/">Understand regular version control</a></li>
    <li><a class=" external" title="http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/" href="http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/">Understand distributed version control</a></li>
</ul>
<p>Along with a <a class=" external" title="http://progit.org/book/" href="http://progit.org/book/">book</a>, <a class=" external" title="http://www.eecs.harvard.edu/~cduan/technical/git/" href="http://www.eecs.harvard.edu/~cduan/technical/git/">tutorial</a> and <a class=" external" title="http://jonas.nitro.dk/git/quick-reference.html" href="http://jonas.nitro.dk/git/quick-reference.html">cheatsheet</a>, here are the insights that helped git click.</p>
<h2>There&#8217;s a staging area!</h2>
<p>Git has a staging area. <strong>Git has a staging area!!!</strong></p>
<p>Yowza, did this ever confuse me. There&#8217;s both a repo (&#8220;object database&#8221;) and a staging area (called &#8220;index&#8221;). Checkins have two steps:</p>
<ul>
    <li><code>git add foo.txt</code>
    <ul>
        <li>Add foo.txt to the index. It&#8217;s not checked in yet!</li>
    </ul>
    </li>
    <li><code>git commit -m "message"</code>
    <ul>
        <li>Put staged files in the repo; they&#8217;re now checked in</li>
        <li>You can <code>git add --update</code> to stage all tracked, modified files</li>
    </ul>
    </li>
</ul>
<p><strong>Why stage?</strong> Git&#8217;s flexible: if a, b and c are changed, you can commit them separately or together.</p>
<p>But now there are two undos:</p>
<ul>
    <li><code>git checkout foo.txt</code>
    <ul>
        <li>Undo local changes (like svn revert)</li>
    </ul>
    </li>
    <li><code>git reset HEAD foo.txt</code>
    <ul>
        <li>Remove from staging area (local copy still modified).</li>
    </ul>
    </li>
</ul>
<p>Add and commit, add and commit &#8212; Git has a rhythm.</p>
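<p>The two-step checkin and the two undos can be sketched as a toy model. This is not how Git stores data internally: the <code>ToyRepo</code> class and its method names are made up for illustration, with plain dictionaries standing in for the working copy, the index and the repo.</p>

```python
class ToyRepo:
    """Toy model of Git's three areas; names are illustrative, not Git internals."""

    def __init__(self):
        self.working = {}   # files on disk: name -> contents
        self.index = {}     # staging area: what the next commit will contain
        self.commits = []   # the repo: a list of (message, snapshot) pairs

    def head(self):
        """Snapshot of the latest commit (empty before the first commit)."""
        return dict(self.commits[-1][1]) if self.commits else {}

    def add(self, name):
        """Like `git add name`: copy the working version into the index."""
        self.index[name] = self.working[name]

    def commit(self, message):
        """Like `git commit -m message`: record whatever is staged."""
        self.commits.append((message, dict(self.index)))

    def checkout_file(self, name):
        """Like `git checkout name`: discard unstaged edits to the file."""
        self.working[name] = self.index[name]

    def reset_file(self, name):
        """Like `git reset HEAD name`: unstage, leaving the working copy alone."""
        head = self.head()
        if name in head:
            self.index[name] = head[name]
        else:
            self.index.pop(name, None)


repo = ToyRepo()
repo.working["foo.txt"] = "v1"
repo.add("foo.txt")
repo.commit("first version")

repo.working["foo.txt"] = "v2"   # edit the file
repo.add("foo.txt")              # stage the edit
repo.reset_file("foo.txt")       # unstage it; the file on disk still says v2
staged_after_reset = repo.index["foo.txt"]
disk_after_reset = repo.working["foo.txt"]

repo.checkout_file("foo.txt")    # throw away the local edit
disk_after_checkout = repo.working["foo.txt"]
```

<p>After the reset, the index is back to the committed version while the file on disk keeps the edit; only the checkout discards the edit itself.</p>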
<h2>Branching is &#8220;Save as&#8230;&#8221;</h2>
<p>Branches are like &#8220;Save as&#8230;&#8221; on a directory. Best of all:</p>
<ul>
    <li>Easily merge changes with the original (changes tracked and never applied twice)</li>
    <li>No wasted space (common files only stored once)</li>
</ul>
<p><strong>Why branch?</strong> Consider the utility of &#8220;Save as&#8230;&#8221; for regular files: you tinker with multiple possibilities while keeping the original safe. Git enables this for directories, with the power to merge. (In practice, svn is like a single shared drive, where you can only revert to one backup).</p>
<h2>Imagine virtual directories</h2>
<p>I see branches as &#8220;virtual directories&#8221; in the .git folder. While inside a physical directory (c:\project or ~/project), you traverse virtual directories with a checkout.</p>
<ul>
    <li><code>git checkout master</code>
    <ul>
        <li>switch to master branch (&#8220;cd master&#8221;)</li>
    </ul>
    </li>
    <li><code>git branch dev</code>
    <ul>
        <li>create new branch from existing (&#8220;cp * dev&#8221;)</li>
        <li>you still need to &#8220;cd&#8221; with &#8220;git checkout dev&#8221;</li>
    </ul>
    </li>
    <li><code>git merge dev</code>
    <ul>
        <li>(when in master) pull in changes from dev (&#8220;cp dev/* .&#8221;)</li>
    </ul>
    </li>
    <li><code>git branch</code>
    <ul>
        <li>list all branches (&#8220;ls&#8221;)</li>
    </ul>
    </li>
</ul>
<p>My inner dialogue is &#8220;change to dev directory (checkout)&#8230; make changes&#8230; save changes (add/commit)&#8230; change to master directory&#8230; copy in changes from dev (merge)&#8221;.</p>
<p>The physical directory is a scratchpad. Virtual directories are affected by git commands:</p>
<ul>
    <li><code>rm foo.txt</code>
    <ul>
        <li>Remove foo.txt from your sandbox (restored if you checkout the branch again)</li>
    </ul>
    </li>
    <li><code>git rm foo.txt</code>
    <ul>
        <li>Remove foo.txt from current virtual directory</li>
        <li>Gotcha: you need to commit that change!</li>
    </ul>
    </li>
</ul>
<h2>Know the current branch</h2>
<p>Just like seeing your current directory, <strong>put the current branch in your prompt!</strong></p>
<p><img alt="" src="http://betterexplained.com/wp-content/uploads/git/git_prompt.png" /></p>
<p>In my .bash_profile (modified from <a class=" external" href="http://asemanfar.com/Current-Git-Branch-in-Bash-Prompt">here</a>):</p>


<pre>
<code>

parse_git_branch() {
    git branch 2&gt; /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/(\1)/'
}

export PS1="\[\033[00m\]\u@\h\[\033[01;34m\] \W \[\033[31m\]\$(parse_git_branch) \[\033[00m\]$\[\033[00m\] "

</code>
</pre>


<h2>Visualize your branch structure</h2>
<p>Git leaves branch organization to you. Nvie.com has a <a title="http://nvie.com/git-model" class=" external" href="http://nvie.com/git-model">great branch strategy</a>:</p>
<p><img alt="" src="http://betterexplained.com/wp-content/uploads/git/git_branch_strategy.png" /></p>
<ul>
    <li>Have a mainline (master). Mentally it&#8217;s on the far right.</li>
    <li>Create branches (master -&gt; dev) and subbranches (dev -&gt; featureX). The further from master, the crazier.</li>
    <li>Only merge with neighbors (master -&gt; dev -&gt; featureX, or featureX -&gt; dev -&gt; master)</li>
</ul>
<p>Stay sane by choosing a branch layout up front. I have a master tracking a svn project, and dev for my own code. In general, master is clean so I can branch anytime for one-off fixes.</p>
<h2>Understand local vs. remote</h2>
<p>Git has local and remote commands; seeing both confused me (&#8220;When do you checkout vs. pull?&#8221;). Work locally, syncing remotely as needed.</p>
<p><strong>Local data</strong></p>
<ul>
    <li><code>git init</code><br />
    <ul>
        <li>create local repo</li>
        <li>use git add/commit/branch to work locally</li>
    </ul>
    </li>
</ul>
<p><strong>Remote data</strong></p>
<ul>
    <li><code>git remote add name path-to-repo</code>
    <ul>
        <li>track a remote repo (usually &#8220;origin&#8221;) from an existing repo</li>
        <li>remote branches are &#8220;origin/master&#8221;, &#8220;origin/dev&#8221; etc.</li>
    </ul>
    </li>
    <li><code>git branch -a</code>
    <ul>
        <li>list all branches (remote and local)</li>
    </ul>
    </li>
</ul>
<ul>
    <li><code>git clone path-to-repo</code>
    <ul>
        <li>create a new local git repo copied from a remote one</li>
        <li>local master tracks remote master</li>
    </ul>
    </li>
    <li><code>git pull </code>
    <ul>
        <li>merge changes from tracked remote branch (if in dev, pull from origin/dev)</li>
    </ul>
    </li>
    <li><code>git push</code>
    <ul>
        <li>send changes to tracked remote branch (if in dev, push to origin/dev)</li>
    </ul>
    </li>
</ul>
<p><strong>Why local and remote?</strong> Subversion has central checkins, so you avoid committing unfinished work. With git, local commits are frequent and you only push when ready.</p>
<h2><span class="caps">GUID</span>s are <span class="caps">GOOD</span></h2>
<p>Git addresses information by a hash (<a title="http://betterexplained.com/articles/the-quick-guide-to-guids/" class=" external" href="http://betterexplained.com/articles/the-quick-guide-to-guids/"><span class="caps">GUID</span></a>) of its contents. If two branches are the same, they have the same <span class="caps">GUID</span> (and vice versa).</p>
<p>Why&#8217;s this cool? We can create branches independently, merge them, and have a common <span class="caps">GUID</span>. No central numbering needed. Usually, we just compare the first few digits: &#8220;Are you on a93?&#8221;.</p>
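<p>To see content addressing in action, here is a minimal Python sketch of how Git names the contents of a file in a default (SHA-1) repository: it hashes a short header plus the bytes. The helper name <code>blob_id</code> is made up; the result should match what <code>git hash-object</code> prints for a file with the same bytes.</p>

```python
import hashlib


def blob_id(data):
    """Return the SHA-1 id Git gives a file (blob) with these bytes."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


a = blob_id(b"milk\n")
b = blob_id(b"milk\n")          # same contents, hashed independently
c = blob_id(b"milk\nsoup\n")    # different contents

print(a == b)   # True: identical contents always get the identical id
print(a == c)   # False
print(a[:7])    # a short prefix is usually enough to name it
```

<p>Two people who have never talked to each other get the same id for the same contents, which is why no central numbering is needed.</p>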
<h2>Tips &amp; Tricks</h2>
<p>For your .gitconfig:</p>


<pre>
<code>
[alias]
        ci = commit
        st = status
        co = checkout
        oneline = log --pretty=oneline
        br = branch
        la = log --pretty=\"format:%ad %h (%an): %s\" --date=short
</code>
</pre>


<p>There are some <span class="caps">GUI</span> tools for git, but I prefer to learn via the command line. Git is opinionated software (which I like), and analogies help me understand its world view.</p>]]></content:encoded>
			<wfw:commentRss>http://betterexplained.com/articles/aha-moments-when-learning-git/feed/</wfw:commentRss>
		<slash:comments>40</slash:comments>
		</item>
		<item>
		<title>A Simple Introduction To Computer Networking</title>
		<link>http://betterexplained.com/articles/a-simple-introduction-to-computer-networking/</link>
		<comments>http://betterexplained.com/articles/a-simple-introduction-to-computer-networking/#comments</comments>
		<pubDate>Mon, 16 Mar 2009 16:00:30 +0000</pubDate>
		<dc:creator>Kalid</dc:creator>
				<category><![CDATA[Guides]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://betterexplained.com/articles/a-simple-introduction-to-computer-networking/</guid>
		<description><![CDATA[Most networking discussions are a jumble of acronyms. Forget the configuration details -- what are the insights?

Networking is about communication
Text is the simplest way to communicate
Protocols are standards for reading and writing text

Beneath the details, networking is an IM conversation. Here's what I wish someone told me when learning how computers communicate.
TCP: The Text Layer

The [...]]]></description>
			<content:encoded><![CDATA[<p>Most networking discussions are a jumble of acronyms. Forget the configuration details -- what are the insights?</p>
<ul>
<li><strong>Networking is about communication</strong></li>
<li><strong>Text is the simplest way to communicate</strong></li>
<li><strong>Protocols are standards for reading and writing text</strong></li>
</ul>
<p>Beneath the details, networking is an IM conversation. Here's what I wish someone told me when learning how computers communicate.</p>
<h2>TCP: The Text Layer</h2>
<p>The Transmission Control Protocol (TCP) provides the handy illusion that we can "just" send text between two computers. TCP relies on <a href="http://en.wikipedia.org/wiki/Internet_Protocol">lower levels</a> and can send binary data, but ignore that for now:</p>
<ul>
<li><strong>TCP lets us Instant Message between computers</strong></li>
</ul>
<p>We IM with Telnet, the 'notepad' of networking: telnet sends and receives plain text using TCP. It's a chat client peacefully free of ads and unsolicited buddy requests.</p>
<p>Let's talk to Google using <a href="http://support.microsoft.com/kb/279466">telnet</a> (or <a href="http://www.chiark.greenend.org.uk/~sgtatham/putty/">putty</a>, a better utility):</p>
<pre>telnet google.com 80
[connecting...]
Hello Mr. Google!
</pre>
<p>We connect to google.com on port 80 (the default for web requests) and send the message "Hello Mr. Google!". We press Enter a few times and await the reply: </p>
<pre>&lt;html&gt;
...
&lt;h1&gt;Bad Request&lt;/h1&gt;
Your client has issued a malformed or illegal request
...
&lt;/html&gt;</pre>
<p>Malformed? Illegal? <em>The mighty Google is not pleased</em>. It didn't understand us and sent HTML telling the same. </p>
<p>But we had a conversation: text went in, and text came back. In other words:</p>
<p><img src="http://betterexplained.com/wp-content/uploads/networking/tcp_chat.png" alt="" /></p>
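<p>The same "text in, text out" exchange can be tried without Google. The following Python sketch is only an illustration: it starts a tiny server on the local machine (the function name <code>answer_one_client</code> and the reply wording are made up), connects to it the way telnet would, sends one line of text and reads the reply.</p>

```python
import socket
import threading

# Listen on the loopback interface; port 0 lets the OS pick any free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]


def answer_one_client():
    """Accept a single connection, read one message, send one reply."""
    conn, _addr = server.accept()
    with conn:
        message = conn.recv(1024).decode("utf-8")
        conn.sendall(("You said: " + message).encode("utf-8"))


worker = threading.Thread(target=answer_one_client)
worker.start()

# The "telnet" side: connect, send text, read text back.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"Hello Mr. Server!")
    reply = client.recv(1024).decode("utf-8")

worker.join()
server.close()
print(reply)
```

<p>Nothing here knows about web pages or email. TCP only carries the bytes; what they mean is left to the two ends.</p>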
<h2>Protocols: The Forms To Fill Out </h2>
<p>Unstructured chat is too carefree -- how does the server know what we want to do? We need a <em>protocol</em> (standard way of communicating) if we're going to make sense. </p>
<p>We use protocols all the time:</p>
<ul>
<li>Putting to: and from: addresses in certain places on an envelope</li>
<li>Filling out bank forms (special place for account number, deposit amount, etc.)</li>
<li>Saying "Roger" or "10-4" to indicate a radio request was understood</li>
</ul>
<p>Protocols make communication clear. </p>
<h2>Case Study: The HTTP Protocol</h2>
<p>We see HTTP in every URL: <a href="http://google.com/">http://google.com/</a>. What does it mean? </p>
<ul>
<li>Connect to server google.com (using TCP, port 80 by default)</li>
<li>Ask for the resource "/" (the default resource)</li>
<li>Format the request using the Hypertext Transfer Protocol</li>
</ul>
<p>HTTP is the "form to fill out" when asking for the resource. Using the HTTP format, the above request looks like this: </p>
<pre>GET / HTTP/1.0</pre>
<p>Remember, <em>it's just text</em>! We're asking for a file, through an IM session, using the format: [Command] [Resource] [Protocol Name/Version]. </p>
<p>This command is "IM'd" to the server (your browser adds extra info, a detail for another time). Google's server returns this response: </p>
<pre>HTTP/1.0 200 OK
Cache-Control: private, max-age=0
Date: Sun, 15 Mar 2009 03:13:39 GMT
Expires: -1
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=5cc6...
Server: gws
Connection: Close

&lt;html&gt;
(Google web page, search box, and cute logo)
&lt;/html&gt;
</pre>
<p>Yowza. The bottom part is HTML for the browser to display. But why the junk up top? </p>
<p>Well, suppose we got only the raw HTML to display. What about errors: what if the server crashed, the file wasn't there, or Google just didn't like us? </p>
<p>Some <em>metadata</em> (data about data) is useful. When we order a book from Amazon <strong>we expect a packing slip</strong> describing the order: the intended recipient, price, return information, etc. You don't want a naked book just thrown on your doorstep. </p>
<p>Protocols are similar: the recipient wants to know if everything was OK. Here we see infamous status codes like 404 (resource not found) or 200 (everything OK). These headers aren't the real data -- they're the packing slip from the server. </p>
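<p>Since it's all text, pulling the packing slip apart takes only a few lines. This Python sketch is a simplification, not a full HTTP parser: the function names are made up, and the sample response is a shortened stand-in for the one above.</p>

```python
def build_request(resource):
    """Format a minimal HTTP/1.0 request: [Command] [Resource] [Protocol/Version]."""
    return "GET " + resource + " HTTP/1.0\r\n\r\n"


def parse_response(raw):
    """Split a raw HTTP response into status code, reason, headers and body."""
    head, _sep, body = raw.partition("\r\n\r\n")   # a blank line ends the headers
    lines = head.split("\r\n")
    _version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


# A shortened stand-in for the response shown above.
raw = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html; charset=ISO-8859-1\r\n"
    "Server: gws\r\n"
    "\r\n"
    "(Google web page)"
)

code, reason, headers, body = parse_response(raw)
print(code, reason)
print(headers["server"])
print(body)
```

<p>The status line and headers are the packing slip; everything after the blank line is the book.</p>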
<h2>Insights From Protocols</h2>
<p>Studying existing, popular systems is a great way to understand engineering decisions. Here are a few: </p>
<p><strong>Binary vs. Plain Text</strong></p>
<p><a href="http://betterexplained.com/articles/a-little-diddy-about-binary-file-formats/">Binary data</a> is more efficient than text, but more difficult to debug and generate (how many hex editors do you know how to use?). Lower-level protocols, the backbone of the internet, use binary data to maintain performance. Application-level protocols (HTTP and above) use text data for ease of interoperability. You don't have religious wars about endian issues with HTTP. </p>
<p><strong>Stateful vs. Stateless </strong></p>
<p>Some protocols are stateful, which means the server remembers the chat with the client. With SMTP, for example, the client opens a connection and issues commands one at a time (such as adding recipients to an email), and closes the connection. Stateful communication is useful in transactions that have many steps or conditions.</p>
<p>Stateless communication is simpler: you send the entire transaction as one request. Each "instant message" stands on its own and doesn't need the others. HTTP is stateless: you can request a webpage without introducing yourself to the server.</p>
<p><strong>Extensibility</strong></p>
<p>We can't think of everything beforehand. How do we extend old protocols for new users?</p>
<p>HTTP has a simple and effective "header" structure: a metadata preamble that looks like "Header:Value".</p>
<p>If you don't recognize the header sent (new client, old server) just ignore it. If you were expecting a header but don't see it (old client, new server), just use a default. It's like having an "Anything else to tell us?" section in a survey.</p>
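<p>The "ignore what you don't know, default what you don't see" rule is easy to sketch. In this Python example the default values and the <code>X-Shiny-New-Feature</code> header are invented for illustration; they are not taken from the HTTP specification.</p>

```python
DEFAULTS = {"content-type": "text/plain", "connection": "close"}   # assumed defaults
KNOWN = set(DEFAULTS)


def interpret(headers):
    """Keep the headers we understand, default the ones that are missing."""
    settings = dict(DEFAULTS)
    ignored = []
    for name, value in headers.items():
        key = name.lower()
        if key in KNOWN:
            settings[key] = value
        else:
            ignored.append(name)   # unknown header: skip it, don't fail
    return settings, ignored


# The sender uses a header we have never heard of and omits one we expected.
settings, ignored = interpret({
    "Content-Type": "text/html",
    "X-Shiny-New-Feature": "on",
})
print(settings)
print(ignored)
```

<p>Neither side has to upgrade in lockstep: the unknown header is skipped and the missing one quietly falls back to its default.</p>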
<p><strong>Error Correction &amp; Reliability</strong></p>
<p>It's the job of lower-level protocols like TCP to make sure data is transmitted reliably. But higher-level protocols (like HTTP) need to make sure it's the <em>right</em> data. How are errors handled and communicated? Can the client just retry or does the server need to reset state?</p>
<p>HTTP comes with its own set of error codes to handle a variety of situations.</p>
<p><strong>Availability</strong></p>
<p>The neat thing about networking is that it works even on a single computer. Memcached is a great service to cache data. And guess what? It uses plain-old text commands (over TCP) to save and retrieve data.</p>
<p>You don't need complex COM objects or DLLs - you start a Memcached server, send text in, and get text out. It's language-neutral and easy to access because any decent OS supports networking. You can even telnet into Memcached to debug it. </p>
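<p>As a rough illustration, here is what those text commands look like when built by hand in Python. The helper names are made up, and the sketch only formats the bytes; actually sending them assumes a Memcached server is running somewhere (port 11211 by default).</p>

```python
def set_command(key, value, flags=0, exptime=0):
    """Build the bytes for a memcached text-protocol `set`."""
    data = value.encode("utf-8")
    header = "set {} {} {} {}\r\n".format(key, flags, exptime, len(data))
    return header.encode("ascii") + data + b"\r\n"


def get_command(key):
    """Build the bytes for a memcached text-protocol `get`."""
    return "get {}\r\n".format(key).encode("ascii")


print(set_command("greeting", "hello"))
print(get_command("greeting"))
```

<p>A server answers the first command with the line <code>STORED</code>, and the second with a <code>VALUE</code> line, the data, and <code>END</code>. You could type the same lines into telnet.</p>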
<p>Wireless routers are similar: they have a control panel available through HTTP. There's no "router configuration program" -- you just connect to it with your browser. The router serves up webpages, and when you submit data it makes the necessary configuration changes. </p>
<p>Protocols like HTTP are so popular you can <em>assume</em> the user has a client.</p>
<p><strong>Layering Protocols</strong> </p>
<p>Protocols can be layered. We might write a resume, which is part of a larger application, which is stuffed into an envelope. Each segment has its own format, blissfully unaware of the others. Your envelope doesn't care about the resume -- it just wants the to: and from: addresses written correctly.</p>
<p>Many protocols rely on HTTP because it's so widely used (rather than starting from scratch, like Memcached, which needs efficiency). HTTP has well-understood methods to define resources (URLs) and commands (GET and POST), so why not use them?</p>
<p>Web services do just that. The SOAP protocol crams XML inside of HTTP commands. REST-style services embrace HTTP and use the existing verbs as much as possible.</p>
<h2>Remember: It's All Made Up </h2>
<p>Networking involves <em>human conventions</em>. Because plain text is ubiquitous and easy to use, it is the basis for most protocols. And TCP is the simplest, most-supported way to exchange text.</p>
<p><strong>Remembering that everything is a plain text IM conversation</strong> helps me wrap my head around the inevitable networking issues. And sometimes you need to jump into HTTP to understand <a href="http://betterexplained.com/articles/how-to-optimize-your-site-with-gzip-compression/">compression</a> and <a href="http://betterexplained.com/articles/how-to-optimize-your-site-with-http-caching/">caching</a>.</p>
<p>Don't just memorize the details; see protocols as strategies to solve communication problems. Happy networking.</p>
]]></content:encoded>
			<wfw:commentRss>http://betterexplained.com/articles/a-simple-introduction-to-computer-networking/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>Intro to Distributed Version Control (Illustrated)</title>
		<link>http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/</link>
		<comments>http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/#comments</comments>
		<pubDate>Mon, 15 Oct 2007 07:00:39 +0000</pubDate>
		<dc:creator>Kalid</dc:creator>
				<category><![CDATA[Guides]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://betterexplained.com/articles/a-visual-look-at-distributed-version-control/</guid>
		<description><![CDATA[

Traditional version control helps you backup, track and synchronize files. Distributed version control makes it easy to share changes. Done right, you can get the best of both worlds: simple merging and centralized releases.

Distributed? What&#8217;s wrong with regular version control?

Nothing &#8212; read a visual guide to version control if you want a quick refresher. Sure, [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://betterexplained.com/wp-content/uploads/version_control/distributed/distributed_logo.png" alt="" /></p>

<p>Traditional version control helps you back up, track and synchronize files. Distributed version control makes it easy to share changes. Done right, you can get the best of both worlds: simple merging and centralized releases.</p>

<h2>Distributed? What&#8217;s wrong with regular version control?</h2>

<p>Nothing &#8212; read <a href="http://betterexplained.com/articles/a-visual-guide-to-version-control/">a visual guide to version control</a> if you want a quick refresher. Sure, <em>some people</em> will deride you for using an &#8220;ancient&#8221; system. But you&#8217;re still OK in my book: using <em>any</em> <span class="caps">VCS </span>is a positive step forward for a project.</p>

<p>Centralized <span class="caps">VCS </span>emerged from the 1970s, when programmers had thin clients and admired &#8220;big iron&#8221; mainframes (how can you <strong>not</strong> like a machine with a then-gluttonous <a href="http://en.wikipedia.org/wiki/System/360">8 bits to a byte</a>?).</p>

<p><strong>Centralized is simple</strong>, and what you&#8217;d first invent: a single place everyone can check in and check out. It&#8217;s like a library where you get to scribble in the books.</p>

<p>This model works for <strong>backup, undo and synchronization</strong> but isn&#8217;t great for <strong>merging and branching</strong> changes people make. As projects grow, you want to split features into chunks, developing and testing in isolation and slowly merging changes into the main line. In reality, branching is cumbersome, so new features may come as a giant checkin, making changes difficult to manage and untangle if they go awry.</p>

<p>Sure, merging is always &#8220;possible&#8221; in a centralized system, but it&#8217;s not easy: you often need to track the merge yourself to avoid making the same change twice. Distributed systems make branching and merging painless because they rely on it.</p>


<h2>A Few Diagrams, Please</h2>

<p>Other tutorials have plenty of nitty-gritty text commands; here&#8217;s a <strong>visual</strong> look. To refresh, developers use a central repo in a typical <span class="caps">VCS</span>:</p>

<p><img src="http://betterexplained.com/wp-content/uploads/version_control/distributed/centralized_example.png" alt="" /></p>

<p>Everyone syncs and checks into the main trunk: Sue adds soup, Joe adds juice, and Eve adds eggs. </p>

<p>Sue&#8217;s change must go into main before it can be seen by others. Yes, theoretically Sue <em>could</em> make a new branch for other people to try out her changes, but this is a pain in a regular <span class="caps">VCS.</span></p>

<h2>Distributed Version Control Systems (DVCS)</h2>

<p>In a <strong>distributed</strong> model, every developer has their own repo. Sue&#8217;s changes live in <strong>her local repo</strong>, which she can share with Joe or Eve:</p>

<p><img src="http://betterexplained.com/wp-content/uploads/version_control/distributed/distributed_example.png" alt="" /></p>

<p>But will it be a circus with no ringleader? Nope. If desired, everyone can push changes into a common repo, suspiciously like the centralized model above. This franken-repo contains the changes of Sue, Joe and Eve.</p>

<p><strong>I wish distributed version control had a different name</strong>, such as &#8220;independent&#8221;, &#8220;federated&#8221; or &#8220;peer-to-peer.&#8221; The term &#8220;distributed&#8221; evokes thoughts of distributed computing, where work is split among a grid of machines (like searching for signals with <a href="http://setiathome.berkeley.edu/"><span class="caps">SETI</span>@home</a> or doing <a href="http://folding.stanford.edu/">protein folding</a>).</p>

<p>A <span class="caps">DVCS</span> is not like <span class="caps">SETI</span>@home: each node is completely independent and sharing is optional (in <span class="caps">SETI</span>@home you must phone back your results).</p>

<h2>Key Concepts In 5 Minutes</h2>

<p>Here are the basics; there&#8217;s a <a href="http://en.wikibooks.org/wiki/Understanding_darcs/Patch_theory">book</a> on patch theory if you&#8217;re interested.</p>

<p><strong>Core Concepts</strong></p>


<ul>
<li>Centralized version control focuses on <strong>synchronizing, tracking, and backing up files.</strong></li>
<li>Distributed version control focuses on <strong>sharing changes</strong>; every change has a <a href="http://betterexplained.com/articles/the-quick-guide-to-guids/">guid or unique id</a>.</li>
<li><strong>Recording/Downloading</strong> and <strong>applying</strong> a change are separate steps (in a centralized system, they happen together).</li>
<li><strong>Distributed systems have no forced structure</strong>. You can create &#8220;centrally administered&#8221; locations or keep everyone as peers.</li>
</ul>



<p><strong>New Terminology</strong></p>


<ul>
<li><strong>push</strong>: send a change to another repository (may require permission)</li>
<li><strong>pull</strong>: grab a change from a repository</li>
</ul>



<p><strong>Key Advantages</strong></p>


<ul>
<li><strong>Everyone has a local sandbox.</strong> You can make changes and roll back, all on your local machine. No more giant checkins; your incremental history is in your repo. </li>
<li><strong>It works offline.</strong> You only need to be online to share changes. Otherwise, you can happily stay on your local machine, checking in and undoing, no matter if the &#8220;server&#8221; is down or you&#8217;re on an airplane.</li>
<li><strong>It&#8217;s fast.</strong> Diffs, commits and reverts are all done locally. There&#8217;s no shaky network or server to ask for old revisions from a year ago.</li>
<li><strong>It handles changes well.</strong> Distributed version control systems were <em>built</em> around sharing changes. Every change has a guid which makes it easy to track.</li>
<li><strong>Branching and merging is easy.</strong> Because every developer &#8220;has their own branch&#8221;, every shared change is like reverse integration. But the guids make it easy to automatically combine changes and avoid duplicates.</li>
<li><strong>Less management.</strong> Distributed <span class="caps">VCS</span>es are easy to get running; there&#8217;s no &#8220;always-running&#8221; server software to install. Also, <span class="caps">DVCS</span>es may not require you to &#8220;add&#8221; new users; you just pick what <span class="caps">URL</span>s to pull from. This can avoid political headaches in large projects.</li>
</ul>



<p><strong>Key Disadvantages</strong></p>


<ul>
<li><strong>You still need a backup.</strong> Some claim your &#8220;backup&#8221; is the other machines that have your changes. I don&#8217;t buy it &#8212; what if they didn&#8217;t accept them all? What if they&#8217;re offline and you have new changes? With a <span class="caps">DVCS, </span>you still want a machine to push changes to &#8220;just in case&#8221;. (In Subversion, you usually dedicate a machine to store the main repo; do the same for a <span class="caps">DVCS</span>).</li>
<li><strong>There&#8217;s not really a &#8220;latest version&#8221;</strong>. If there&#8217;s no central location, you don&#8217;t immediately know whether to see Sue, Joe or Eve for the latest version. Again, a central location helps clarify what the latest &#8220;stable&#8221; release is.</li>
<li><strong>There aren&#8217;t really revision numbers.</strong> Every repo has its own revision numbers depending on the changes. Instead, people refer to change numbers: <em>Pardon me, do you have change fa33e7b?</em> (Remember, the id is an ugly guid). Thankfully, you can tag releases with meaningful names.</li>
</ul>



<h2>Mercurial Quickstart</h2>

<p>Mercurial is a fast, simple <span class="caps">DVCS</span>. Its command is <code>hg</code>, after Hg, the chemical symbol for mercury.</p>



<pre>
<code>
cd project
hg init                                # create repo here
hg add list.txt                        # start tracking file
hg commit -m "Added file"              # check file into local repo
hg log                                 # see history; notice guid

# changeset:   0:55bbcb7a4c24
# user:        Kalid@kazad-laptop
# date:        Sun Oct 14 21:36:18 2007 -0400
# summary:     Added file

# [edit file]
hg revert list.txt                     # revert to previous version

hg tag v1.0                            # tag this version
# [edit file]
hg update -C v1.0                      # "update" to the older tagged version; -C forces overwrite of local copy
</code>
</pre>



<p>Once Mercurial has initialized a directory, it looks like this:</p>

<p><img src="http://betterexplained.com/wp-content/uploads/version_control/distributed/distributed_repo_layout.png" alt="" /></p>

<p>You have:</p>


<ul>
<li><strong>A working copy</strong>. The files you are currently editing.</li>
<li><strong>A repository</strong>. A directory (.hg in Mercurial) containing all patches and metadata (comments, guids, dates, etc.). There&#8217;s no central server so the data stays with you.</li>
</ul>



<p>In our distributed example, Sue, Joe and Eve have their own repos, with independent revision histories.</p>

<h2>Understanding Updates and Merging</h2>

<p>There are a few items that confused me when learning about <span class="caps">DVCS</span>. First, updates happen in several steps:</p>


<ul>
<li><strong>Getting</strong> the change into a repo (pushing or pulling)</li>
<li><strong>Applying</strong> the change to the files (update or merge)</li>
<li><strong>Saving</strong> the new version (commit)</li>
</ul>



<p>Second, depending on the change, you can update or merge:</p>


<ul>
<li><strong>Updates</strong> happen when there&#8217;s no ambiguity. For example, I pull changes to a file that only you&#8217;ve been editing. The file just jumps to the latest revision, since there are no overlapping changes.</li>
<li><strong>Merges</strong> are needed when we have conflicting changes. If we both edit a file, we end up with two &#8220;branches&#8221; (i.e. alternate universes). One world has my changes, the other world has yours. In this case we (probably) want to merge the changes together into a single universe.</li>
</ul>
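<p>The update-or-merge decision can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Mercurial is implemented: the <code>history</code> dictionary, the change names and both helper functions are hypothetical.</p>

```python
def ancestors(parents, change):
    """Return the change itself plus everything it descends from."""
    seen = set()
    stack = [change]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def update_or_merge(parents, mine, pulled):
    """Decide how to bring the working copy from `mine` to `pulled`."""
    if mine in ancestors(parents, pulled):
        return "update"          # the pulled change builds on my version
    if pulled in ancestors(parents, mine):
        return "nothing to do"   # I already have the pulled change
    return "merge"               # histories diverged into two branches


# Hypothetical history: each change maps to the list of its parents.
history = {"milk": [], "soup": ["milk"], "juice": ["milk"]}

print(update_or_merge(history, "milk", "soup"))    # update
print(update_or_merge(history, "soup", "juice"))   # merge
```

<p>If my version is an ancestor of the pulled change, the files can simply jump forward. If neither change descends from the other, we are in two universes and need a merge.</p>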



<p>I&#8217;m still wrapping my head around how easily branches spring up and collapse in a <span class="caps">DVCS</span>:</p>

<p><img src="http://betterexplained.com/wp-content/uploads/version_control/distributed/distributed_merge.png" alt="" /></p>

<p>In this case, a merge is needed because (+Soup) and (+Juice) are changes to a common parent: the list with just &#8220;Milk&#8221;. After Joe merges the files, Sue can do a regular &#8220;pull and update&#8221; to get the combined file from Joe. She doesn&#8217;t have to merge again on her own.</p>
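<p>Here is a minimal sketch of that merge in Python, assuming the list holds unique items and that each side only adds or removes whole items. Real merge tools work line by line and have to handle genuine conflicts; <code>merge_lists</code> is a made-up helper for illustration.</p>

```python
def merge_lists(base, mine, theirs):
    """Toy three-way merge for lists of unique items."""
    # An item from the common parent survives only if neither side removed it.
    merged = [item for item in base if item in mine and item in theirs]
    # Then add whatever either side introduced, keeping first-seen order.
    for side in (mine, theirs):
        for item in side:
            if item not in base and item not in merged:
                merged.append(item)
    return merged


base = ["Milk"]                  # the common parent
with_juice = ["Milk", "Juice"]   # one branch: +Juice
with_soup = ["Milk", "Soup"]     # the other branch: +Soup

merged = merge_lists(base, with_juice, with_soup)
print(merged)   # ['Milk', 'Juice', 'Soup']
```

<p>The common parent is what makes this automatic: comparing each branch against &#8220;Milk&#8221; shows that both sides only added an item, so nothing conflicts.</p>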

<p>In Mercurial you can run:</p>



<pre>
<code>
hg incoming ../another-dir  (see pending changes)
hg pull ../another-dir      (download changes)

hg update                   (actually apply changes...)
hg merge                    (... or merge if needed)

hg commit                   (check in merged file; unite branches)
</code>
</pre>



<p>Yep, the &#8220;pull-merge-commit&#8221; cycle is long. Luckily, Mercurial has shortcuts that combine steps: for example, hg pull -u pulls and then updates in one command. Though it seems complex, it&#8217;s <strong>much</strong> easier than handling merges manually in Subversion.</p>

<p><strong>Most merges are automatic.</strong> When conflicts come up, they are typically resolved quickly. Mercurial keeps track of the parent/child relationship for every change (our merged list has two parents), as well as the &#8220;heads&#8221; or latest changes in each branch. Before the merge we have two heads; afterwards, one.</p>
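<p>The &#8220;two heads before, one head after&#8221; bookkeeping can be sketched as below. Again this is a toy model: the <code>history</code> dictionary and the <code>heads</code> helper are hypothetical, and a head is taken to mean a change that no other change names as a parent.</p>

```python
def heads(parents):
    """Heads are the changes that no other change lists as a parent."""
    has_child = {p for plist in parents.values() for p in plist}
    return sorted(change for change in parents if change not in has_child)


# Hypothetical history: each change maps to the list of its parents.
history = {"milk": [], "soup": ["milk"], "juice": ["milk"]}
before = heads(history)

# Committing the merge adds one change with two parents.
history["merged"] = ["soup", "juice"]
after = heads(history)

print(before)   # ['juice', 'soup']
print(after)    # ['merged']
```

<p>The merged change has two parents, so both old heads gain a child and drop out, leaving a single head.</p>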

<h2>Organizing a Distributed Project</h2>

<p>Here&#8217;s one way to organize a distributed project:</p>

<p><img src="http://betterexplained.com/wp-content/uploads/version_control/distributed/distributed_push_pull.png" alt="" /></p>

<p>Sue, Joe and Eve check changes into a common branch. They can trade patches with each other to do simple <strong>&#8220;buddy builds&#8221;</strong>: <em>Hey buddy, can you try out these patches? I need to see if they work before I push to the experimental branch.</em></p>

<p>Later, a maintainer can review and pull changes from the experimental branch into a stable branch, which has the latest release. A distributed <span class="caps">VCS </span>helps isolate changes but still provides the &#8220;single source&#8221; of a centralized system. There are many models of development, from &#8220;pull only&#8221; (where maintainers decide what to take; this is how Linux is developed) to &#8220;shared push&#8221; (which acts like a centralized system). A distributed <span class="caps">VCS </span>gives you <strong>flexibility</strong> in how a project is maintained.</p>

<h2>Practice And Scathing Ridicule Makes Perfect</h2>

<p>I&#8217;m a <span class="caps">DVCS </span>newbie, but am happy with what I&#8217;ve learned so far. I enjoy <span class="caps">SVN, </span>but it&#8217;s &#8220;fun&#8221; seeing how easy a merge can be. My suggestion is to start with Subversion, get a grasp of team collaboration, then experiment with a distributed model. With the proper layout a <span class="caps">DVCS </span>can do anything a centralized system can, with the added benefit of easy merging.</p>

<p><strong>Online Resources</strong></p>


<ul>
<li><a href="http://www.selenic.com/mercurial/wiki/">Mercurial</a> has an <a href="http://hgbook.red-bean.com/hgbook.html">excellent book</a>. On Windows you may need <a href="http://kdiff3.sourceforge.net/">diffing/merging software</a> or <a href="http://tortoisesvn.tigris.org/TortoiseMerge.html">TortoiseMerge</a> (if you have TortoiseSVN installed).</li>
<li><a href="http://darcs.net/">Darcs</a> has a detailed <a href="http://en.wikibooks.org/wiki/Understanding_darcs">wikibook</a> (has some math theory about changes).</li>
<li><a href="http://git.or.cz">Git</a> was created by Linus Torvalds. Here&#8217;s an <a href="http://www.youtube.com/watch?v=4XpnKHJAok8">interesting lecture</a> on <span class="caps">DVCS</span>; prepare to be berated for using a centralized system:</li>
</ul>



<p><object width="425" height="350"><param name="movie" value="http://www.youtube.com/v/4XpnKHJAok8"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/4XpnKHJAok8" type="application/x-shockwave-flash" wmode="transparent" width="425" height="350"></embed></object></p>

<p>Notable Quotes:</p>


<ul>
<li>&#8220;How many have done a branch and merged it? How many of you enjoyed it?&#8221;</li>
<li>&#8220;When you do a merge, you plan ahead for a week, then set aside a day to do it.&#8221;</li>
<li>&#8220;Some people have 5, 10, 15 branches&#8221;. One branch is experimental. One branch is maintenance, etc.</li>
<li>&#8220;CVS &#8212; you don&#8217;t commit. You make changes without committing. You never commit until it passes a giant test suite. People make 1-liner changes, knowing it can&#8217;t <em>possibly</em> break.&#8221;</li>
</ul>



<p>So good luck, and watch out for the holy wars. Feel free to share any tips or suggestions below.</p>]]></content:encoded>
			<wfw:commentRss>http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/feed/</wfw:commentRss>
		<slash:comments>109</slash:comments>
		</item>
	</channel>
</rss>
