<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>BetterExplained &#187; Programming</title>
	<atom:link href="http://betterexplained.com/articles/category/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://betterexplained.com</link>
	<description>Learning shouldn't hurt. Let's share the insights that made difficult ideas click.</description>
	<lastBuildDate>Wed, 18 Nov 2009 02:52:35 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>A Simple Introduction To Computer Networking</title>
		<link>http://betterexplained.com/articles/a-simple-introduction-to-computer-networking/</link>
		<comments>http://betterexplained.com/articles/a-simple-introduction-to-computer-networking/#comments</comments>
		<pubDate>Mon, 16 Mar 2009 16:00:30 +0000</pubDate>
		<dc:creator>Kalid</dc:creator>
				<category><![CDATA[Guides]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://betterexplained.com/articles/a-simple-introduction-to-computer-networking/</guid>
		<description><![CDATA[Most networking discussions are a jumble of acronyms. Forget the configuration details -- what are the insights?

Networking is about communication
Text is the simplest way to communicate
Protocols are standards for reading and writing text

Beneath the details, networking is an IM conversation. Here's what I wish someone told me when learning how computers communicate.
TCP: The Text Layer

The [...]]]></description>
			<content:encoded><![CDATA[<p>Most networking discussions are a jumble of acronyms. Forget the configuration details -- what are the insights?
<ul>
<li><strong>Networking is about communication</strong></li>
<li><strong>Text is the simplest way to communicate</strong></li>
<li><strong>Protocols are standards for reading and writing text</strong></li>
</ul>
<p>Beneath the details, networking is an IM conversation. Here's what I wish someone told me when learning how computers communicate.</p>
<h2>TCP: The Text Layer</h2>
<p>The Transmission Control Protocol (TCP) provides the handy illusion that we can "just" send text between two computers. TCP relies on <a href="http://en.wikipedia.org/wiki/Internet_Protocol">lower levels</a> and can send binary data, but ignore that for now:
<ul>
<li><strong>TCP lets us Instant Message between computers</strong></li>
</ul>
<p>We IM with Telnet, the 'notepad' of networking: telnet sends and receives plain text using TCP. It's a chat client peacefully free of ads and unsolicited buddy requests.</p>
<p>Let's talk to Google using <a href="http://support.microsoft.com/kb/279466">telnet</a> (or <a href="http://www.chiark.greenend.org.uk/~sgtatham/putty/">putty</a>, a better utility):</p>
<pre>telnet google.com 80
[connecting...]
Hello Mr. Google!
</pre>
<p>We connect to google.com on port 80 (the default for web requests) and send the message "Hello Mr. Google!". We press Enter a few times and await the reply: </p>
<pre>&lt;html&gt;
...
&lt;h1&gt;Bad Request&lt;/h1&gt;
Your client has issued a malformed or illegal request
...
&lt;/html&gt;</pre>
<p>Malformed? Illegal? <em>The mighty Google is not pleased</em>. It didn't understand us and sent HTML telling the same. </p>
<p>But we had a conversation: text went in, and text came back. In other words:</p>
<p><img src="http://betterexplained.com/wp-content/uploads/networking/tcp_chat.png"></p>
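<p>The same "text in, text out" exchange can be sketched in a few lines of Python. This is a minimal illustration, not how telnet itself works: <code>send_text</code> and <code>start_echo_server</code> are made-up helper names, and the local server only exists so the example can run without touching the internet.</p>

```python
import socket
import threading


def read_all(conn):
    """Collect bytes until the other side stops sending."""
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:  # empty read means the peer is done
            break
        chunks.append(data)
    return b"".join(chunks)


def send_text(host, port, message, timeout=5.0):
    """Open a TCP connection, send one line of text, return the reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(message.encode("utf-8") + b"\r\n")
        conn.shutdown(socket.SHUT_WR)  # tell the server we are done talking
        return read_all(conn).decode("utf-8", errors="replace")


def start_echo_server():
    """Start a throwaway local server that answers one message; return its port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve_once():
        conn, _ = server.accept()
        with conn:
            conn.sendall(b"You said: " + read_all(conn))
        server.close()

    threading.Thread(target=serve_once, daemon=True).start()
    return port
```

<p>Pointing <code>send_text</code> at <code>google.com</code> on port 80 should play out the same chat as the telnet session above, though the exact reply depends on the server.</p>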
<h2>Protocols: The Forms To Fill Out </h2>
<p>Unstructured chat is too carefree -- how does the server know what we want to do? We need a <em>protocol</em> (a standard way of communicating) if we're going to make sense. </p>
<p>We use protocols all the time: </p>
<ul>
<li>Putting to: and from: addresses in certain places on an envelope</li>
<li>Filling out bank forms (special place for account number, deposit amount, etc.)</li>
<li>Saying "Roger" or "10-4" to indicate a radio request was understood</li>
</ul>
<p>Protocols make communication clear. </p>
<h2>Case Study: The HTTP Protocol</h2>
<p>We see HTTP in every url: <a href="http://google.com/">http://google.com/</a>. What does it mean? </p>
<ul>
<li>Connect to server google.com (using TCP, port 80 by default)</li>
<li>Ask for the resource "/" (the default resource)</li>
<li>Format the request using the Hypertext Transfer Protocol</li>
</ul>
<p>HTTP is the "form to fill out" when asking for the resource. Using the HTTP format, the above request looks like this: </p>
<pre>GET / HTTP/1.0</pre>
<p>Remember, <em>it's just text</em>! We're asking for a file, through an IM session, using the format: [Command] [Resource] [Protocol Name/Version]. </p>
<p>This command is "IM'd" to the server (your browser adds extra info, a detail for another time). Google's server returns this response: </p>
<pre>HTTP/1.0 200 OK
Cache-Control: private, max-age=0
Date: Sun, 15 Mar 2009 03:13:39 GMT
Expires: -1
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=5cc6...
Server: gws
Connection: Close

&lt;html&gt;
(Google web page, search box, and cute logo)
&lt;/html&gt;
</pre>
<p>Yowza. The bottom part is HTML for the browser to display. But why the junk up top? </p>
<p>Well, suppose we just got the raw HTML to display. What about errors: what if the server crashed, the file wasn't there, or Google just didn't like us? </p>
<p>Some <em>metadata</em> (data about data) is useful. When we order a book from Amazon <strong>we expect a packing slip</strong> describing the order: the intended recipient, price, return information, etc. We don't want a naked book just thrown on our doorstep. </p>
<p>Protocols are similar: the recipient wants to know if everything was OK. Here we see infamous status codes like 404 (resource not found) or 200 (everything OK). These headers aren't the real data -- they're the packing slip from the server. </p>
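<p>To make the "form" and the "packing slip" concrete, here is a small sketch that writes a request line in the [Command] [Resource] [Protocol Name/Version] format and separates a response into its status, headers and body. The function names are illustrative, and the parser assumes a well-formed response like the one shown above; real HTTP libraries handle many more cases.</p>

```python
def build_request(command, resource, version="HTTP/1.0"):
    """Format a request as [Command] [Resource] [Protocol Name/Version]."""
    # The blank line (second \r\n) tells the server the request is complete.
    return f"{command} {resource} {version}\r\n\r\n"


def parse_response(raw):
    """Split a raw HTTP response into (status_code, reason, headers, body)."""
    # The packing slip and the real data are separated by one blank line.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body
```

<p>The status code (200, 404 and so on) comes from the first line; everything up to the blank line is metadata, and everything after it is the page itself.</p>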
<h2>Insights From Protocols</h2>
<p>Studying existing, popular systems is a great way to understand engineering decisions. Here are a few: </p>
<p><strong>Binary vs. Plain Text</strong></p>
<p><a href="http://betterexplained.com/articles/a-little-diddy-about-binary-file-formats/">Binary data</a> is more efficient than text, but more difficult to debug and generate (how many hex editors do you know how to use?). Lower-level protocols, the backbone of the internet, use binary data to maintain performance. Application-level protocols (HTTP and above) use text data for ease of interoperability. You don't have religious wars about endian issues with HTTP. </p>
<p><strong>Stateful vs. Stateless </strong></p>
<p>Some protocols are stateful, which means the server remembers the chat with the client. With SMTP, for example, the client opens a connection and issues commands one at a time (such as adding recipients to an email), and closes the connection. Stateful communication is useful in transactions that have many steps or conditions.</p>
<p>Stateless communication is simpler: you send the entire transaction as one request. Each "instant message" stands on its own and doesn't need the others. HTTP is stateless: you can request a webpage without introducing yourself to the server.</p>
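<p>The difference shows up in what the server has to remember. In this toy sketch the commands <code>ADD</code> and <code>SEND</code> are invented for illustration (they are not real SMTP syntax), and the class and function names are hypothetical.</p>

```python
class StatefulMailSession:
    """Server side of a stateful chat: it remembers earlier commands."""

    def __init__(self):
        self.recipients = []  # state that survives between messages

    def handle(self, command):
        verb, _, argument = command.partition(" ")
        if verb == "ADD":
            self.recipients.append(argument)
            return "OK"
        if verb == "SEND":
            # Only makes sense because of what earlier commands set up.
            return "Sent to " + ", ".join(self.recipients)
        return "Unknown command"


def handle_stateless(request):
    """Stateless handler: each request carries everything needed to answer it."""
    verb, _, resource = request.partition(" ")
    if verb == "GET":
        return "Here is " + resource
    return "Unknown command"
```

<p>The stateful session needs its messages in order and on the same connection; the stateless handler can answer any request cold.</p>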
<p><strong>Extensibility</strong></p>
<p>We can't think of everything beforehand. How do we extend old protocols for new users?</p>
<p>HTTP has a simple and effective "header" structure: a metadata preamble that looks like "Header:Value".</p>
<p>If you don't recognize the header sent (new client, old server) just ignore it. If you were expecting a header but don't see it (old client, new server), just use a default. It's like having an "Anything else to tell us?" section in a survey.</p>
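<p>Both rules (ignore what you don't recognize, default what is missing) fit in a few lines. The header names and default values below are assumptions picked for the example, not a list any real server uses.</p>

```python
# Headers this (hypothetical) program understands, with fallback values.
KNOWN_HEADERS = {"content-type": "text/html", "connection": "close"}


def read_headers(lines):
    """Keep recognized headers, skip unknown ones, default the missing ones."""
    settings = dict(KNOWN_HEADERS)  # start from the defaults
    for line in lines:
        name, _, value = line.partition(":")
        key = name.strip().lower()
        if key in settings:
            settings[key] = value.strip()
        # Anything else is a header we do not know about: ignore it.
    return settings
```

<p>A newer sender can add headers freely, and an older sender can leave them out, without either side breaking.</p>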
<p><strong>Error Correction &amp; Reliability</strong></p>
<p>It's the job of lower-level protocols like TCP to make sure data is transmitted reliably. But higher-level protocols (like HTTP) need to make sure it's the <em>right</em> data. How are errors handled and communicated? Can the client just retry or does the server need to reset state?</p>
<p>HTTP comes with its own set of error codes to handle a variety of situations.</p>
<p><strong>Availability</strong></p>
<p>The neat thing about networking is that it works even on a single computer. Memcached is a great service to cache data. And guess what? It uses plain-old text commands (over TCP) to save and retrieve data.</p>
<p>You don't need complex COM objects or DLLs - you start a Memcached server, send text in, and get text out. It's language-neutral and easy to access because any decent OS supports networking. You can even telnet into Memcached to debug it. </p>
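<p>As a rough sketch, here is what those text commands look like for Memcached's basic <code>set</code> and <code>get</code>. The helper names are made up, and only the simplest single-key case is handled; the bytes these functions produce are what you would type into a telnet session.</p>

```python
def format_set(key, value, flags=0, exptime=0):
    """Build a 'set' command: a header line, then the data, each ending in CRLF."""
    data = value.encode("utf-8")
    header = f"set {key} {flags} {exptime} {len(data)}\r\n".encode("ascii")
    return header + data + b"\r\n"


def format_get(key):
    """Build a 'get' command for one key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_reply(reply):
    """Return the stored value from a single-key get reply, or None on a miss."""
    if reply == b"END\r\n":  # nothing stored under that key
        return None
    header, _, rest = reply.partition(b"\r\n")
    _value, _key, _flags, size = header.split()
    return rest[: int(size)].decode("utf-8")
```

<p>Sending the output of <code>format_set</code> over a TCP connection to a running Memcached server should come back with a short text reply, which is the whole point: no special client library is required.</p>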
<p>Wireless routers are similar: they have a control panel available through HTTP. There's no "router configuration program" -- you just connect to it with your browser. The router serves up webpages, and when you submit data it makes the necessary configuration changes. </p>
<p>Protocols like HTTP are so popular you can <em>assume</em> the user has a client.</p>
<p><strong>Layering Protocols</strong> </p>
<p>Protocols can be layered. We might write a resume, which is part of a larger application, which is stuffed into an envelope. Each segment has its own format, blissfully unaware of the others. Your envelope doesn't care about the resume -- it just wants the to: and from: addresses written correctly.</p>
<p>Many protocols rely on HTTP because it's so widely used (rather than starting from scratch, like Memcached, which needs efficiency). HTTP has well-understood methods to define resources (URLs) and commands (GET and POST), so why not use them?</p>
<p>Web services do just that. The SOAP protocol crams XML inside of HTTP commands. REST (an architectural style rather than a protocol) embraces HTTP and uses the existing verbs as much as possible.</p>
<h2>Remember: It's All Made Up </h2>
<p>Networking involves <em>human conventions</em>. Because plain text is ubiquitous and easy to use, it is the basis for most protocols. And TCP is the simplest, most-supported way to exchange text.</p>
<p><strong>Remembering that everything is a plain text IM conversation</strong> helps me wrap my head around the inevitable networking issues. And sometimes you need to jump into HTTP to understand <a href="http://betterexplained.com/articles/how-to-optimize-your-site-with-gzip-compression/">compression</a> and <a href="http://betterexplained.com/articles/how-to-optimize-your-site-with-http-caching/">caching</a>.</p>
<p>Don't just memorize the details; see protocols as strategies to solve communication problems. Happy networking.</p>
]]></content:encoded>
			<wfw:commentRss>http://betterexplained.com/articles/a-simple-introduction-to-computer-networking/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Intro to Distributed Version Control (Illustrated)</title>
		<link>http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/</link>
		<comments>http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/#comments</comments>
		<pubDate>Mon, 15 Oct 2007 07:00:39 +0000</pubDate>
		<dc:creator>Kalid</dc:creator>
				<category><![CDATA[Guides]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://betterexplained.com/articles/a-visual-look-at-distributed-version-control/</guid>
		<description><![CDATA[

Traditional version control helps you back up, track and synchronize files. Distributed version control makes it easy to share changes. Done right, you can get the best of both worlds: simple merging and centralized releases.

Distributed? What&#8217;s wrong with regular version control?

Nothing &#8212; read a visual guide to version control if you want a quick refresher. Sure, [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://betterexplained.com/wp-content/uploads/version_control/distributed/distributed_logo.png" alt="" /></p>

<p>Traditional version control helps you back up, track and synchronize files. Distributed version control makes it easy to share changes. Done right, you can get the best of both worlds: simple merging and centralized releases.</p>

<h2>Distributed? What&#8217;s wrong with regular version control?</h2>

<p>Nothing &#8212; read <a href="http://betterexplained.com/articles/a-visual-guide-to-version-control/">a visual guide to version control</a> if you want a quick refresher. Sure, <em>some people</em> will deride you for using an &#8220;ancient&#8221; system. But you&#8217;re still OK in my book: using <em>any</em> <span class="caps">VCS </span>is a positive step forward for a project.</p>

<p>Centralized <span class="caps">VCS </span>emerged from the 1970s, when programmers had thin clients and admired &#8220;big iron&#8221; mainframes (how can you <strong>not</strong> like a machine with a then-gluttonous <a href="http://en.wikipedia.org/wiki/System/360">8 bits to a byte</a>?).</p>

<p><strong>Centralized is simple</strong>, and what you&#8217;d first invent: a single place everyone can check in and check out. It&#8217;s like a library where you get to scribble in the books.</p>

<p>This model works for <strong>backup, undo and synchronization</strong> but isn&#8217;t great for <strong>merging and branching</strong> changes people make. As projects grow, you want to split features into chunks, developing and testing in isolation and slowly merging changes into the main line. In reality, branching is cumbersome, so new features may come as a giant checkin, making changes difficult to manage and untangle if they go awry.</p>

<p>Sure, merging is always &#8220;possible&#8221; in a centralized system, but it&#8217;s not easy: you often need to track the merge yourself to avoid making the same change twice. Distributed systems make branching and merging painless because they rely on it.</p>


<h2>A Few Diagrams, Please</h2>

<p>Other tutorials have plenty of nitty-gritty text commands; here&#8217;s a <strong>visual</strong> look. To refresh, developers use a central repo in a typical <span class="caps">VCS</span>:</p>

<p><img src="http://betterexplained.com/wp-content/uploads/version_control/distributed/centralized_example.png" alt="" /></p>

<p>Everyone syncs and checks into the main trunk: Sue adds soup, Joe adds juice, and Eve adds eggs. </p>

<p>Sue&#8217;s change must go into main before it can be seen by others. Yes, theoretically Sue <em>could</em> make a new branch for other people to try out her changes, but this is a pain in a regular <span class="caps">VCS.</span></p>

<h2>Distributed Version Control Systems (DVCS)</h2>

<p>In a <strong>distributed</strong> model, every developer has their own repo. Sue&#8217;s changes live in <strong>her local repo</strong>, which she can share with Joe or Eve:</p>

<p><img src="http://betterexplained.com/wp-content/uploads/version_control/distributed/distributed_example.png" alt="" /></p>

<p>But will it be a circus with no ringleader? Nope. If desired, everyone can push changes into a common repo, suspiciously like the centralized model above. This franken-repo contains the changes of Sue, Joe and Eve.</p>

<p><strong>I wish distributed version control had a different name</strong>, such as &#8220;independent&#8221;, &#8220;federated&#8221; or &#8220;peer-to-peer.&#8221; The term &#8220;distributed&#8221; evokes thoughts of distributed computing, where work is split among a grid of machines (like searching for signals with <a href="http://setiathome.berkeley.edu/"><span class="caps">SETI</span>@home</a> or doing <a href="http://folding.stanford.edu/">protein folding</a>).</p>

<p>A <span class="caps">DVCS </span>is not like <span class="caps">SETI</span>@home: each node is completely independent and sharing is optional (in <span class="caps">SETI</span>@home you must phone back your results).</p>

<h2>Key Concepts In 5 Minutes</h2>

<p>Here are the basics; there&#8217;s a <a href="http://en.wikibooks.org/wiki/Understanding_darcs/Patch_theory">book</a> on patch theory if you&#8217;re interested.</p>

<p><strong>Core Concepts</strong></p>


<ul>
<li>Centralized version control focuses on <strong>synchronizing, tracking, and backing up files.</strong></li>
<li>Distributed version control focuses on <strong>sharing changes</strong>; every change has a <a href="http://betterexplained.com/articles/the-quick-guide-to-guids/">guid or unique id</a>.</li>
<li><strong>Recording/Downloading</strong> and <strong>applying</strong> a change are separate steps (in a centralized system, they happen together).</li>
<li><strong>Distributed systems have no forced structure</strong>. You can create &#8220;centrally administered&#8221; locations or keep everyone as peers.</li>
</ul>



<p><strong>New Terminology</strong></p>


<ul>
<li><strong>push</strong>: send a change to another repository (may require permission)</li>
<li><strong>pull</strong>: grab a change from a repository</li>
</ul>
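<p>These ideas can be modeled with a toy repository. This is only a sketch under simplifying assumptions: the <code>Repo</code> class and <code>change_id</code> function are hypothetical, and the id scheme (a shortened SHA-1 of the parent id plus the content) merely illustrates how an id can be derived from the change itself, not the exact format Mercurial uses.</p>

```python
import hashlib


def change_id(parent_id, content):
    """Derive an id from the change itself (illustrative scheme only)."""
    digest = hashlib.sha1(f"{parent_id}\n{content}".encode("utf-8"))
    return digest.hexdigest()[:12]


class Repo:
    """A toy repository: a dictionary of changes keyed by id."""

    def __init__(self):
        self.changes = {}  # change id -> (parent id, content)

    def commit(self, parent_id, content):
        """Record a new change locally and return its id."""
        cid = change_id(parent_id, content)
        self.changes[cid] = (parent_id, content)
        return cid

    def pull(self, other):
        """Record changes from another repo that we do not have yet."""
        new_ids = [cid for cid in other.changes if cid not in self.changes]
        for cid in new_ids:
            self.changes[cid] = other.changes[cid]
        return new_ids

    def push(self, other):
        """Send our changes to another repo (the mirror image of pull)."""
        return other.pull(self)
```

<p>Because each change is looked up by id, pulling twice never records the same change twice. Note that <code>pull</code> only records changes; applying them to the working files is a separate step, as described above.</p>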



<p><strong>Key Advantages</strong></p>


<ul>
<li><strong>Everyone has a local sandbox.</strong> You can make changes and roll back, all on your local machine. No more giant checkins; your incremental history is in your repo. </li>
<li><strong>It works offline.</strong> You only need to be online to share changes. Otherwise, you can happily stay on your local machine, checking in and undoing, no matter if the &#8220;server&#8221; is down or you&#8217;re on an airplane.</li>
<li><strong>It&#8217;s fast.</strong> Diffs, commits and reverts are all done locally. There&#8217;s no shaky network or server to ask for old revisions from a year ago.</li>
<li><strong>It handles changes well.</strong> Distributed version control systems were <em>built</em> around sharing changes. Every change has a guid which makes it easy to track.</li>
<li><strong>Branching and merging are easy.</strong> Because every developer &#8220;has their own branch&#8221;, every shared change is like reverse integration. The guids make it easy to automatically combine changes and avoid duplicates.</li>
<li><strong>Less management.</strong> Distributed <span class="caps">VCS</span>es are easy to get running; there&#8217;s no &#8220;always-running&#8221; server software to install. Also, <span class="caps">DVCS</span>es may not require you to &#8220;add&#8221; new users; you just pick what <span class="caps">URL</span>s to pull from. This can avoid political headaches in large projects.</li>
</ul>



<p><strong>Key Disadvantages</strong></p>


<ul>
<li><strong>You still need a backup.</strong> Some claim your &#8220;backup&#8221; is the other machines that have your changes. I don&#8217;t buy it &#8212; what if they didn&#8217;t accept them all? What if they&#8217;re offline and you have new changes? With a <span class="caps">DVCS, </span>you still want a machine to push changes to &#8220;just in case&#8221;. (In Subversion, you usually dedicate a machine to store the main repo; do the same for a <span class="caps">DVCS</span>).</li>
<li><strong>There&#8217;s not really a &#8220;latest version&#8221;</strong>. If there&#8217;s no central location, you don&#8217;t immediately know whether to see Sue, Joe or Eve for the latest version. Again, a central location helps clarify what the latest &#8220;stable&#8221; release is.</li>
<li><strong>There aren&#8217;t really revision numbers.</strong> Every repo has its own revision numbers depending on the changes. Instead, people refer to change numbers: <em>Pardon me, do you have change fa33e7b?</em> (Remember, the id is an ugly guid). Thankfully, you can tag releases with meaningful names.</li>
</ul>



<h2>Mercurial Quickstart</h2>

<p>Mercurial is a fast, simple <span class="caps">DVCS.</span> Its command is <code>hg</code>, after Hg, the chemical symbol for mercury.</p>



<pre>
<code>
cd project
hg init                      # create repo here
hg add list.txt              # start tracking file
hg commit -m "Added file"    # check file into local repo
hg log                       # see history; notice guid

changeset:   0:55bbcb7a4c24
user:        Kalid@kazad-laptop
date:        Sun Oct 14 21:36:18 2007 -0400
summary:     Added file

[edit file]
hg revert list.txt           # revert to previous version

hg tag v1.0                  # tag this version
[edit file]
hg update -C v1.0            # "update" to the older tagged version; -C forces overwrite of local copy
</code>
</pre>



<p>Once Mercurial has initialized a directory, it looks like this:</p>

<p><img src="http://betterexplained.com/wp-content/uploads/version_control/distributed/distributed_repo_layout.png" alt="" /></p>

<p>You have:</p>


<ul>
<li><strong>A working copy</strong>. The files you are currently editing.</li>
<li><strong>A repository</strong>. A directory (.hg in Mercurial) containing all patches and metadata (comments, guids, dates, etc.). There&#8217;s no central server so the data stays with you.</li>
</ul>
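<p>Since the repository is just a directory sitting next to your files, a tool can find it by walking up from wherever you are working. Here is a small sketch of that idea; <code>find_repo_root</code> is a hypothetical helper, and it assumes only what is described above: a repository is marked by a <code>.hg</code> directory.</p>

```python
from pathlib import Path


def find_repo_root(start):
    """Walk upward from `start` until a directory containing .hg is found."""
    current = Path(start).resolve()
    for candidate in [current, *current.parents]:
        if (candidate / ".hg").is_dir():
            return candidate  # the working copy lives here
    return None  # reached the filesystem root without finding a repo
```

<p>No server lookup is involved: the answer comes entirely from the local disk.</p>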



<p>In our distributed example, Sue, Joe and Eve have their own repos, with independent revision histories.</p>

<h2>Understanding Updates and Merging</h2>

<p>There are a few items that confused me when learning about <span class="caps">DVCS.</span> First, updates happen in several steps:</p>


<ul>
<li><strong>Getting</strong> the change into a repo (pushing or pulling)</li>
<li><strong>Applying</strong> the change to the files (update or merge)</li>
<li><strong>Saving</strong> the new version (commit)</li>
</ul>



<p>Second, depending on the change, you can update or merge:</p>


<ul>
<li><strong>Updates</strong> happen when there&#8217;s no ambiguity. For example, I pull changes to a file that only you&#8217;ve been editing. The file just jumps to the latest revision, since there are no overlapping changes.</li>
<li><strong>Merges</strong> are needed when we have conflicting changes. If we both edit a file, we end up with two &#8220;branches&#8221; (i.e. alternate universes). One world has my changes, the other world has yours. In this case we (probably) want to merge the changes together into a single universe.</li>
</ul>
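<p>One way to picture the choice is to treat history as a family tree and ask whether one change is an ancestor of the other. This is a simplified sketch, not how Mercurial is implemented; the function names and the <code>parents</code> dictionary (change name to list of parent names) are assumptions made for the example.</p>

```python
def ancestors(parents, node):
    """Return every change reachable by following parent links from `node`."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def update_or_merge(parents, mine, theirs):
    """Decide how to bring `theirs` into a working copy currently at `mine`."""
    if mine == theirs or theirs in ancestors(parents, mine):
        return "nothing to do"  # I already have their change
    if mine in ancestors(parents, theirs):
        return "update"  # their change builds directly on mine
    return "merge"  # we both changed things since a common parent
```

<p>If their change descends from mine, the files can simply jump forward. If neither descends from the other, we are in two alternate universes and need a merge.</p>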



<p>I&#8217;m still wrapping my head around how easily branches spring up and collapse in a <span class="caps">DVCS</span>:</p>

<p><img src="http://betterexplained.com/wp-content/uploads/version_control/distributed/distributed_merge.png" alt="" /></p>

<p>In this case, a merge is needed because (+Soup) and (+Juice) are changes to a common parent: the list with just &#8220;Milk&#8221;. After Joe merges the files, Sue can do a regular &#8220;pull and update&#8221; to get the combined file from Joe. She doesn&#8217;t have to merge again on her own.</p>
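<p>The merge in the diagram can be sketched as a three-way comparison against the common parent. This toy version treats the list as a set of items where each side only adds or removes entries; real merge tools work line by line and can report conflicts, which this sketch does not attempt. The function name is made up.</p>

```python
def merge_lists(base, mine, theirs):
    """Three-way merge for lists where each side only adds or removes items."""
    # An item missing from either side was deleted relative to the base.
    removed = {item for item in base if item not in mine or item not in theirs}
    merged = [item for item in base if item not in removed]
    # Items that are new on either side are appended once.
    for side in (mine, theirs):
        for item in side:
            if item not in base and item not in merged:
                merged.append(item)
    return merged
```

<p>With a base of Milk, one side adding Soup and the other adding Juice, the merged list holds all three items. Knowing the common parent is what lets the merge tell an addition on one side from a deletion on the other.</p>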

<p>In Mercurial you can run:</p>



<pre>
<code>
hg incoming ../another-dir   # see pending changes
hg pull ../another-dir       # download changes

hg update                    # actually apply changes...
hg merge                     # ... or merge if needed

hg commit                    # check in merged file; unite branches
</code>
</pre>



<p>Yep, the &#8220;pull-merge-commit&#8221; cycle is long. Luckily, Mercurial has shortcuts that combine steps; <code>hg pull -u</code>, for example, pulls and then updates in one command. Though it seems complex, it&#8217;s <strong>much</strong> easier than handling merges manually in Subversion.</p>

<p><strong>Most merges are automatic.</strong> When conflicts come up, they are typically resolved quickly. Mercurial keeps track of the parent/child relationship for every change (our merged list has two parents), as well as the &#8220;heads&#8221; or latest changes in each branch. Before the merge we have two heads; afterwards, one.</p>
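<p>The idea of heads is easy to compute once parent links are recorded: a head is any change that no other change names as a parent. A minimal sketch, using a hypothetical <code>parents</code> dictionary that maps each change name to its list of parents:</p>

```python
def heads(parents):
    """Return the changes that no other change lists as a parent."""
    has_child = {p for parent_list in parents.values() for p in parent_list}
    return sorted(node for node in parents if node not in has_child)
```

<p>Before the merge, the Soup and Juice changes are both heads; a merged change with those two as parents leaves a single head.</p>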

<h2>Organizing a Distributed Project</h2>

<p>Here&#8217;s one way to organize a distributed project:</p>

<p><img src="http://betterexplained.com/wp-content/uploads/version_control/distributed/distributed_push_pull.png" alt="" /></p>

<p>Sue, Joe and Eve check changes into a common branch. They can trade patches with each other to do simple <strong>&#8220;buddy builds&#8221;</strong>: <em>Hey buddy, can you try out these patches? I need to see if it works before I push to the experimental branch.</em></p>

<p>Later, a maintainer can review and pull changes from the experimental branch into a stable branch, which has the latest release. A distributed <span class="caps">VCS </span>helps isolate changes but still provides the &#8220;single source&#8221; of a centralized system. There are many models of development, from &#8220;pull only&#8221; (where maintainers decide what to take, as in Linux development) to &#8220;shared push&#8221; (which acts like a centralized system). A distributed <span class="caps">VCS </span>gives you <strong>flexibility</strong> in how a project is maintained.</p>

<h2>Practice And Scathing Ridicule Makes Perfect</h2>

<p>I&#8217;m a <span class="caps">DVCS </span>newbie, but am happy with what I&#8217;ve learned so far. I enjoy <span class="caps">SVN, </span>but it&#8217;s &#8220;fun&#8221; seeing how easy a merge can be. My suggestion is to start with Subversion, get a grasp of team collaboration, then experiment with a distributed model. With the proper layout a <span class="caps">DVCS </span>can do anything a centralized system can, with the added benefit of easy merging.</p>

<p><strong>Online Resources</strong></p>


<ul>
<li><a href="http://www.selenic.com/mercurial/wiki/">Mercurial</a> has an <a href="http://hgbook.red-bean.com/hgbook.html">excellent book</a>. On Windows you may need <a href="http://kdiff3.sourceforge.net/">diffing/merging software</a> or <a href="http://tortoisesvn.tigris.org/TortoiseMerge.html">TortoiseMerge</a> (if you have TortoiseSVN installed).</li>
<li><a href="http://darcs.net/">Darcs</a> has a detailed <a href="http://en.wikibooks.org/wiki/Understanding_darcs">wikibook</a> (has some math theory about changes).</li>
<li><a href="http://git.or.cz">Git</a> was created by Linus Torvalds. Here&#8217;s an <a href="http://www.youtube.com/watch?v=4XpnKHJAok8">interesting lecture</a> on <span class="caps">DVCS</span>; prepare to be berated for using a centralized system:</li>
</ul>



<p><object width="425" height="350"><param name="movie" value="http://www.youtube.com/v/4XpnKHJAok8"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/4XpnKHJAok8" type="application/x-shockwave-flash" wmode="transparent" width="425" height="350"></embed></object></p>

<p>Notable Quotes:</p>


<ul>
<li>&#8220;How many have done a branch and merged it? How many of you enjoyed it?&#8221;</li>
<li>&#8220;When you do a merge, you plan ahead for a week, then set aside a day to do it.&#8221;</li>
<li>&#8220;Some people have 5, 10, 15 branches&#8221;. One branch is experimental. One branch is maintenance, etc.</li>
<li>&#8220;CVS &#8212; you don&#8217;t commit. You make changes without committing. You never commit until it passes a giant test suite. People make 1-liner changes, knowing it can&#8217;t <em>possibly</em> break.&#8221;</li>
</ul>



<p>So good luck, and watch out for the holy wars. Feel free to share any tips or suggestions below.</p>]]></content:encoded>
			<wfw:commentRss>http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/feed/</wfw:commentRss>
		<slash:comments>100</slash:comments>
		</item>
		<item>
		<title>A Visual Guide to Version Control</title>
		<link>http://betterexplained.com/articles/a-visual-guide-to-version-control/</link>
		<comments>http://betterexplained.com/articles/a-visual-guide-to-version-control/#comments</comments>
		<pubDate>Thu, 27 Sep 2007 21:59:53 +0000</pubDate>
		<dc:creator>Kalid</dc:creator>
				<category><![CDATA[Guides]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://betterexplained.com/articles/a-visual-guide-to-version-control/</guid>
		<description><![CDATA[

Version Control (aka Revision Control aka Source Control) lets you track your files over time. Why do you care? So when you mess up you can easily get back to a previous working version.

You&#8217;ve probably cooked up your own version control system without realizing it had such a geeky name. Got any files like this? [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://betterexplained.com/wp-content/uploads/version_control/version_control_intro_small.png" align="center" border="0" /></p>

<p>Version Control (aka Revision Control aka Source Control) lets you track your files over time. Why do you care? So when you mess up you can easily get back to a previous working version.</p>

<p><strong>You&#8217;ve probably cooked up your own</strong> version control system without realizing it had such a geeky name. Got any files like this? (Not these exact ones, I hope.)</p>


<ul>
<li>KalidAzadResumeOct2006.doc</li>
<li>KalidAzadResumeMar2007.doc</li>
<li>instacalc-logo3.png</li>
<li>instacalc-logo4.png</li>
<li>logo-old.png</li>
</ul>



<p><strong>It&#8217;s why we use &#8220;Save As&#8221;.</strong> You want the new file without obliterating the old one. It&#8217;s a common problem, and solutions are usually like this:</p>


<ul>
<li>Make a <strong>single backup copy</strong> (Document.old.txt). </li>
<li>If we&#8217;re clever, we add a <strong>version number or date</strong>: Document_V1.txt, DocumentMarch2007.txt</li>
<li>We may even use a <strong>shared folder</strong> so other people can see and edit files without sending them over email. Hopefully they relabel the file after they save it.</li>
</ul>




<h2>So Why Do We Need A Version Control System (VCS)?</h2>

<p>Our shared folder/naming system is fine for class projects or one-time papers. But software projects? Not a chance.</p>

<p>Do you think the Windows source code sits in a shared folder like &#8220;Windows2007-Latest-UPDATED!!&#8221;, for anyone to edit? That every programmer just works in a different subfolder? No way.</p>

<p>Large, fast-changing projects with many authors need a Version Control System (geekspeak for &#8220;file database&#8221;) to track changes and avoid general chaos. A good <span class="caps">VCS </span>does the following:</p>


<ul>
<li><strong>Backup and Restore.</strong> Files are saved as they are edited, and you can jump to any moment in time. Need that file as it was on Feb 23, 2007? No problem.</li>
<li><strong>Synchronization.</strong> Lets people share files and stay up-to-date with the latest version.</li>
<li><strong>Short-term undo.</strong> Monkeying with a file and messed it up? (That&#8217;s just like you, isn&#8217;t it?). Throw away your changes and go back to the &#8220;last known good&#8221; version in the database.</li>
<li><strong>Long-term undo.</strong> Sometimes we mess up badly. Suppose you made a change a year ago, and it had a bug. Jump back to the old version, and see what change was made that day.</li>
<li><strong>Track Changes</strong>. As files are updated, you can leave messages explaining why the change happened (stored in the <span class="caps">VCS, </span>not the file). This makes it easy to see how a file is evolving over time, and why.</li>
<li><strong>Track Ownership.</strong> A <span class="caps">VCS </span>tags every change with the name of the person who made it. Helpful for <a href="http://www.unwords.com/unword/blamestorming.html"><del>blamestorming</del></a> giving credit.</li>
<li><strong>Sandboxing</strong>, or insurance against yourself. Making a big change? You can make temporary changes in an isolated area, test and work out the kinks before &#8220;checking in&#8221; your changes.</li>
<li><strong>Branching and merging</strong>. A larger sandbox. You can <strong>branch</strong> a copy of your code into a separate area and modify it in isolation (tracking changes separately). Later, you can <strong>merge</strong> your work back into the common area.</li>
</ul>



<p>Shared folders are quick and simple, but can&#8217;t beat these features.</p>

<h2>Learn the Lingo</h2>

<p>Most version control systems involve the following concepts, though the labels may be different.</p>

<p>Basic Setup</p>


<ul>
<li><strong>Repository (repo)</strong>: The database storing the files. </li>
<li><strong>Server</strong>: The computer storing the repo.</li>
<li><strong>Client</strong>: The computer connecting to the repo.</li>
<li><strong>Working Set/Working Copy</strong>: Your local directory of files, where you make changes.</li>
<li><strong>Trunk/Main</strong>: The &#8220;primary&#8221; location for code in the repo. Think of code as a family tree &#8212; the &#8220;trunk&#8221; is the main line.</li>
</ul>



<p>Basic Actions</p>


<ul>
<li><strong>Add</strong>: Put a file into the repo for the first time, i.e. begin tracking it with Version Control.</li>
<li><strong>Revision</strong>: What version a file is on (v1, v2, v3, etc.).</li>
<li><strong>Head</strong>: The latest revision in the repo.</li>
<li><strong>Check out</strong>: Download a file from the repo.</li>
<li><strong>Check in</strong>: Upload a file to the repository (if it has changed). The file gets a new revision number, and people can &#8220;check out&#8221; the latest one.</li>
<li><strong>Checkin Message</strong>: A short message describing what was changed.</li>
<li><strong>Changelog/History</strong>: A list of changes made to a file since it was created.</li>
<li><strong>Update/Sync</strong>: Synchronize your files with the latest from the repository. This lets you grab the latest revisions of all files.</li>
<li><strong>Revert</strong>: Throw away your local changes and reload the latest version from the repository.</li>
</ul>



<p>Advanced Actions</p>


<ul>
<li><strong>Branch</strong>: Create a separate copy of a file/folder for private use (bug fixing, testing, etc). Branch is both a verb (&#8220;branch the code&#8221;) and a noun (&#8220;Which branch is it in?&#8221;).</li>
<li><strong>Diff/Change/Delta</strong>: Finding the differences between two files. Useful for seeing what changed between revisions.</li>
<li><strong>Merge (or patch)</strong>: Apply the changes from one file to another, to bring it up-to-date. For example, you can merge features from one branch into another. (At Microsoft this was called <a href="http://blogs.msdn.com/larryosterman/archive/2005/02/01/364840.aspx">Reverse Integrate and Forward Integrate</a>)</li>
<li><strong>Conflict</strong>: When pending changes to a file contradict each other (both changes cannot be applied).</li>
<li><strong>Resolve</strong>: Fixing the changes that contradict each other and checking in the correct version.</li>
<li><strong>Locking</strong>: &#8220;Taking control&#8221; of a file so nobody else can edit it until you unlock it. Some version control systems use this to avoid conflicts.</li>
<li><strong>Breaking the lock</strong>: Forcibly unlocking a file so you can edit it. It may be needed if someone locks a file and goes on vacation (or &#8220;calls in sick&#8221; the day Halo 3 comes out).</li>
<li><strong>Check out for edit</strong>: Checking out an &#8220;editable&#8221; version of a file. Some <span class="caps">VCS</span>es have editable files by default, others require an explicit command.</li>
</ul>



<p>And a typical scenario goes like this:</p>

<p>Alice <strong>adds</strong> a file (<code>list.txt</code>) to the <strong>repository</strong>. She <strong>checks it out</strong>, makes a change (puts &#8220;milk&#8221; on the list), and checks it back in with a checkin message (&#8220;Added required item.&#8221;). The next morning, Bob <strong>updates</strong> his local working set and sees the latest revision of <code>list.txt</code>, which contains &#8220;milk&#8221;. He can browse the <strong>changelog</strong> or <strong>diff</strong> to see that Alice added &#8220;milk&#8221; the day before.</p>

<h2>Visual Examples</h2>

<p>This guide is purposefully high-level: most tutorials throw a bunch of text commands at you. Let&#8217;s cover the high-level concepts without getting stuck in the syntax (the <a href="http://svnbook.red-bean.com/">Subversion manual</a> is always there, don&#8217;t worry). Sometimes it&#8217;s nice to <strong>see what&#8217;s possible</strong>.</p>

<h2>Checkins</h2>

<p>The simplest scenario is checking in a file (<code>list.txt</code>) and modifying it over time.</p>

<p><img src="http://betterexplained.com/wp-content/uploads/version_control/basic_checkin.png" alt="version control checkin" /></p>

<p>Each time we check in a new version, we get a new revision (r1, r2, r3, etc.). In Subversion you&#8217;d do:</p>



<pre>
<code>svn add list.txt
(modify the file)
svn ci list.txt -m "Changed the list"</code>
</pre>



<p>The <code>-m</code> flag supplies the message to record with this checkin.</p>
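<p>To make the revision numbering concrete, here is a minimal Python sketch of a toy repository for a single file. It is not how Subversion works internally: the <code>ToyRepo</code> class, its method names, and the list contents are all made up for illustration.</p>

```python
class ToyRepo:
    """A toy, in-memory repository for one file (hypothetical, for illustration)."""

    def __init__(self):
        self.revisions = []   # full contents of each revision, oldest first
        self.messages = []    # checkin message for each revision

    def check_in(self, lines, message):
        """Store a new revision and return its number (1 for r1, 2 for r2, ...)."""
        self.revisions.append(list(lines))
        self.messages.append(message)
        return len(self.revisions)

    def check_out(self, revision=None):
        """Return a copy of the given revision, or the head if none is given."""
        if revision is None:
            revision = len(self.revisions)
        return list(self.revisions[revision - 1])

    def log(self):
        """Return (revision number, message) pairs: the changelog."""
        return [(i + 1, m) for i, m in enumerate(self.messages)]


repo = ToyRepo()
r1 = repo.check_in(["Milk"], "Started the list")
r2 = repo.check_in(["Milk", "Eggs"], "Added eggs")
print(r1, r2)              # 1 2
print(repo.check_out())    # ['Milk', 'Eggs']  (the head)
print(repo.check_out(1))   # ['Milk']          (an older revision)
print(repo.log())          # [(1, 'Started the list'), (2, 'Added eggs')]
```

<p>Every checkin appends to the history instead of overwriting it, which is why any older revision can still be checked out later.</p>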

<h2>Checkouts and Editing</h2>

<p>In reality, you might not keep checking in a file. You may have to <strong>check out, edit and check in</strong>. The cycle looks like this:</p>

<p><img src="http://betterexplained.com/wp-content/uploads/version_control/checkout_edit.png" alt="version control checkout" /></p>

<p>If you don&#8217;t like your changes and want to start over, you can <strong>revert</strong> to the previous version and start again (or stop). When checking out, you get the latest revision by default. If you want, you can specify a particular revision. In Subversion, run:</p>



<pre>
<code>
svn co http://path/to/trunk (check out the latest version of the directory)
...edit list.txt...
svn revert list.txt (throw away changes)

svn update -r2 list.txt (go back to a particular revision of the file)
</code>
</pre>



<h2>Diffs</h2>

<p>The trunk has a history of <strong>changes</strong> as a file evolves. Diffs are the changes you made while editing: imagine you can &#8220;peel&#8221; them off and apply them to a file:</p>

<p><img src="http://betterexplained.com/wp-content/uploads/version_control/basic_diffs.png" alt="version control diff" /></p>

<p>For example, to go from r1 to r2, we add eggs (+Eggs). Imagine peeling off that red sticker and placing it on r1, to get r2.</p>

<p>And to get from r2 to r3, we add Juice (+Juice). To get from r3 to r4, we remove Juice and add Soup (-Juice, +Soup).</p>

<p>Many version control systems <strong>store diffs rather than full copies of the file</strong>. This saves disk space: 4 revisions of a file doesn&#8217;t mean we have 4 copies; we have 1 copy and 3 small diffs. Pretty nifty, eh? In <span class="caps">SVN, </span>we diff two revisions of a file like this:</p>



<pre>
<code>svn diff -r3:4 list.txt</code>
</pre>



<p>Diffs help us notice changes (&#8220;How did you fix that bug again?&#8221;) and even apply them from one branch to another.</p>

<p><strong>Bonus question:</strong> what&#8217;s the diff from r1 to r4?</p>



<pre>
<code>+Eggs
+Soup</code>
</pre>



<p>Notice how &#8220;Juice&#8221; wasn&#8217;t even involved &#8212; the direct jump from r1 to r4 doesn&#8217;t need that change, since Juice was overridden by Soup.</p>
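<p>The bonus question can be checked with a few lines of Python. This is only a sketch: the <code>diff</code> helper is hypothetical, it treats the list as a set of unique items (ignoring order), and it assumes r1 contains just &#8220;Milk&#8221;. Real tools diff line by line.</p>

```python
def diff(old, new):
    """Return '+Item' / '-Item' changes that turn old into new (order ignored)."""
    removed = ["-" + item for item in old if item not in new]
    added = ["+" + item for item in new if item not in old]
    return removed + added


# Assumed contents of each revision of list.txt
r1 = ["Milk"]
r2 = ["Milk", "Eggs"]
r3 = ["Milk", "Eggs", "Juice"]
r4 = ["Milk", "Eggs", "Soup"]

print(diff(r1, r2))   # ['+Eggs']
print(diff(r3, r4))   # ['-Juice', '+Soup']
print(diff(r1, r4))   # ['+Eggs', '+Soup']
```

<p>The last line compares only the two endpoints, so the short-lived &#8220;Juice&#8221; never shows up.</p>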

<h2>Branching</h2>

<p>Branches let us copy code into a separate folder so we can monkey with it separately:</p>

<p><img src="http://betterexplained.com/wp-content/uploads/version_control/first_branch.png" alt="version control branch" /></p>

<p>For example, we can create a branch for new, experimental ideas for our list: crazy things like Rice or Eggo waffles. Depending on the version control system, creating a branch (copy) may change the revision number.</p>

<p>Now that we have a branch, we can change our code and work out the kinks. (<i>&#8220;Hrm&#8230; waffles? I don&#8217;t know what the boss will think. Rice is a safe bet.&#8221;</i>). Since we&#8217;re in a separate branch, we can make changes and test in isolation, knowing our changes won&#8217;t hurt anyone. And our branch history is under version control.</p>

<p>In Subversion, you create a branch simply by copying a directory to another.</p>



<pre>
<code>svn copy http://path/to/trunk http://path/to/branch</code>
</pre>



<p>So branching isn&#8217;t too tough of a concept: <strong>Pretend you copied your code into a different directory.</strong> You&#8217;ve probably branched your code in school projects, making sure you have a &#8220;fail safe&#8221; version you can return to if things blow up.</p>

<h2>Merging</h2>

<p>Branching sounds simple, right? Well, it&#8217;s not &#8212; figuring out how to merge changes from one branch to another can be tricky.</p>

<p>Let&#8217;s say we want to get the &#8220;Rice&#8221; feature from our experimental branch into the mainline. How would we do this? Diff r6 and r7 and apply that to the main line?</p>

<p><strong>Wrongo.</strong> We only want to apply the changes <strong>that happened in the branch!</strong> That means we diff r5 and r6, and apply that to the main trunk:</p>

<p><img src="http://betterexplained.com/wp-content/uploads/version_control/merging.png" alt="version control merge" /></p>

<p>If we diffed r6 and r7, we would lose the &#8220;Bread&#8221; feature that was in main. This is a subtle point &#8212; imagine &#8220;peeling off&#8221; the changes from the experimental branch (+Rice) and adding that to main. Main may have had other changes, which is ok &#8212; we just want to insert the Rice feature.</p>

<p>In Subversion, merging is very close to diffing. Inside the main trunk, run the command: </p>



<pre>
<code>svn merge -r5:6 http://path/to/branch</code>
</pre>



<p>This command diffs r5-r6 in the experimental branch and applies it to the current location. Subversion releases before 1.5 have no easy way to keep track of which merges have been applied, so if you&#8217;re not careful you may apply the same changes twice; the usual advice there is to keep a changelog message reminding you that you&#8217;ve already merged r5-r6 into main. Subversion 1.5 and later record merges in the <code>svn:mergeinfo</code> property, which avoids this.</p>
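<p>Here is a rough Python sketch of why we merge the r5-r6 diff and not a diff against r7. The helpers <code>diff</code> and <code>apply_changes</code> are hypothetical, the list contents are assumed for illustration, and the list is treated as a set of unique items; Subversion itself works on lines of text.</p>

```python
def diff(old, new):
    """Return '+Item' / '-Item' changes that turn old into new (order ignored)."""
    removed = ["-" + item for item in old if item not in new]
    added = ["+" + item for item in new if item not in old]
    return removed + added


def apply_changes(items, changes):
    """Apply '+Item' / '-Item' changes to a copy of items."""
    result = list(items)
    for change in changes:
        sign, item = change[0], change[1:]
        if sign == "+" and item not in result:
            result.append(item)
        elif sign == "-" and item in result:
            result.remove(item)
    return result


# Assumed contents, following the Rice/Bread example
r5_branch = ["Milk", "Eggs", "Soup"]           # branch point: a copy of main
r6_branch = ["Milk", "Eggs", "Soup", "Rice"]   # Rice added in the branch
r7_main = ["Milk", "Eggs", "Soup", "Bread"]    # Bread added in main

# Right: peel off only what changed inside the branch, apply it to main
branch_changes = diff(r5_branch, r6_branch)
merged = apply_changes(r7_main, branch_changes)
print(branch_changes)   # ['+Rice']
print(merged)           # ['Milk', 'Eggs', 'Soup', 'Bread', 'Rice']

# Wrong: diffing main against the branch also "undoes" Bread
wrong = apply_changes(r7_main, diff(r7_main, r6_branch))
print(wrong)            # ['Milk', 'Eggs', 'Soup', 'Rice']
```

<p>Applying <code>branch_changes</code> a second time would change nothing in this toy model, but with real line-based patches a repeated merge can duplicate or clash with earlier changes, which is why keeping track of what has been merged matters.</p>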

<h2>Conflicts</h2>

<p>Many times, the <span class="caps">VCS </span>can automatically merge changes to different parts of a file. <strong>Conflicts</strong> can arise when changes appear that don&#8217;t gel: Joe wants to remove eggs and replace them with cheese (-eggs, +cheese), and Sue wants to replace eggs with a hot dog (-eggs, +hot dog).</p>

<p><img src="http://betterexplained.com/wp-content/uploads/version_control/vcs_conflict.png" alt="version control conflict" /></p>

<p>At this point it&#8217;s a race: if Joe checks in first, that&#8217;s the change that goes through (and Sue can&#8217;t make her change). </p>

<p>When changes overlap and contradict like this, the <span class="caps">VCS </span>may report a <strong>conflict</strong> and not let you check in &#8212; it&#8217;s up to you to check in a newer version that <strong>resolves</strong> this dilemma. A few approaches:</p>


<ul>
<li><strong>Re-apply your changes</strong>. Sync to the latest version (r4) and re-apply your changes to this file: Add hot dog to the list that already has cheese.</li>
<li><strong>Override their changes with yours</strong>. Check out the latest version (r4), copy over your version, and check your version in. In effect, this removes cheese and replaces it with hot dog.</li>
</ul>



<p>Conflicts are infrequent but can be a pain. Usually I update to the latest and re-apply my changes.</p>
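<p>How does a tool tell a harmless pair of edits from a conflict? One common idea is a three-way comparison against the version both people started from. The Python sketch below is a simplified illustration, not Subversion&#8217;s actual algorithm: <code>three_way_merge</code> is a hypothetical helper, the list contents are assumed, and all three versions must have the same number of lines.</p>

```python
def three_way_merge(base, mine, theirs):
    """Merge line by line; return (merged lines, conflicting line numbers).

    Simplifying assumption: all three versions have the same number of lines.
    A conflicting line is left as None in the merged result.
    """
    merged, conflicts = [], []
    for number, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        if m == t:        # both sides agree (or neither touched the line)
            merged.append(m)
        elif m == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "mine" changed this line
            merged.append(m)
        else:             # both changed it, in different ways: conflict
            merged.append(None)
            conflicts.append(number)
    return merged, conflicts


base = ["Milk", "Eggs", "Soup"]       # the version Joe and Sue both started from
joe = ["Milk", "Cheese", "Soup"]      # -eggs, +cheese
sue = ["Milk", "Hot dog", "Soup"]     # -eggs, +hot dog

print(three_way_merge(base, joe, sue))
# (['Milk', None, 'Soup'], [2])  -> line 2 needs a human to resolve it

# Edits to different lines merge automatically
sue_other = ["Milk", "Eggs", "Stew"]
print(three_way_merge(base, joe, sue_other))
# (['Milk', 'Cheese', 'Stew'], [])
```

<p>Resolving the conflict means choosing what line 2 should be (cheese, hot dog, or both) and checking in that version.</p>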

<h2>Tagging</h2>

<p>Who would have thought a version control system would be Web 2.0 compliant? Many systems let you tag (label) any revision for easy reference. This way you can refer to &#8220;Release 1.0&#8221; instead of a particular build number:</p>

<p><img src="http://betterexplained.com/wp-content/uploads/version_control/tagging.png" alt="version control tag" /></p>

<p>In Subversion, tags are just branches that you agree not to edit; they are around for posterity, so you can see exactly what your version 1.0 release contained. Hence they end in a stub &#8212; there&#8217;s nowhere to go.</p>



<pre>
<code>(in trunk)
svn copy http://path/to/revision http://path/to/tag
</code>
</pre>



<h2>Real-life example: Managing Windows Source Code</h2>

<p>We guessed that Windows was managed out of a shared folder, but it&#8217;s not the case. So <a href="http://blogs.msdn.com/larryosterman/archive/2005/02/01/364840.aspx">how&#8217;s it done</a>?</p>


<ul>
<li>There&#8217;s a <strong>main line</strong> with stable builds of Windows.</li>
<li>Each group (Networking, User Interface, Media Player, etc.) <strong>has its own branch</strong> to develop new features. These are under development and less stable than main.</li>
</ul>



<p>You develop new features in your branch and &#8220;Reverse Integrate (RI)&#8221; to get them into Main. Later, you &#8220;Forward Integrate (FI)&#8221; to get the latest changes from Main into your branch:</p>

<p><img src="http://betterexplained.com/wp-content/uploads/version_control/windows.png" alt="version control branch example" /></p>

<p>Let&#8217;s say we&#8217;re at Media Player 10 and IE 6. The Media Player team makes version 11 in their own branch. When it&#8217;s ready and tested, there&#8217;s a patch from 10 &#8211; 11 which is applied to Main (just like the &#8220;Rice&#8221; example, but a tad more complicated). This is a <strong>reverse integration</strong>, from the branch to the trunk. The IE team can do the same thing.</p>

<p>Later, the Media Player team can pick up the latest code from other teams, like <span class="caps">IE.</span> In this case, Media Player <strong>forward integrates</strong> and gets the latest patches from main into their branch. This is like pulling in the &#8220;Bread&#8221; feature into the experimental branch, but again, more complicated.</p>

<p>So it&#8217;s RI and <span class="caps">FI.</span> Aye aye. This arrangement lets changes percolate throughout the branches, while keeping new code out of the main line. Cool, eh?</p>

<p>In reality, there are many layers of branches and sub-branches, along with quality metrics that determine when you get to <span class="caps">RI.</span> But you get the idea: branches help manage complexity. Now you know the basics of how one of the largest software projects is organized.</p>

<h2>Key Takeaways</h2>

<p>My goal was to share high-level thoughts about version control systems. Here are the basics:</p>


<ul>
<li><strong>Use version control.</strong> Seriously, it&#8217;s a good thing, even if you&#8217;re not writing an <span class="caps">OS.</span> It&#8217;s worth it for backups alone.</li>
<li><strong>Take it slow.</strong> I&#8217;m only now looking into branching and merging for my projects. Just get a handle on using version control and go from there. If you&#8217;re a small project, branching/merging may not be an issue. Large projects often have experienced maintainers who keep track of the branches and patches.</li>
<li><strong>Keep Learning.</strong> There are plenty of guides for <a href="http://svnbook.red-bean.com/"><span class="caps">SVN</span></a>, <a href="http://wwwasd.web.cern.ch/wwwasd/cvs/tutorial/cvs_tutorial_toc.html"><span class="caps">CVS</span></a>, <a href="http://agave.garden.org/~aaronh/rcs/tutorial.html"><span class="caps">RCS</span></a>, <a href="http://www.kernel.org/pub/software/scm/git/docs/tutorial.html">Git</a>, <a href="http://public.perforce.com/public/tutorial.html">Perforce</a> or whatever system you&#8217;re using. The important thing is to <strong>know the concepts</strong> and realize every system has its own lingo and philosophy. Eric Sink also has a <a href="http://www.ericsink.com/scm/source_control.html">detailed version control guide</a>.</li>
</ul>



<p>These are the basics &#8212; as time goes on I&#8217;ll share specific lessons I&#8217;ve learned from <a href="http://instacalc.com">my projects</a>. Now that you&#8217;ve figured out a regular <span class="caps">VCS, </span><a href="http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/">try an illustrated guide to distributed version control</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://betterexplained.com/articles/a-visual-guide-to-version-control/feed/</wfw:commentRss>
		<slash:comments>177</slash:comments>
		</item>
	</channel>
</rss>
