The Quick Guide to GUIDs

Our world is numbered. Books have ISBNs and products have barcodes. Cars have VINs, and even people have Social Security numbers.

Numbers help us reference items unambiguously. “John Smith” may be many people, but Social Security Number 123-45-6789 refers to one person exactly.

A GUID (globally unique identifier) is a bigger, badder version of this type of ID number. You may see the term UUID tossed about (universally unique identifier), a nitpicky word for those whose numbers are unique not only within the globe, but throughout the entire universe.

Any way you title it, GUIDs or UUIDs are just big, gigantic ID numbers.

The Problem With Counting

“We don’t need no stinkin’ GUIDs,” you may be thinking between gulps of Top Ramen, “I’ll just use regular numbers and start counting up from 1.”

Sure, it sounds easy. Just start with ISBN #1 and add one for each new book. But problems arise:

  • Who does the counting? A central authority?
  • Who handles simultaneous requests and eliminates duplicates?
  • Can IDs be shared between products? Is Social Security Number 1 different from ISBN 1?
  • Can people guess what the next ID will be? How many IDs have been issued?

The problem with counting is that we want to create ID numbers without the management headache.

GUIDs to the Rescue

GUIDs are large, enormous numbers that are nearly guaranteed to be unique. They are usually 128 bits long and look like this in hexadecimal:

30dd879c-ee2f-11db-8314-0800200c9a66

The format is a well-defined sequence of 32 hex digits grouped into chunks of 8-4-4-4-12. This gives us 2^128, or about 3.4 × 10^38, possible numbers.
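
Here’s a quick sanity check of those numbers. This is just a sketch using Python’s standard uuid module (Python is my pick for illustration; nothing about GUIDs requires it), parsing the sample GUID shown above:

```python
import uuid

# The sample GUID from the text, parsed with the standard library.
guid = uuid.UUID("30dd879c-ee2f-11db-8314-0800200c9a66")

hex_digits = guid.hex                                   # hex digits with hyphens removed
chunk_sizes = [len(part) for part in str(guid).split("-")]
total_values = 2 ** 128                                 # size of the 128-bit space

print(len(hex_digits))           # 32
print(chunk_sizes)               # [8, 4, 4, 4, 12]
print(len(guid.bytes) * 8)       # 128
print(len(str(total_values)))    # 39 (decimal digits in 2^128)
```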

Here’s the thinking behind GUIDs:

  • If you pick a huge random number (39 digits long), it’s really unlikely that someone will pick the same one.

  • GUIDs are not tied to a product. A GUID can be used for people, cars, files, webpages, colors, anything. With regular registration numbers, you start counting at 1 and numbers can overlap. Social Security Number 123-45-6789 is different from ISBN 123456789 which is different from barcode 123456789. This isn’t an issue with GUIDs.

  • It’s up to the person reading the GUID to figure out the context of the GUID. There are so many GUIDs that you can use them to number everything and not run out.

GUIDs give you a unique serial number that can be used on any item in the universe.

The Great GUID Shortage

When learning about GUIDs, it feels like 39 measly digits aren’t enough. Won’t we run out if people get GUID-crazy, assigning them for everything from their pets to their favorite bubble gum flavor?

Let’s see. Think about how big the Internet is: Google has billions of web pages in its index. Let’s call it a trillion (10^12) for kicks. Think about every Wikipedia article, every news item on CNN, every product on Amazon, every blog post from any author. We can assign a GUID to each of these documents.

Now let’s say everyone on Earth gets their own copy of the internet, to keep track of their stuff. Even crazier, let’s say each person gets their own copy of the internet every second. How long can we go on?

Over a billion years.

Let me say that again. Each person gets a personal copy of the internet, every second, for a billion years.

It’s a mind-boggling amount of items, and it’s hard to get our heads around it. Trust me, we won’t run out of GUIDs anytime soon. And if we do? We’ll start using GUIDs with more digits.
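
If you’d like to check the arithmetic yourself, here’s a rough back-of-the-envelope version in Python. The trillion-document Internet comes from the text above; the population figure of about 7 billion is my assumption, so treat the result as an order-of-magnitude estimate.

```python
TOTAL_GUIDS = 2 ** 128                 # every possible 128-bit value
DOCS_PER_INTERNET = 10 ** 12           # "a trillion" documents, from the text
PEOPLE = 7 * 10 ** 9                   # assumption: roughly 7 billion people
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Everyone gets a fresh copy of the Internet each second.
guids_per_second = DOCS_PER_INTERNET * PEOPLE
seconds_until_empty = TOTAL_GUIDS // guids_per_second
years_until_empty = seconds_until_empty / SECONDS_PER_YEAR

print(f"{years_until_empty:.1e} years")   # on the order of a billion years
```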

Using GUIDs

If you want to create GUIDs, try the

There are several ways to create GUIDs (RFC 4122 describes the conventions), but you want to avoid that mess and use a library. The general types of GUIDs are:

  • Random: Just use the system’s random-number generator to create a 128-bit number.
  • Time-based: Create a GUID based on the current time.
  • Hardware-based: Make a GUID with certain portions based on hardware features, such as the MAC address of a network card. This isn’t great because the GUID isn’t “anonymous” and can be partially traced to the machine that created it.
  • Content-based (MD5 or SHA-1 hash of data): Create a GUID based on a hash of the file contents. Files with the same contents will get the same GUID. You can also seed the hash with a unique namespace (like your URL).
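
As a rough illustration of these types, here’s how they map onto Python’s standard uuid module. Two caveats: in this library the time-based and hardware-based ideas are combined into a single “version 1” GUID, and the URL below is a made-up name used only for the example.

```python
import uuid

# Random: 122 random bits plus a few fixed version/variant flag bits.
random_id = uuid.uuid4()

# Time-based and hardware-based: a timestamp combined with a 48-bit node ID
# (often the MAC address of a network card).
time_hw_id = uuid.uuid1()

# Content/name-based: hash a name inside a namespace.
name = "https://example.com/my-page"             # hypothetical name
md5_id = uuid.uuid3(uuid.NAMESPACE_URL, name)    # MD5-based
sha1_id = uuid.uuid5(uuid.NAMESPACE_URL, name)   # SHA-1-based

for label, value in [("random", random_id), ("time+hardware", time_hw_id),
                     ("md5", md5_id), ("sha1", sha1_id)]:
    print(f"{label:14} {value}  (version {value.version})")

# The same name in the same namespace always hashes to the same GUID.
repeat_id = uuid.uuid5(uuid.NAMESPACE_URL, name)
print(repeat_id == sha1_id)   # True
```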

You can mix-and-match techniques above. If you want duplicate files to have the same GUID, then use GUIDs based on the contents. If you want GUIDs to be unique, even if the contents are the same, then create them randomly or with a combination of file contents and a random number.
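
Here’s a minimal sketch of that choice in Python. The namespace URL and both function names are hypothetical, invented for this example; the idea is only that hashing the contents alone gives duplicate files the same GUID, while mixing in a random number makes every GUID distinct.

```python
import hashlib
import uuid

# Hypothetical namespace for this example (seeded from a made-up URL).
FILE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/files")

def content_guid(data: bytes) -> uuid.UUID:
    """Same bytes in, same GUID out: duplicate files share an ID."""
    digest = hashlib.sha1(data).hexdigest()
    return uuid.uuid5(FILE_NAMESPACE, digest)

def unique_guid(data: bytes) -> uuid.UUID:
    """File contents combined with a random number: unique even for duplicates."""
    digest = hashlib.sha1(data).hexdigest()
    salt = uuid.uuid4().hex
    return uuid.uuid5(FILE_NAMESPACE, digest + salt)

report = b"quarterly report"
print(content_guid(report) == content_guid(report))   # True
print(unique_guid(report) == unique_guid(report))     # False
```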

GUID Examples

Here’s a few things you can do with GUIDs:

The Tradeoffs with GUIDs

Like most things in life, GUIDs have benefits and drawbacks. Weigh the features to see if they make sense:

Pros:

  • No central authority: You avoid the need for management, but can’t keep track of what’s been assigned. A compromise is to generate GUIDs internally and then hand them out.
  • Easily combined: You can merge GUIDs from different data sources very easily with a low chance of conflict.

Cons:

  • Appear random: Users cannot easily guess the ID for an object they don’t know. This is good for security, difficult for debugging.
  • GUID overhead: GUIDs are an example of the time-space tradeoff. You save time in merging but have to use space to store the large (16-byte) GUID. It may not make sense to have a 16-byte GUID keeping track of a 4-byte item in your database.

GUIDs are not a GUARantee

There’s one giant caveat for GUIDs: collisions are still possible.

First, the birthday paradox tells us how the chance of a collision grows as GUIDs are used. It’s very, very unlikely that any two GUIDs will collide, but the odds rise quickly as more are assigned: every new GUID has to avoid all the ones already issued.
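
To put rough numbers on this, here’s a sketch of the standard birthday-paradox approximation, p ≈ 1 − e^(−n(n−1)/2N), in Python. I’m assuming purely random (version 4) GUIDs, where 122 of the 128 bits are random; other GUID types behave differently.

```python
import math

RANDOM_BITS = 122   # assumption: version-4 GUIDs, where 6 bits are fixed flags

def collision_probability(count: int, bits: int = RANDOM_BITS) -> float:
    """Approximate chance that at least two of `count` random IDs collide."""
    space = 2 ** bits
    exponent = -count * (count - 1) / (2 * space)
    return -math.expm1(exponent)   # 1 - e^exponent, accurate for tiny values

for count in (10 ** 9, 10 ** 12, 10 ** 15, 10 ** 18):
    chance = collision_probability(count)
    print(f"{count:.0e} GUIDs -> collision chance about {chance:.1e}")
```

The chance grows with the square of the number of GUIDs issued, which is why it stays negligible for a long time and then climbs quickly.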

Second, a malicious user could try hijacking GUIDs that they know will be used (assuming users can assign their own GUIDs), or resubmitting different content under a previous GUID (submitting file A under the hash of file B).

If you are writing software, program defensively and detect cases where the GUID already exists. Give the user an error or, even better, recover: create a new GUID on the server side and try again. GUIDs are great, but they aren’t a magic bullet.
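
Here’s one way that defensive check might look, as a minimal Python sketch. The `store_item` function and the plain dictionary standing in for a database are both hypothetical; a real system would more likely lean on a unique constraint in the database so that the check and the write can’t race each other.

```python
import uuid

def store_item(storage: dict, item, requested_guid=None, max_attempts=5):
    """Store `item` under an unused GUID and return the GUID actually used.

    `storage` is a plain dict standing in for a database (an assumption
    made for this sketch). A GUID that is already taken is never overwritten.
    """
    candidate = requested_guid if requested_guid is not None else uuid.uuid4()
    for _ in range(max_attempts):
        if candidate not in storage:
            storage[candidate] = item
            return candidate
        # Collision (or hijack attempt): recover with a fresh server-side GUID.
        candidate = uuid.uuid4()
    raise RuntimeError("could not find an unused GUID")

storage = {}
first = store_item(storage, "file A")
second = store_item(storage, "file B", requested_guid=first)   # GUID already taken
print(first != second)            # True: file B got a new GUID
print(storage[first])             # file A (not overwritten)
```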

As always, we’re never done learning. Read more about GUIDs here:

Kalid Azad loves sharing Aha! moments. BetterExplained is dedicated to learning with intuition, not memorization, and is honored to serve 250k readers monthly.

27 Comments

  1. IPv6 (which is very similar to GUIDs, since it’s also 128 bits in length) is capable of assigning approx. 1000 addresses per square meter of surface on Earth. They also say that every grain of sand in the Sahara could have its own IP address, but the question is still open whether there’d be any left for the rest of us :)

    Remember IPv4, which should allow almost every person on Earth (6-7 billion and growing fast) to have an address, but with everyone having a dozen gadgets on them this number seems less than enough. Of course this is kind of an overstatement, but it has some truth in it. Up to this day almost half of the IPv4 addresses have been used, and with the current rate we are running out of them exponentially!

    In contrast with GUIDs, IPv6 will have a central authority managing the addresses (or at least address ranges), thus collisions are out of the question, and still we could have the same amount of both IDs. Though I wouldn’t assign an address to all my teddy bears and goldfish, it’d be a challenge for them to interpret all the protocols :)

    Perhaps you should write an article about IPv6 too, it’s at least as interesting as GUIDs, and I like your style much better than my own :)

  2. Thanks for the comment! IPv6 is an excellent example of the use of GUIDs.

    I played with the numbers a bit, I think we may have enough IPv6 addresses for a while. It seems we can squeeze over 600,000 GUIDs or IPv6 addresses per square nanometer:

    http://tinyurl.com/2dvw4q

    That ought to be enough for anyone, right? :) (Of course, there’s probably some overhead as you say by allocating addresses in fixed blocks by a central authority. I wonder if some entity like MIT is going to get a ridiculous 18.*.*.* “class A” address block like they have now: http://libstaff.mit.edu/colserv/digital/ordering/ip.html)

    I really appreciate the comment, I think IPv6 would be a great topic as it’s becoming more and more relevant.

  3. Actually if we reallocated some IPv4 address ranges and started a better management of distribution of new ones then we could make it last longer, even much longer, but that would mean kicking over a lot of engrained habits, it’s always easier to widen the address range. Besides that, IPv6 offers a lot more improvements compared to IPv4 which developers had to think about in the last 20 or so years since it was standardized :)

  4. I just read my notes on IPv6 and the lecturer said that it should last till about 2040. In 2001 they said that we will run out of IPv4 addresses by 2007, but this has been prolonged to 2010, but I suppose there might be a few more years there. So I just wonder how the hell we are going to use up all those IPv6 addresses in a mere 30 years :)

  5. Yeah, it seems like IPv6 would have been designed to avoid the inevitable IP address shortage. Unless there’s a mistake in my numbers, I’d be hard pressed to say how we’d run out :).

  6. It depends. You can envision a future in which IPv6 addresses are used to address every component in the machine, for example, as a universal address. This would significantly increase usage of existing allocations.

    I control several IPv6 subnets myself, one /64 and one /48 (although I am only using a /64 of my /48 allocation). That totals around 1.2 x 10^24 addresses. And I’m just a regular guy! (check out SixXS)

  7. Thanks Derrell, that’s a good point. I had forgotten to mention there are certain “flags” you can set in the GUID for various versions. It’s best to use a library to create them.

  8. Well, at 71 and having started to teach myself how to use a desktop my son gave me on retirement, I never knew what a GUID was until I read this article. You admirably explain it! Thank you.

  9. I find it somewhat confusing. Is it necessary to have a guid in order for updates to be sent to, let’s say, a person subscribing to a feed via email? If not, is it based on pubdate or lastbuild?
    Now suppose the feeds are single feeds such as in a list of something and each has its own xml file. My feeds are single lists and I am using perl to update the pubdate and lastbuildDate to today’s date if the list has changed. I am using feedburner/feedblitz to take care of distribution to users. However, feedblitz support says the feeds may not work without the guid changing (a test of email feeds appears to work without a guid), but there are so many I can’t subscribe to each and every one. So, if I do need to create a guid, I have no clue how to do this using the generator mentioned here or how to insert it into the xml file when the list has been updated. I’m not very good at perl or php but am trying. Thanks.

  10. Hi J. Daniels,

    I’m not that familiar with the details of Feedburner feeds, but from a GUID perspective they need to be unique for each item. So if you change a feed then the GUID may need to change also. Hope this helps.

  11. My children have GUID numbers for their school I’ve noticed and that is why I was trying to figure out what this exactly is. So, I take it these numbers are just used internally…just the school district itself for their use? Or is this number used nation wide? They attend a Charter School and I wanted to know what this number was and what they used it for. Can you give me some idea and if this is something they generate within the district or is it a number generated from the government?

  12. You used the URL “http://en.wikipedia.org/wiki/Uuid#Random_UUID_Probability_of_Duplicates” for a link, when the URL should be “http://en.wikipedia.org/wiki/Uuid#Random_UUID_probability_of_duplicates” because the id used in the HTML is “Random_UUID_probability_of_duplicates”, not “Random_UUID_Probability_of_Duplicates”.

  13. > Easily combined: You can merge GUIDs from different data sources very easily with a low chance of conflict.

    This bullet is under “The Problem with GUIDs”?

  14. @Karen: A GUID is basically a giant random number, so its use/sharing is up to the person who created it. But most likely it’s just used within the school, you’d have to check.

    @Calvin: Link updated.

    @Petr: Thanks, updated. Great catch, 3.4 * 10^38 has 39 digits.

    @Joe: Whoops, great point — I reorganized that section.

  15. @Kalid: Ok good I thought maybe something went way over my head when I saw that as a con.

    Thanks for the revision. And while I’m at it, thanks for making your insights available to everyone. After reading just one article, I knew this would be a site I’d share often.

  16. @Joe: You’re welcome, thanks again for the catch (reading again, I don’t know how I missed that the first time!). Glad you’re enjoying the site.

  17. In terms of GUID collision, I think the issue isn’t how many potential GUIDs could exist within the type length, but how we create new ones.
    An example, using the time+MacAddress GUID algorithm; what happens when two GUIDs are created on the same computer at the same time?
    This can occur in a multiple processor machine where two or more CPUs are running threads that create GUIDs. Potentially they could both create a GUID at the same time, or put it another way ‘within the same timeframe as defined by the resolution of the clock speed’
    Being on the same computer could mean both threads use the same MAC address. Same time, same mac address results in the potential for the same GUID in both threads.
    Still very very low likelihood of occurrence, but definitely possible.

  18. Great article, Kalid.

    To those wondering when we will run out of IPv6 addresses – don’t worry. IPv6 can easily address more locations than grains of sand in the Sahara (*millions* of times more, in fact) and probably more locations than molecules in the universe. We essentially never have to worry about them running out.

    More: http://www.bogpeople.com/networking/ipv6/ipv6.shtml

  19. Just a nit-pick- GUIDs are a specific implementation of UUIDs, which you are explaining in this article. UUIDs are an IEEE standard as mentioned in RFC 4122. GUIDs are a Microsoft implementation of UUIDs, with some changes.

    First, GUIDs are split into 4 “data fields”, of different sizes. The stored binary endianness of the first three data fields depends on the host system. For UUIDs, the binary endianness is always stored as big endian. However, displayed visually, UUIDs and GUIDs should encode to the same text.

    Second, the OSF specification on GUIDs states that for version 1 GUIDs, it must use the host’s MAC address as part of the construction. The RFC for UUIDs states that it must use a “node ID”, which can be any 48 bits that can uniquely identify the host hardware. In practice, this is typically a MAC address, but the RFC doesn’t limit version 1 UUIDs to MAC addresses only, and many embedded devices don’t even have a network card for a MAC to be used.

    Lastly, for version 4 GUIDs, Microsoft uses the WinAPI GUID generator, which uses a pseudorandom number generator to create the GUID. Unfortunately, the algorithm is broken, and if the internal state of the system is known, the GUID can be practically discovered. The UUID RFC says “Set all the other bits to randomly (or pseudo-randomly) chosen values.” In other words, it’s up to the UUID implementation on how the random bits are created.

    Microsoft has gone to great lengths to try and explain the difference between “uniqueness” and “randomness” with their broken GUID version 4 generation. Regardless, it’s just that- broken.

  20. When I read this I am kind of wondering, since GUIDs are preferred because there is no overhead compared to a counter, the question should not be are there enough GUIDs to go around, but what are the chances that a randomly generated GUID is already in use, and what are the costs if this happens….
