Endian issues
Helps to know: C programming

Summary: Explains why big and little-endian problems occur, and how to fix them. Example C code included.

Big-endian and little-endian, host order and network order -- it has driven me mad. Here's how I understand it.

Numbers vs. Data

A number is always a number, no matter how it is represented. When we talk about the hex number 0x12, we mean 18 in decimal, regardless of endianness. Different computers may store numbers differently, but when I save 0x12 in a variable on one machine and then ask for that value back on the same machine, I get 0x12. Always, always, always. Strange things may happen when you store a number on one machine and try to read it on another. Data is the bytes that a computer stores; a number is the value that we humans have in mind (the number 18). The computer translates between the two.
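To make the number/data split concrete, here is a small sketch (the helper names `high_byte` and `low_byte` are made up for illustration). Shifts and masks operate on the number, not on its bytes in memory, so they give the same answer on big- and little-endian machines:

```c
#include <stdint.h>

/* Hypothetical helpers: they work on the value, never on the memory
   layout, so the result is the same on any endian machine. */
unsigned high_byte(uint16_t v)
{
    return (v >> 8) & 0xFFu;   /* 0x1234 -> 0x12 */
}

unsigned low_byte(uint16_t v)
{
    return v & 0xFFu;          /* 0x1234 -> 0x34 */
}
```

Endianness only shows up when you look at the bytes in memory, which is what the rest of this page is about.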

Data is Data

Data is stored one byte at a time. Byte 0 has a value, byte 1 has a value, etc. How these bytes are interpreted is left up to the machine.

Consider the bytes

W X Y Z

0 1 2 3

The addresses are on the bottom row. I didn't use A B C D because those are hex digits, and I don't want to confuse anyone. Each letter is a whole byte, i.e. 8 bits. Now that I've cleared that up, let's set:

W = 0x12 [hex digits 12, or 00010010 in binary]
X = 0x34
Y = 0xAB
Z = 0xCD

Interpreting pointers

In C, when you cast a pointer to a certain type, it tells the compiler how to interpret the data the pointer points to. Let's declare:

char *c;
void *p = 0; // pointer to byte location zero -- won't work in real life ;)
             // A better way is to declare variables and use pointers to them
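Following the "declare variables" advice, here is one way to get the four bytes W, X, Y, Z without touching address zero: a real array stands in for locations 0-3. This is a sketch; the names `mem` and `byte_at` are hypothetical, and the byte values are the 0x12, 0x34, 0xAB, 0xCD that the later examples rely on. Casting the untyped pointer to `unsigned char *` lets us index it one byte at a time:

```c
#include <stddef.h>

/* Hypothetical stand-in for memory locations 0..3 (W, X, Y, Z). */
static const unsigned char mem[4] = { 0x12, 0x34, 0xAB, 0xCD };

/* Read the byte at `offset` by viewing the untyped pointer as
   unsigned char *, which steps through memory one byte at a time. */
unsigned char byte_at(const void *p, size_t offset)
{
    const unsigned char *c = (const unsigned char *)p;
    return c[offset];
}
```

`byte_at(mem, 0)` is W (0x12) on every machine, because only one byte is involved.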

We can't dereference p because we don't know its type. Suppose we say

c = (char *)p;

This tells the computer to treat c as a pointer to a character. A char is exactly 1 byte, so c just points to W. If we print *c, we get the value in W, i.e., 0x12 (remember, W is a whole byte). It doesn't matter whether a machine is big- or little-endian, since we are examining one byte. If we were using a sadistic machine that reversed bits inside a byte (that happens!), then it would matter. I won't consider that.

This should tell you something -- if we cast a pointer as a char *, we can walk through memory one byte at a time. The endianiosity doesn't matter. Making up words is fun =).
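A minimal sketch of that byte walk on a real value, assuming `uint32_t` from `<stdint.h>` so the value is exactly 4 bytes (the function names are illustrative). The bytes come out in address order, so the result depends on the machine: 12 34 AB CD on a big-endian host, CD AB 34 12 on a little-endian one.

```c
#include <stdint.h>
#include <stdio.h>

/* Copy out the bytes of a 32-bit value in address order (lowest first)
   by walking it with an unsigned char pointer (char types may alias
   any object). */
void bytes_of(uint32_t v, unsigned char out[4])
{
    const unsigned char *c = (const unsigned char *)&v;
    for (int i = 0; i < 4; i++)
        out[i] = c[i];
}

/* Print those bytes, lowest address first. */
void print_bytes_of(uint32_t v)
{
    unsigned char b[4];
    bytes_of(v, b);
    printf("%02X %02X %02X %02X\n",
           (unsigned)b[0], (unsigned)b[1], (unsigned)b[2], (unsigned)b[3]);
}
```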

What's the problem?

The problem occurs when we consider 2 or more bytes. In which order do the bytes appear?

Suppose we have, as above:

W X Y Z

0 1 2 3

stored on both a big-endian and a little-endian machine. That is, memory location 0 is W on both machines, memory location 1 is X, etc. We can set this up using the realization above, i.e.

c = (char *)0; // point to location 0. This won't work on a real machine! Dereferencing a null pointer is undefined.
*c = 0x12;     // Set W's value
c = (char *)1; // now points to location 1
*c = 0x34;     // Set X's value (Y and Z are set the same way)

Note that this code will work on any endian machine, since endianness doesn't matter for single bytes! So, we've got both machines set up with the bytes W, X, Y, Z.

Here's the rub:
Big endian machine: Thinks the first byte it reads is the biggest.
Little endian machine: Thinks the first byte it reads is the littlest.

The naming makes sense, eh? This also explains why a single byte doesn't matter. If you have one byte, it is the first and only one you read, and there is no other way to interpret it.
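The same "which byte comes first?" idea gives a common runtime check. This is a sketch, assuming the machine is either big- or little-endian (no mixed orders); the function name is hypothetical. Store the number 1 in a 16-bit variable and look at the byte at the lowest address: if it holds the 1, the littlest byte came first.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
    uint16_t one = 1;          /* the number 0x0001 */
    unsigned char first;
    memcpy(&first, &one, 1);   /* the byte at the lowest address */
    return first == 1;         /* littlest byte stored first => little-endian */
}
```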

Let's do an example:

short *s;       // a short is usually 16 bits, or 2 bytes
s = (short *)0; // points to location zero, so *s is the value there

What is the value at s? Here's what the computer does:
Big endian machine: I know a short is two bytes. So, I'll read off two bytes (location 0 and location 1). The first byte is the biggest (hey, I'm big-endian!) so I think the value is 256 * byte 0 + byte 1, or W X, or 0x1234.

Little endian machine: I know a short is two bytes, and I'll read off two. The first byte read (location 0) is the littlest, so I think the value is 256 * byte 1 + byte 0, or X W, or 0x3412.

Note: Both types of machines start from the pointer's location and read upward. Both read location 0 first, then location 1. We multiply by 256, or 2^8, because the larger byte is shifted over 8 bits.
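The two rules can be written out with shifts, which makes the 256 * byte arithmetic explicit. Both functions below (names made up for this sketch) take the same two bytes, starting at the pointer; they differ only in which byte they treat as the biggest:

```c
#include <stdint.h>

/* How a big-endian machine reads two bytes: the first byte is the biggest. */
uint16_t read16_big(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);   /* 256 * byte 0 + byte 1 */
}

/* How a little-endian machine reads two bytes: the first byte is the littlest. */
uint16_t read16_little(const unsigned char *p)
{
    return (uint16_t)((p[1] << 8) | p[0]);   /* 256 * byte 1 + byte 0 */
}
```

With the bytes 0x12, 0x34, `read16_big` returns 0x1234 and `read16_little` returns 0x3412, on any host, because the shifts work on values rather than on memory layout.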

See the problem? A big endian machine reads the data as 0x1234 and a little endian machine reads the data as 0x3412. Same data, two different numbers.
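To see which of those two numbers your own machine produces, read the bytes as a native `uint16_t`. This sketch (hypothetical function name) uses `memcpy` instead of casting to `short *`, which sidesteps alignment and strict-aliasing trouble; the result is 0x1234 on a big-endian host and 0x3412 on a little-endian one.

```c
#include <stdint.h>
#include <string.h>

/* Read two bytes the way *this* machine does, i.e. what *s would give. */
uint16_t read16_native(const unsigned char *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);   /* copy the raw bytes, lowest address first */
    return v;
}
```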

Another example:

int *i;       // int is 4 bytes on a typical 32-bit machine
i = (int *)0; // points to location zero, so *i is the value there

What is the value at i? Here's what the computer does:
Big endian machine: I know an int is 4 bytes, with the first being the largest. So, I think the value is W X Y Z (W is the largest byte), or 0x1234ABCD.

Little endian machine: I know an int is 4 bytes, and I'll read them off starting at the pointer. The first byte read (location 0) is the littlest, so I think the value is Z Y X W (W is the smallest byte), or 0xCDAB3412.

Again, both machines read the bytes in the same order!
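The 4-byte case extends the same shift pattern. Another illustrative sketch, using `uint32_t` so the size is exactly 4 bytes; the casts to `uint32_t` keep the 24-bit shift from overflowing a signed `int`:

```c
#include <stdint.h>

/* Big-endian reading: W X Y Z, W is the largest byte. */
uint32_t read32_big(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Little-endian reading: Z Y X W, W is the smallest byte. */
uint32_t read32_little(const unsigned char *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

Fed the bytes 0x12, 0x34, 0xAB, 0xCD, these return 0x1234ABCD and 0xCDAB3412 respectively.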

Example: the NUXI problem

Byte ordering issues are sometimes called the NUXI problem. Suppose we want to store "UNIX" as two shorts. Each letter is a whole byte, just like W, X, Y and Z above. So, we do:

short *s;

s = (short *)0; // location 0
*s = UN;        // Fictional, but store U * 256 + N
s = (short *)2; // need to jump 2 bytes!
*s = IX;        // store I * 256 + X at location 2

This code is not specific to a machine! If we store the two bytes "UN" on a machine and later ask to read them back, we had better get "UN"! Endianness doesn't matter here: if we store something, we get the same thing back.

However, if we look at the bytes one at a time (using our char * trick), it may be different depending on the machine. On a big endian machine, we see

U N I X

0 1 2 3

This checks out. U is the biggest byte in "UN", so it is stored first. Same thing for IX. On a little-endian machine, we see


N U X I

0 1 2 3

This checks out also. "N" is the littlest byte in "UN", so it is stored first. Even though it is stored this way, when we read back the short at location 0, we will get "UN"! The computer knows it is little-endian, and knows that the first byte read is the smallest one!
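Here is the NUXI experiment as runnable C: a sketch in which `uint16_t` stands in for `short` and the function name is hypothetical. Two 16-bit values, U*256+N and I*256+X, are stored in an array, and their raw bytes are copied out in address order. The string reads "UNIX" on a big-endian host and "NUXI" on a little-endian one, while the two values themselves are unchanged.

```c
#include <stdint.h>
#include <string.h>

/* Store "UNIX" as two 16-bit numbers, then copy out the raw bytes. */
void store_unix_as_shorts(char out[5])
{
    uint16_t shorts[2];
    shorts[0] = (uint16_t)('U' * 256 + 'N');   /* "UN" */
    shorts[1] = (uint16_t)('I' * 256 + 'X');   /* "IX" */
    memcpy(out, shorts, 4);                    /* bytes, lowest address first */
    out[4] = '\0';                             /* printable as a string */
}
```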

It's called the NUXI problem because the bytes are stored as UNIX on the big-endian machine, and NUXI on the little-endian one. It's not really a "problem" unless one machine reads data written by the other, e.g. the big-endian machine reads the little-endian machine's bytes. Each machine is internally consistent.
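The usual fix is to agree on one byte order for anything that leaves the machine (files, network packets) and convert explicitly. This sketch assumes big-endian ("network order") as the agreed format; the helper names are hypothetical. Because it uses shifts on the value instead of copying memory, it writes U N I X on every machine. POSIX provides `htons()`/`ntohs()` (and the 32-bit `htonl()`/`ntohl()`) in `<arpa/inet.h>` for the same conversion between host order and network order.

```c
#include <stdint.h>

/* Write v into p[0..1] in big-endian order, whatever the host's order. */
void put16_be(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v >> 8);     /* biggest byte first */
    p[1] = (unsigned char)(v & 0xFF);   /* littlest byte second */
}

/* Read it back the same way on any machine. */
uint16_t get16_be(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}
```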

Send questions, comments, corrections, and suggestions to [email protected]. Last modified: 4/20/02 11:08 PM