little / big endian mystery

This article tries to clear up the often misunderstood, yet fairly simple, terms little endian and big endian. The names come from Jonathan Swift's Gulliver's Travels, where two groups could not agree on which end of an egg should be broken: the little end or the big end!


* So, what do little / big endian actually refer to?
  Simply stated, these terms describe the byte order of a particular system. By byte order, we mean the order in which the bytes of a multi-byte value are stored in memory.
  Let us start with a simple example: how is an integer stored in memory on a big-endian system?


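  unsigned int example = 255; /* 0xFF */
  unsigned char *pTemp;
  size_t i;
  /* View the integer as a sequence of raw bytes. */
  pTemp = (unsigned char *)&example;
  /* Print each byte, from the lowest address to the highest. */
  for (i = 0; i < sizeof(example); i++) {
    printf("%x\t", (unsigned int)pTemp[i]);
  }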

  
  Running the above on a big-endian system, and assuming an integer is 4 bytes, we get the following output:


0 0 0 ff

  
  Explanation: The snippet prints the integer's value as it is stored in memory, one byte at a time. We have an integer 'example' with value 0xFF (255) which, by our assumption, occupies 4 bytes of memory, and a character pointer pTemp, which addresses a single byte at a time.

  Next, the assignment to pTemp points the character pointer at the base address of the integer 'example'. So, if the base address of the int variable is 0x100 (a hypothetical value), the integer's data spans memory locations 0x100 to 0x103. The character pointer also points to address 0x100, but dereferencing it reads only a single byte.

  Now, using the for loop, we iterate over all 4 bytes and print them one by one. The result shown above is therefore the bytes at addresses 0x100 through 0x103, in that order.

  What we infer from the bytes shown is that, on a big-endian system, the most significant byte (MSB) of the integer is stored at the lowest memory location (0x100) and the least significant byte (LSB, 0xFF here) is stored at the highest memory location (0x103).

 
  However, things change drastically if we run the same code on a little-endian system. Here is what we get:
  

ff 0 0 0

  
  Everything gets reversed: the LSB of the integer (0xFF) is stored at the lowest memory location (0x100) and the MSB at the highest memory location (0x103). This is the only difference between little-endian and big-endian systems: the order in which the bytes of a value are laid out in memory.

Note: Of course, this byte-ordering difference only matters for multi-byte data types. For a single-byte type such as 'char', the representation is the same on every system.


* One more example :

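  unsigned int example = 0x12345678;
  unsigned char *pTemp;
  size_t i;
  /* View the integer as a sequence of raw bytes. */
  pTemp = (unsigned char *)&example;
  /* Print each byte, from the lowest address to the highest. */
  for (i = 0; i < sizeof(example); i++) {
    printf("%x\t", (unsigned int)pTemp[i]);
  }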

  
  And the output would be :

  

big endian    :  12 34 56 78
little endian  :   78 56 34 12
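
  The two outputs above are byte-for-byte reversals of each other. As a minimal sketch of that relationship, the helper below (swap32 is a hypothetical name, not part of the original article) reverses the bytes of a 32-bit value using only shifts and masks, so it behaves the same on any host:

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value.
 * swap32 is a hypothetical helper name used only for illustration.
 * It works on values (shifts and masks), so the host byte order
 * does not affect the result. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |  /* lowest byte  -> highest byte */
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);   /* highest byte -> lowest byte  */
}
```

  For instance, swap32(0x12345678) evaluates to 0x78563412, and applying swap32 twice gives back the original value.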




* How do I find out the endian-ness of my system?
  The following code does exactly that:
  

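  const char *getEndianNess(void)
  {
    unsigned int example = 1;
    /* Look at the byte stored at the lowest address of 'example'. */
    unsigned char *pTemp = (unsigned char *)&example;

    /* A non-zero first byte means the LSB comes first: little endian. */
    return (*pTemp) ? "little endian" : "big endian";
  }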

  
 
* What special considerations do I need to make while writing code to handle endian-ness?
  Most of the time, the endian-ness of a system doesn't affect the developer, because the compiler takes care of it for us.
  e.g. with x = 0x00FF and y = 0xFF00, an OR of these two values yields 0xFFFF on any system. The operation works on values, the result is stored in memory in the host's byte order, and reading it back uses that same order, so we get exactly what we expect.
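
  The point above can be sketched in a few lines. The function names combine and high_byte are hypothetical, chosen only for this illustration; the idea is that bitwise operators and shifts act on values, so the results do not depend on how the host lays out bytes in memory:

```c
#include <stdint.h>

/* Combine two 16-bit values with OR. The result is the same on any
 * host because the operation works on values, not on memory layout.
 * (combine is a hypothetical name used for illustration.) */
uint16_t combine(uint16_t x, uint16_t y)
{
    return (uint16_t)(x | y);
}

/* Extract the most significant byte of a 16-bit value with a shift.
 * Unlike reading through a char pointer, this is endian-independent.
 * (high_byte is a hypothetical name used for illustration.) */
uint8_t high_byte(uint16_t v)
{
    return (uint8_t)(v >> 8);
}
```

  With the values from the text, combine(0x00FF, 0xFF00) gives 0xFFFF on both little- and big-endian systems.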
 
  But there are situations where the endian-ness of a system needs special handling. The most common one is writing to a network. The Internet protocols use big-endian as the network byte order, so developers on little-endian systems must first convert values to that byte order. The Berkeley sockets API provides the functions htonl (host-to-network-long) and htons (host-to-network-short) for converting 32-bit and 16-bit values, respectively, from host byte order to network byte order; the matching ntohl and ntohs convert back. These functions handle both little- and big-endian hosts internally, so the same source code works on either.
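
  As a rough sketch of how these functions are typically used, the code below packs a 32-bit and a 16-bit field into a buffer in network byte order and unpacks them again. The 6-byte header layout and the names pack_header and unpack_header are assumptions made up for this example; only htonl, htons, ntohl and ntohs come from the sockets API (declared in <arpa/inet.h> on POSIX systems):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* htonl, htons, ntohl, ntohs (POSIX) */

/* Pack a hypothetical 6-byte header (32-bit id, then 16-bit port)
 * into buf in network byte order. Returns the number of bytes written. */
size_t pack_header(unsigned char *buf, uint32_t id, uint16_t port)
{
    uint32_t net_id = htonl(id);      /* host -> network, 32 bits */
    uint16_t net_port = htons(port);  /* host -> network, 16 bits */

    memcpy(buf, &net_id, sizeof net_id);
    memcpy(buf + sizeof net_id, &net_port, sizeof net_port);
    return sizeof net_id + sizeof net_port;
}

/* Read the same hypothetical header back into host byte order. */
void unpack_header(const unsigned char *buf, uint32_t *id, uint16_t *port)
{
    uint32_t net_id;
    uint16_t net_port;

    memcpy(&net_id, buf, sizeof net_id);
    memcpy(&net_port, buf + sizeof net_id, sizeof net_port);
    *id = ntohl(net_id);       /* network -> host, 32 bits */
    *port = ntohs(net_port);   /* network -> host, 16 bits */
}
```

  Whatever the host's byte order, the buffer filled by pack_header holds the most significant byte of each field first, which is what a peer on the network expects.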

  Reading and writing files in binary mode can also run into endian-ness issues, e.g. when a file written on one system is read on a system with a different byte order. So, care must be taken to do the byte-order conversions properly when reading and writing data.
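
  One common way to avoid such problems is to pick a fixed byte order for the file and build each value from individual bytes. The sketch below assumes a big-endian file format; write_u32_be and read_u32_be are hypothetical helper names, not functions from any library:

```c
#include <stdint.h>
#include <stdio.h>

/* Write a 32-bit value as 4 bytes, most significant byte first.
 * Returns 0 on success, -1 on a short write.
 * (write_u32_be is a hypothetical helper name.) */
int write_u32_be(FILE *fp, uint32_t v)
{
    unsigned char b[4];

    b[0] = (unsigned char)(v >> 24);
    b[1] = (unsigned char)(v >> 16);
    b[2] = (unsigned char)(v >> 8);
    b[3] = (unsigned char)v;
    return fwrite(b, 1, sizeof b, fp) == sizeof b ? 0 : -1;
}

/* Read 4 bytes stored most significant byte first and rebuild the value.
 * Returns 0 on success, -1 on a short read.
 * (read_u32_be is a hypothetical helper name.) */
int read_u32_be(FILE *fp, uint32_t *out)
{
    unsigned char b[4];

    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return -1;
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    return 0;
}
```

  Because the bytes are produced and consumed with shifts rather than by copying the integer's memory directly, a file written this way reads back correctly on both little- and big-endian hosts.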
 
 
* Bi-endian systems:
  Some systems provide support for both big and little endian-ness and are appropriately termed as Bi-endian systems.
 
* Examples :
  Little Endian: Intel x86-based processors, older versions of ARM processors
  Big Endian: Motorola 6800, 68K, Xilinx, older versions of SPARC (before version 9)
  Bi-Endian: SPARC V9, ARM, PowerPC


- Pankaj

Happy Programming !!
