The C Programming Language - Structures and Union

Edusagar - notes - Structures and Union
  • A structure member/tag can have the same name as that of a non-member variable without any confilict as they can be distinguished by their context.


    struct point {
    	int x;
    	int y;
    }
  • It is just a declaration of structure with no actual memory allocated to it. It just serves as a template with tag - "point", which can later be used to create instaces of structure e.g.

    struct point pt

    The above defines a variable pt with actual memory allocation. Initialization can be done as follows:

    struct point pt = {32, 34};

  • Structures can be nested. For example:


    struct rectangle {
    	struct point pt1;
    	struct point pt2;
    } screen;

    Here screen is a structure variable with two member variables each of type structure - point which itself contains two members each of type integer. Thus all the four members can be listed as:

    screen.pt1.x, scrren.pt1.y, screen.pt2.x and screen.pt2.y

  • The only legal operations on a structure are copying it or assigning to it as a unit, taking its address with &, and accessing its members. Structures may not be compared. A structure may be initialized by a list of constant member values; an automatic structure may also be initialized by an assignment.

  • /* addpoints: add two points */
    struct point addpoint(struct point p1, struct point p2)
    {
    	p1.x += p2.x;
    	p1.y += p2.y;
    	return p1;
    }

    Here both the arguments and return type are structures. Also, we are directly modifying p1 and returning it back to the caller. This doesn't modify the value of p1 in the caller as p1 and p2 arguments are also called by value and not by reference.


    struct point origin, *pp;
    pp = &origin;
    
    printf("origin is (%d,%d)\n", (*pp).x, pp->y);  // (*pp).x and pp->y both are valid deferences.
  • The structure operators . and ->, along with () for functions and [] for array subscript are at the top of the precedence table. For example:

    struct {
    	int len;
    	char *str;
    } *p;   // p is a pointer to the structure
    
    ++p->len;   // increments len and not p 
    (++p)->len; // increments p before accessing len
    (p++)->len; // increments p afterwards , hence parantheses are unneccessary
    *p->str     // first character of string
    *p->str++   // increments str after it access whatever it points to (same as *s++) 
    (*p->str)++ // increments first character of the string
    *p++->str   // increments p after accessing string
  • #define NKEYS sizeof(array)/sizeof(array[0]) Gives us the number of elements in the array.

  • struct key keytab[] = {.....};  // an array of struct key
    struct key *low = &keytab[0];
    struct key *high = &keytab[n];  //  n points to the length of the array
    struct key *mid;
    
    mid = (low+high)/2  // WRONG , as two pointers cannot be added 
    mid = low + (high-low)/2   // CORRECT, as subtraction is legal and addition of a pointer to an integer is valid too
  • &arr[-1] is strictly illegal, and it is illegal to dereference &arr[n]. The language definition does guarantee, however, that pointer arithmetic that involves the first element beyond the end of an array (that is, &arr[n]) will work correctly.

  • Notice the changes to binsearch if instead of integer indexes we want work on pointers.


    /* binsearch: find word in tab[0]...tab[n-1] */
    struct key *binsearch(char *word, struck key *tab, int n)
    {
    	int cond;
    	struct key *low = &tab[0];    
    	struct key *high = &tab[n];    // using n instead of n-1 is legal.
    	struct key *mid;
    	while (low < high) {           // with pointers we cannot do low<=high, since we might hit &tab[-1] case.
    		mid = low + (high-low) / 2;    
    		if ((cond = strcmp(word, mid->word)) < 0)
    			high = mid;           // there is no mid-1
    		else if (cond > 0)
    			low = mid + 1;       // same as before
    		else
    			return mid;
    	}
    	return NULL;
    }
  • struct {
    	char c;
    	int i;
    };

    Size of this structure would be 8 and not 5. This phenomenon is known as structure padding. This happens because 3 extra bytes will be added soon after member 'c', since for a 32-bit system it is best if the data is aligned on 32-bit boundaries.

  • when a function returns a complicated type like a structure pointer, as in -

    struct key *binsearch(char *word, struct key *tab, int n)

    the function name can be hard to see, and to find with a text editor. Accordingly an alternate style is sometimes used:

    struct key *

    binsearch(char *word, struct key *tab, int n)

    This is a matter of personal taste; pick the form you like and hold to it.

  • It is illegal for a structure to contain an instance of itself, but you can have a pointer to the same structure as a member variable.


    struct tnode {
    	char *word;
    	int count;
    	struct tnode *left;
    	struct tnode *right;
    }
    struct tnode *talloc(void);
    char *strdup(char *);
    
    /* addtree: add a node with w, at or below p */
    struct treenode *addtree(struct tnode *p, char *w)
    {
    	if (p == NULL) { // new word has come
    		p = talloc();
    		p->word = strdup(w);
    		p->count = 1;
    		p->left = p->right = NULL;
    		return p;
    	} 
    	
    	cond = strcmp(w, p->word);
    	if (cond == 0) {
    		// word already exists, increment count
    		p->count++;
    	} else if (cond < 0) {
    		p->left = addtree(p->left, w);
    	} else {
    		p->right = addtree(p->right, w);
    	}
    	
    	return p;
    }
    		
    /* treeprint: in-order print of tree */
    void treeprint(struct tnode *p) 
    {
    	if (p != NULL) {
    		treeprint(p->left);
    		printf("%4d %s\n", p->count, p->word);
    		treeprint(p->right);
    	}
    }
    
    
    /*talloc: make a tnode */
    struct tnode *talloc(void) 
    {
    	// malloc return void*, which needs to type casted to proper type
    	return (struct tnode *)malloc(sizeof(struct tnode));
    }
  • /* make a duplicate of string s */
    char *strdup(char *s) 
    {
    	char *p;
    	p = (char*)malloc(strlen(s)+1); // +1 for '\0'
    	if (p != NULL) {
    		strcpy(p, s);
    	}
    	
    	return p;
    }
  • Hash Table:

    Hash table lookup for symblo table sort of data using structures:


    struct nlist {          /* table entry: */
    	struct nlist *next;     /* next entry in chain */
    	char *name;             /* defined name */
    	char *defn;             /* replacement text */
    };

    An array of pointers is used to contain the list of symbols mapping to a particular hash value.


    #define HASHSIZE 101
    static struct nlist *hashtab[HASHSIZE];   /* pointer table */
    
    /* hash: form hash value for string s, unsigned values ensure non-negative hash values */
    unsigned hash(char *s)  
    {
    	unsigned hashval;
    	for (hashval = 0; *s != '\0'; s++)
    		hashval	 = *s + 31 * hashval;
    	return hashval % HASHSIZE;
    }
    
    /* lookup: look for s in hashtab */
    struct nlist *lookup(char *s) 
    {
    	struct nlist *np;
    	
    	for (np = hashtab[hash(s)]; np != NULL; np = np->next) {
    		if (strcmp(np->name, s) == 0) {
    			return np;
    		}
    	}
    	
    	return NULL;
    }
  • /* install: put (name, defn) in hashtab if entry is present, else add a new entry, return NULL is space is not available */
    struct nlist *install(char *name, char *defn)
    {
    	struct nlist *np;
    	unsigned hashval;
    
    	if ((np = lookup(name)) == NULL) {
    		/* not found, add new entry */
    		np = (struct nlist*)malloc(sizeof(struct nlist));
    		if (np == NULL || (np->name = strdup(name)) == NULL)
    			return NULL;
    		
    		hashval = hash(name);
    		np->next = hashtab[hashval];
    		hashtab[hashval] = np;
    	} else {
    		free((void*)np->defn);
    	}
    	if ((np->defn = strdup(defn)) == NULL) {
    		return NULL;
    	}
    	
    	return np;
    }
  • C provides a facility called typedef for creating new data type names. For example, the declaration:

    typedef int Length;

    makes the name Length a synonym for int. The type Length can be used in declarations, casts, etc., in exactly the same ways that the int type can be:

    Length len, maxlen;

    Length *lengths[];

  • Typedef doesn't create a new type in any sense; it merely adds a new name for some existing data types. In effect, typedef is like #define, except that since it is interpreted by the compiler, it can cope with textual substitutions that are beyond the capabilities of the preprocessor. For example,

    typedef int (*PFI)(char *, char *);

    creates the type PFI, for "pointer to function (of two char * arguments) returning int" which can be used in declaring function pointers like:

    PFI strcmp, numcmp;

  • If typedefs are used for data types that may be machine-dependent, only the typedefs need change when the program is moved. One common situation is to use typedef names for various integer quantities, then make an appropriate set of choices of short, int, and long for each host machine. Types like size_t and ptrdiff_t from the standard library are examples.

  • A union is a variable that may hold (at different times) objects of different types and sizes, with the compiler keeping track of size and alignment requirements. Unions provide a way to manipulate different kinds of data in a single area of storage, without embedding any machine-dependent information in the program.

  • struct {
    	char *name;
    	int flags;
    	int utype;
    	union u_tag {
    		int ival
    		float fval;
    		char *sval;
    	} u;
    } symtab[NSYM];

    The variable 'u' here is large enough to hold the largest of the three types; specific size is implementation-dependent. The first character of sval can be referred as symtab[i].u.sval[0];

  • The same operations are permitted on unions as on structures: assignment to or copying as a unit, taking the address, and accessing a member. A union may only be initialized with a value of the type of its first member; thus union u described above can only be initialized with an integer value.

  • A bit-field, or field for short, is a set of adjacent bits within a single implementation-defined storage unit that we will call a word e.g.


    struct {
    	unsigned int is_keyword: 1;   
    	unsigned int is_extern: 1;
    	unsigned int is_static: 1;
    } flags;
  • Fields must be declared as int(for portability) along with the number of bits they occupy. These fields behave like small integers, and may participate in arithmetic expressions just like integers.

  • Almost everything about bit-fields is implementation dependent. Fields can also be unname fields(a colon and width only) which are used for padding. Special bit width 0 is used to force alignment at the next word boundary.

  • Fields can assigned from left-to-right or vice-versa depending on machine architecture and hence not portable. Fields are not arrays and hence we cant fetch their address using & operator.

comments powered by Disqus