I/O redirection using dup() system call

* What is a dup() system call
The dup() system call in Unix systems copies a file descriptor into the first free slot of the private user file descriptor table and returns the new file descriptor to the user. It works for all file types. The syntax is:

newfd = dup(fd);

Here fd is the file descriptor being duped and newfd is returned to the user.
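
To see the "first free slot" rule in action, here is a minimal sketch. It assumes a POSIX system where /dev/null can be opened and a single-threaded process; the helper name dup_reuses_lowest_slot is made up for this illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if dup() handed back the descriptor
 * number that was freed just before the call, 0 if not, -1 on error. */
int dup_reuses_lowest_slot(void)
{
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    int c = open("/dev/null", O_RDONLY);
    if (a < 0 || b < 0 || c < 0) {
        if (a >= 0) close(a);
        if (b >= 0) close(b);
        if (c >= 0) close(c);
        return -1;
    }

    close(b);                 /* b's slot is now the lowest free slot */
    int newfd = dup(a);       /* expected to land in the slot b used */
    int ok = (newfd == b);

    if (newfd >= 0) close(newfd);
    close(a);
    close(c);
    return ok;
}
```

Every slot below b was already in use when b was opened, so after close(b) that slot is the lowest free one and dup(a) should return it.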

There are three data structures that help in manipulating the file system: the inode table, the private user file descriptor table, and the global file table. Before moving on to the description of dup(), I urge you to read this article on Internal Data Structure for file handling in Unix kernel.

Unlike the open() system call, dup() doesn't create a separate entry in the global file table. Instead, it just increments the count field of the global file table entry that the given file descriptor points to. Consider an example where fds 0, 1 and 2 are bound by default to standard input, standard output and standard error. The user opens the file "/var/file1" (fd 3), then opens "/var/file2" (fd 4), and then opens "/var/file1" again (fd 5). If he now does a dup(3), the kernel follows the pointer from the user file descriptor table entry for fd 3 and increments the count value in the global file table. Then it searches for the next available free entry in the user file descriptor table, makes it point to the same global file table entry, and returns that value to the user (6 in this case).
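
The bookkeeping in this example can be sketched with a toy user-space model. This is not kernel code: the structures, table sizes and the names toy_open, toy_dup and toy_count are all invented for illustration, and the inode table is left out.

```c
#include <stddef.h>
#include <string.h>

#define MAX_FD    16
#define MAX_FILES 16

/* Toy stand-in for one global file table entry. */
struct file_entry {
    int  count;        /* number of descriptors pointing at this entry */
    long offset;       /* byte offset shared by those descriptors */
    char path[32];
};

static struct file_entry file_table[MAX_FILES];  /* "global file table" */
static struct file_entry *fd_table[MAX_FD];      /* "user file descriptor table" */

/* Return the lowest unused descriptor slot, or -1 if the table is full. */
static int first_free_fd(void)
{
    for (int fd = 0; fd < MAX_FD; fd++)
        if (fd_table[fd] == NULL)
            return fd;
    return -1;
}

/* open(): always allocates a fresh global file table entry. */
int toy_open(const char *path)
{
    int fd = first_free_fd();
    if (fd < 0)
        return -1;
    for (int i = 0; i < MAX_FILES; i++) {
        if (file_table[i].count == 0) {
            file_table[i].count = 1;
            file_table[i].offset = 0;
            strncpy(file_table[i].path, path, sizeof(file_table[i].path) - 1);
            file_table[i].path[sizeof(file_table[i].path) - 1] = '\0';
            fd_table[fd] = &file_table[i];
            return fd;
        }
    }
    return -1;
}

/* dup(): no new entry; bump the count and point a new slot at the same entry. */
int toy_dup(int fd)
{
    if (fd < 0 || fd >= MAX_FD || fd_table[fd] == NULL)
        return -1;
    int newfd = first_free_fd();
    if (newfd < 0)
        return -1;
    fd_table[fd]->count++;
    fd_table[newfd] = fd_table[fd];
    return newfd;
}

/* Count field of the entry behind fd, or 0 if fd is not in use. */
int toy_count(int fd)
{
    if (fd < 0 || fd >= MAX_FD || fd_table[fd] == NULL)
        return 0;
    return fd_table[fd]->count;
}
```

Calling toy_open three times for the standard streams, then for "/var/file1", "/var/file2" and "/var/file1" again, fills slots 0 to 5. A following toy_dup(3) returns 6 and leaves a count of 2 on the entry shared by fds 3 and 6, while fd 5 keeps its own entry with a count of 1.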

[Figure: EduSagar - dup() system call]

* Difference between open and dup system call

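#include <fcntl.h>
#include <unistd.h>

int main(void)
{
 int i, j;
 char buf1[512], buf2[512];

 i = open("/var/file1", O_RDONLY);
 j = dup(i);                    /* i and j share one global file table entry */
 read(i, buf1, sizeof(buf1));   /* reads from offset 0 and advances the shared offset */
 read(j, buf2, sizeof(buf2));   /* continues from where the read on i stopped */
 close(i);                      /* j stays usable after i is closed */
 read(j, buf2, sizeof(buf2));   /* continues again from the shared offset */
 return 0;
}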

In the above program, after the dup(i), both i and j point to the same entry in the global file table and thus share the same byte offset in the file. Each read therefore continues where the previous one stopped, regardless of which descriptor it used, so buf1 and buf2 end up holding different data. This is where the difference from the open() system call shows up. If we use open() instead of dup() to open the same file again, it creates a separate entry in the global file table, and thus a separate byte offset for each descriptor. The effect is that the first read on each descriptor puts the same content in buffers buf1 and buf2. With dup(), the user can close either of the file descriptors and continue using the other without any issue.
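
The shared versus separate offset can be checked directly with lseek(). The following is a minimal sketch, assuming a POSIX system with a writable /tmp; the helper name offset_seen_by_second_fd and the scratch file are made up for this illustration.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: reads one byte through a first descriptor, then
 * reports the offset seen through a second descriptor on the same file.
 * use_dup != 0: the second descriptor comes from dup();
 * use_dup == 0: it comes from a second open().
 * Returns that offset, or -1 on error. */
long offset_seen_by_second_fd(int use_dup)
{
    char path[] = "/tmp/dup_demo_XXXXXX";
    int tmp = mkstemp(path);            /* scratch file for the demo */
    if (tmp < 0)
        return -1;
    if (write(tmp, "abcdef", 6) != 6) {
        close(tmp);
        unlink(path);
        return -1;
    }
    close(tmp);

    int i = open(path, O_RDONLY);
    if (i < 0) {
        unlink(path);
        return -1;
    }
    int j = use_dup ? dup(i) : open(path, O_RDONLY);

    long off = -1;
    char ch;
    if (j >= 0 && read(i, &ch, 1) == 1)       /* advance the offset behind i */
        off = (long)lseek(j, 0, SEEK_CUR);    /* ask where j currently stands */

    if (j >= 0)
        close(j);
    close(i);
    unlink(path);
    return off;
}
```

With dup() the second descriptor reports offset 1, because the read on i moved the shared offset. With a second open() it reports 0, because that descriptor has its own global file table entry.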

Follow this article for the description of open() system call.

* Input / output redirection using dup() system call
The dup() system call finds use in implementing input/output redirection and piping on the Unix shell. Suppose we wish to redirect the output of the 'ls' command to a file. We use the following command on the shell to do the job:

 root> ls /var/* > tempfile

File descriptor 1 is bound to the standard output stream, and the 'ls /var/*' command writes its output to that descriptor. Using the '>' operator, we are able to redirect this output to the file 'tempfile'. The shell parses the command and, when it finds the '>' operator, opens the right-hand operand 'tempfile' (creating the file if it doesn't already exist) to obtain a file descriptor for it. It then closes the stdout file descriptor and calls dup() on the descriptor for 'tempfile'. Descriptor 0 is still in use and descriptor 1 has just been freed, so 1 is the first free slot: dup() returns 1, and descriptor 1 now refers to 'tempfile'.
That's it. From this step onwards, the output is redirected to the file 'tempfile'. As an additional step, we can close the original descriptor for 'tempfile' so that no descriptor is wasted.


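/* redirection of I/O */
  fd = creat("tempfile", mode);  /* mode: permission bits for the new file, e.g. 0644 */
  close(1);                      // stdout => 1
  dup(fd);                       /* first free slot is 1, so stdout now refers to tempfile */
  close(fd);                     /* optional: release the original descriptor */
  /* stdout is now redirected */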
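
The same sequence can be tried out in a small self-contained function. This is only a sketch, assuming a POSIX system: the helper name write_redirected is made up, it uses write() rather than printf() to sidestep stdio buffering, and it restores the original stdout afterwards with dup2(), which is a convenience for the demo and not part of the redirection technique itself.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes msg to descriptor 1 while descriptor 1 is
 * redirected to the file at path, then puts the original stdout back.
 * Returns 0 on success, -1 on error. */
int write_redirected(const char *path, const char *msg)
{
    int saved = dup(1);                   /* remember the real stdout */
    if (saved < 0)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        close(saved);
        return -1;
    }

    close(1);                             /* slot 1 is now free */
    int newfd = dup(fd);                  /* expected to return 1 */
    close(fd);                            /* original descriptor no longer needed */

    int rc = -1;
    if (newfd == 1) {
        size_t len = strlen(msg);
        if (write(1, msg, len) == (ssize_t)len)
            rc = 0;                       /* the message went into the file */
    } else if (newfd >= 0) {
        close(newfd);                     /* unexpected slot: give up */
    }

    if (dup2(saved, 1) < 0)               /* put the original stdout back on 1 */
        rc = -1;
    close(saved);
    return rc;
}
```

A call such as write_redirected("/tmp/out.txt", "hello\n") should leave the text in the file instead of on the terminal, and later output goes to the terminal again.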

The same logic applies when we use "pipe" operations on the shell. Thus, although dup() is not an elegant call, it is a powerful building block for several higher-level commands.
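
For the pipe case, here is a minimal sketch of the same close-then-dup trick, assuming a POSIX system. The helper name read_child_stdout is made up; a real shell would exec the command in the child, whereas here the child just writes a message to keep the example short.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a child writes msg to its stdout, which has been
 * pointed at the write end of a pipe; the parent reads it back into out.
 * Returns the number of bytes read, or -1 on error. */
long read_child_stdout(const char *msg, char *out, size_t outsz)
{
    int pfd[2];
    if (outsz == 0 || pipe(pfd) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    if (pid == 0) {                      /* child: stands in for the command */
        close(1);                        /* free slot 1 */
        if (dup(pfd[1]) != 1)            /* the pipe's write end becomes stdout */
            _exit(1);
        close(pfd[0]);
        close(pfd[1]);
        if (write(1, msg, strlen(msg)) < 0)
            _exit(1);
        _exit(0);
    }

    close(pfd[1]);                       /* parent keeps only the read end */
    size_t total = 0;
    ssize_t n;
    while (total < outsz - 1 &&
           (n = read(pfd[0], out + total, outsz - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';

    close(pfd[0]);
    waitpid(pid, NULL, 0);
    return (long)total;
}
```

The parent closes its copy of the write end before reading. Otherwise the pipe would never report end-of-file, because a descriptor for the write end would still be open.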

Reference: The Design of the UNIX Operating System - by Maurice J. Bach

Happy Programming !!
