Inode - file structure on Unix
The internal representation of a file is called an "inode" (a contraction of "index node"). It contains the information needed to describe the file's data and its layout on disk. This article looks in detail at the information stored in an inode and at how it is represented in the kernel.
Inodes reside on disk, and the kernel reads them into memory; the in-memory copies are called in-core inodes. A disk inode contains the following information:
- File access permissions and time (last access / modified etc.)
- File ownership information.
- Type of the file (regular / directory / block special / pipe)
- Number of links to the file
- File size and organization on disk (the file data may be spread across several widely separated disk locations)
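Most of these fields can be observed from user space through the stat() system call. The following is a minimal sketch using Python's standard os and stat modules; the helper name describe_inode and the dictionary keys are made up for illustration, and the block layout itself is not exposed this way.

```python
import os
import stat
import tempfile


def describe_inode(path):
    """Return the inode fields listed above for `path`, as reported by stat()."""
    st = os.stat(path)
    if stat.S_ISDIR(st.st_mode):
        file_type = "directory"
    elif stat.S_ISREG(st.st_mode):
        file_type = "regular"
    else:
        file_type = "other"
    return {
        "inode_number": st.st_ino,
        "type": file_type,
        "permissions": stat.filemode(st.st_mode),
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "links": st.st_nlink,
        "size_bytes": st.st_size,
        "accessed": st.st_atime,
        "modified": st.st_mtime,
    }


if __name__ == "__main__":
    # Create a small scratch file and print what its inode says about it.
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"hello")
        f.flush()
        for field, value in describe_inode(f.name).items():
            print(f"{field}: {value}")
```

Running it on a directory instead of a regular file changes only the type and permission fields, which reflects that a directory is described by an inode like any other file.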
The in-core copy of an inode contains all of the above information, plus the following additional fields:
- Status (the inode is locked / a process is waiting for it to become unlocked / the in-core copy has been modified and thus differs from the copy on disk / the file is a mount point)
- Logical Device number of the file system
- Inode number. Since inodes are stored in a linear array on disk, the kernel identifies a disk inode by its position in that array; the in-core copy records this number explicitly.
- Pointers to other in-core inodes. The kernel keeps in-core inodes on hash queues keyed by logical device number and inode number. It also maintains a free list of inodes.
- Reference count which indicates the number of instances of the file that are currently active.
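The bookkeeping described by the last few fields can be sketched as a toy cache. This is only an illustration under simplifying assumptions, not kernel code: the class names are invented, a Python dictionary stands in for the hash queues, locking is omitted, and the names iget and iput are borrowed from the algorithms in Bach's book.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare inodes by identity, not by field values
class InCoreInode:
    device: int          # logical device number of the file system
    number: int          # inode number (position in the on-disk inode array)
    ref_count: int = 0   # number of active references to this inode
    locked: bool = False
    dirty: bool = False  # True when the in-core copy differs from the disk copy


class InodeTable:
    """Toy in-core inode cache keyed by (device, inode number)."""

    def __init__(self):
        self.hash_queue = {}  # (device, number) -> InCoreInode
        self.free_list = []   # cached inodes whose ref_count is 0

    def iget(self, device, number):
        """Return the in-core inode, creating it on first use."""
        key = (device, number)
        inode = self.hash_queue.get(key)
        if inode is None:
            # A real kernel would read the disk inode here.
            inode = InCoreInode(device, number)
            self.hash_queue[key] = inode
        elif inode in self.free_list:
            # Still cached but unused: take it back off the free list.
            self.free_list.remove(inode)
        inode.ref_count += 1
        return inode

    def iput(self, inode):
        """Release one reference; unused inodes go on the free list."""
        inode.ref_count -= 1
        if inode.ref_count == 0:
            self.free_list.append(inode)
```

Two calls to iget with the same device and inode number return the same object with a reference count of 2, which is the point of keeping a single in-core copy per file.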
* Structure of a Regular file on Disk
As stated previously, an inode contains the table of contents of the file data on disk. Since each disk block can be referenced by a number, the table of contents is nothing but a sequence of disk block numbers. The file data is not always stored in contiguous disk blocks, hence the inode has to keep track of all the block numbers that make up the file. System V Unix keeps 13 entries in this table of contents: 10 direct entries, followed by one single indirect, one double indirect and one triple indirect entry.
Each entry marked "direct" refers to a single disk block that contains real data. The "single indirect" entry holds the number of a disk block which itself contains a list of block numbers, and those blocks contain the real data. Along the same lines, the "double indirect" and "triple indirect" entries add two and three levels of such lists. Let's now estimate the maximum size of a file that the Unix file system can handle.
Assume one block is 1 Kbyte and a block number is a 4-byte (32-bit) integer. A block can then hold 256 block numbers.
10 direct blocks => 10 K bytes
1 single indirect block with 256 block number entries => 256 K bytes
1 double indirect block with 256 single indirect entries => 64 M bytes
1 triple indirect block with 256 double indirect entries => 16 G bytes
The total is far more than the 4-byte file size field in the inode can represent (2^32 bytes => 4 G bytes), so in practice it is the size field, not the block table, that limits the file size.
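The arithmetic above can be checked with a few lines of Python. This is a sketch under the same assumptions as the text (1 Kbyte blocks, 4-byte block numbers) plus the System V layout of 10 direct entries; the function and constant names are illustrative only.

```python
BLOCK_SIZE = 1024       # bytes per disk block (assumed, as in the text)
BLOCK_NUMBER_SIZE = 4   # bytes per block number (assumed, as in the text)
DIRECT_ENTRIES = 10     # direct entries in a System V inode


def capacity_by_level(block_size=BLOCK_SIZE, number_size=BLOCK_NUMBER_SIZE,
                      direct=DIRECT_ENTRIES):
    """Return the number of bytes addressable through each kind of entry."""
    per_block = block_size // number_size  # block numbers that fit in one block
    return {
        "direct": direct * block_size,
        "single indirect": per_block * block_size,
        "double indirect": per_block ** 2 * block_size,
        "triple indirect": per_block ** 3 * block_size,
    }


def max_file_size(size_field_bits=32):
    """Smaller of the block-table capacity and what the size field can hold."""
    table_limit = sum(capacity_by_level().values())
    return min(table_limit, 2 ** size_field_bits)


if __name__ == "__main__":
    for level, size in capacity_by_level().items():
        print(f"{level}: {size} bytes")
    print("block table total:", sum(capacity_by_level().values()))
    print("effective limit:", max_file_size())
```

With these numbers the block table could address a little over 16 G bytes, while the 32-bit size field caps the result at 4 G bytes.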
So, whenever a process accesses a particular offset in a file, the kernel uses this table of contents to find the disk block that holds that offset and loads the block into memory.
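The lookup can be sketched as a function that turns a byte offset into the kind of entry to follow and the index to use at each level. It assumes the same figures as before (1 Kbyte blocks, 256 block numbers per block, 10 direct entries); the function name locate and its return format are invented for this illustration and do not read any real disk.

```python
def locate(offset, block_size=1024, direct=10, per_block=256):
    """Map a byte offset to (entry kind, indexes per level, byte within block).

    For the direct case the single index is the direct entry to use; for the
    indirect cases there is one index per level of indirection.
    """
    block, byte_in_block = divmod(offset, block_size)  # logical block number
    if block < direct:
        return "direct", [block], byte_in_block
    block -= direct
    if block < per_block:
        return "single indirect", [block], byte_in_block
    block -= per_block
    if block < per_block ** 2:
        first, second = divmod(block, per_block)
        return "double indirect", [first, second], byte_in_block
    block -= per_block ** 2
    if block < per_block ** 3:
        first, rest = divmod(block, per_block ** 2)
        second, third = divmod(rest, per_block)
        return "triple indirect", [first, second, third], byte_in_block
    raise ValueError("offset is beyond what the block table can address")


if __name__ == "__main__":
    print(locate(9000))     # ('direct', [8], 808)
    print(locate(350000))   # ('double indirect', [0, 75], 816)
```

For example, byte 350,000 falls in logical block 341, which lies past the 10 direct blocks and the 256 blocks of the single indirect entry, so it is reached through entry 75 of the first single indirect block under the double indirect entry.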
* Structure of directory
Directories are the files that give the file system its hierarchical structure. A directory on a Unix file system is a file containing a sequence of entries, each consisting of an inode number and the name of a file in that directory. UNIX System V restricts names to a maximum of 14 characters and uses 2 bytes for the inode number, making each entry 16 bytes.
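The 16-byte entry format can be illustrated with Python's struct module. This is a sketch with some assumptions that the text does not state: the inode number is taken to be little-endian, names are padded with NUL bytes, and an inode number of 0 is treated as an empty slot. The helper names are made up for the example.

```python
import struct

# 2-byte unsigned inode number followed by a 14-byte name: 16 bytes per entry.
# Little-endian byte order is an assumption made for this sketch.
ENTRY = struct.Struct("<H14s")


def pack_entry(inode_number, name):
    """Build one 16-byte directory entry."""
    raw = name.encode("ascii")
    if len(raw) > 14:
        raise ValueError("file names are limited to 14 characters")
    return ENTRY.pack(inode_number, raw)  # struct pads short names with NULs


def parse_directory(data):
    """Return (inode number, name) pairs for the used entries in `data`."""
    entries = []
    for offset in range(0, len(data) - ENTRY.size + 1, ENTRY.size):
        inode_number, raw = ENTRY.unpack_from(data, offset)
        if inode_number == 0:
            continue  # treated as an empty slot in this sketch
        entries.append((inode_number, raw.rstrip(b"\0").decode("ascii")))
    return entries


if __name__ == "__main__":
    # Hypothetical directory contents, for illustration only.
    data = (pack_entry(83, ".") + pack_entry(2, "..")
            + pack_entry(0, "removed") + pack_entry(1798, "init"))
    for inode_number, name in parse_directory(data):
        print(inode_number, name)
```

Because every entry has the same size, the byte offset of the n-th entry in the directory file is simply 16 * n.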
So, that was an insight into the inode data structure. Follow this link for an understanding of how the other kernel data structures interact with inodes.
Reference: The Design of the UNIX Operating System - by Maurice J. Bach
Happy Programming !!