Unix File System internals
In Unix based file systems all the entities on disk are treated as a File. The internal representation of a file is called an "inode" (contraction of term - index node) which contains all the required information and description of the file data and its layout on disk. In this article we would build an understanding of this inode structure and also the way the files are organized on the disk.
* Basic Building Blocks in File Systems
- Inode. Its a complex data-structure that contains all the necessary information to specify a file. It includes the memory layout of the file on disk, file permissions, access time, number of different links to the file etc.
- Global File table. It contains information that is global to the kernel e.g. the byte offset in the file where the user's next read/write will start and the access rights allowed to the opening process.
- Process File Descriptor table. It is local to every process and contains information like the identifiers of the files opened by the process. Whenever, a process creates a file, it gets an index from this table primarily known as File Descriptor.
- Disk Blocks. These are sequence of logical blocks of a particular size e.g. 512, 1024, 2048 bytes etc. This is the amount of data that would be accessed in one single disk read/write.
- Boot block. Its the first sector/block in a file system and typically contains the bootstrap code needed to boot the system.
- Super block. It specifies the state of a file system like the size of the system, how many files it can store, location of free space on the file system etc.
* The Buffer System for disk blocks
The simplicity of Unix file systems lies in the usage of buffer pool for manipulation of disk data. Kernel never directly goes on to read/write from the disk. It first allocates an in memory buffer and then does the data manipulation. The efficiency comes from the way these buffers are organized and the internal algorithms used to perform data manipulation.
During system initialization, the kernel allocates the space for a sufficiently large number of buffers. The data in a buffer is actually a mapping of the data on the disk blocks. In other words, a buffer is the in-memory copy of the data on the disk. However, the mapping is temporary until the kernel re-maps the buffer to some other data on the disk. A disk block can only map to one single buffer, otherwise it would be hard for kernel to figure out which buffer contains the current data.
Buffers are arranged in the form of a hash queues with one header which helps in organizing the buffer list based on the device and block numbers to have a simple and fast search mechanism. Every buffer contains the state of the buffer (locked / valid data / free etc). Kernel also maintains a free list of buffers which uses the "least recently used" order. Free list is a circular doubly link list with a dummy head marking the start/end of the list. Every buffer is put on the free list when the system is booted. Kernel takes a buffer from the head of the free list when it need a particular block in the buffer pool. It then removes the buffer from the free list. When the kernel frees a buffer it attaches it to the tail of the free list.
There are two different entities when we refer to the term inode. One is the inode structure that lies statically on the disk which contains all the information related to the layout, user permissions, access time etc. And the other one is the inode structure kept in memory for faster access similar to what we do for the disk buffers. Whenever a process refers to a file by name, kernel parses the file name one component at a time, checks whether the process has a permission to search the directories in the path and eventually retrieves the inode for the file.
Follow this article for a detailed and useful explanation of the inode structure.
* Super Block
It contains the following information:
- size of the file system
- number of free blocks in the file system + the list of free blocks
- index of the next free block
- size of the inode list
- number of free inodes + list of free inodes
- flag indicating that the super block has been modified
- lock fields for the free block and free inode lists
The super block effectively handles the allocation of inode to files. Consider the case when a new file is created and thus we need a new inode entry for this. Inode list is present on the disk in the form of a sequential array, however searching this array and returning a free inode requires disk access. To avoid this access, super block contains a cache of free inodes in the form of a list. If the list of inode numbers in the super block is not empty, the kernel will assign the next free inode number to the newly created file. Then it will copy the disk inode to the in-core copy, initialize the fields and returns the locked inode. Then it will update the disk copy of the inode to indicate that it is now in use.
However, if the list of free inodes is empty, kernel searches the disk and places as many free inode numbers as possible into the super block. It reads the inode list on disk block by block and fill in the free list in super block. It does some book-keeping to keep the process highly efficient for the next such operation.
Super block also contains an array that is used to cache the number of free disk blocks in the file system. The utility mkfs(make file system) organizes the data blocks in a linked list so that whenever required the kernel can effectively search for the block number that is available next.
* Interacation of Inodes + File Descriptor table + Global File Table
For every open() system call, the file descriptor table allocates a new index known as file descriptor (fd). Now, this entry points to an entry in the global file table which contains all the I/O byte offset related information. Now, this global file table has a pointer that points to the proper inode number in the inode table. These tables interact with each other in a complex manner and provides us many file handling system calls, which we can use to manipulate the data in the files.
Follow this link for a detailed understanding of this interaction with an example of open() system call.
With this I would conclude this article on the unix file system internals. Please leave your valuable comments.
Reference: The Design of the UNIX Operating System - by Maurice J. Bach
Happy Programming !!