File System Management

Installation and Execution

to run this you need to clone the directory and then run this inside the cloned directory:
    cd ./src
    make

then in order to execute the code you have to use the command ./shooter inside the src directory. the command has the format:

  ./shooter <option> <path to file system> [<filename>]

possible options :

  • /info : displays info about the file system.
  ./shooter /info <path to file system> 
  • /find : checks if a file exists in the filesystem recursively and returns its size in case it exists.
  ./shooter /find <path to file system> <filename>
  • /delete : checks if a file exists in the filesystem recursively and deletes it if it's found.
  ./shooter /delete <path to file system> <filename>

Explanation of file systems

this program only supports EXT 2 and FAT 16 file systems.

Ext 2

Ext2 (the second extended file system) is a file system for the Linux kernel. The data in this file system is divided into blocks, these blocks are then grouped into groups. The main structure that stores the files information and entries is known as inode (node index). each inode contains pointers to the blocks of the file system that stores the data of the file or directory.

The blocks and groups are structured as shown in the image below:

image of Ext 2 sample groups/blocks structure

the most important block is the superblock which is located in the first block of group 0 after the boot records. This block always starts on the offset 1024.

in the superblock you can find general information about the filesystem. For instance, you can find block sizes, total number of blocks, the number of inodes, or the number of the first inode.

The Inode table contains inode entries, each inode entry is 124 bytes. each of these entries contains the information of a file, directory, or symbolic link that this inode points to. these are the information stored in each inode:

image of the information stored in an inode

if the inode points to a directory, each directory entry will follow the following structure:

image of a directory entry

each block stores as many sequential directory entries as it can fit. If the entries don't fit in one block, then the reset of the entries will be stored in another block which you can locate using the inode representing this directory.

FAT 16

FAT (File Allocation Table) is a type of legacy file systems. FAT offers good performance but isn't very scalable and reliable. The 16 in the name refers to the number of bits used to identify each allocation unit (known as cluster).

Fat file systems are divided into 4 main regions as shown in the figure below: image of a the FAT regions

The reserved region contains the boot sector and BPB (BIOS Parameter Block). these blocks contain important information about the file system like the number FATs, the size of the cluster, or the number of entries in the root directory.

the root directory entry is located in the root directory region. each directory/file entry follows the same format as shown below: image of a the File structure

there is a limitation presented with this structure as it doesn't allow files with a name longer than 8 bytes (note: 3 bytes of the name are reserved for the extinction). However, this problem is solved by setting a flag indicating that, and the name is saved in another section.

the directory entries are consecutive, and the last entry has a 0x0 in the first character in the file's name.

Explanation of the project

This program checks and modifies the information of an Ext2, or a FAT16 file system. As explained before, you can show meta information about the file system, search for a file in the file system, or delete a specific file.

this is the class diagam of the project: image class diagram

the Directory manager is the class that controls the main functionality of the system. When you create an instance of this class you need to provide it with a path to the file system. Then, it will check that file system and create a FAT or an Ext2 instance dependently.

both files systems implement the abstract class "FileSystem."

the file reader class stores an fstream object and returns a pointer to it so other classes can use it.

the main functionalities implemented in this program are:

/info

this functionality will read the metadata of both Ext and FAT file systems. To be able to store this data I created a structure in both file system that holds the necessary data. in order to test the functionality of this option, I had to open the file system in a hex editor and then read the bytes in the offsets needed, then comparing the value with the values I read using the program I was able to determine the correctness of the results.

/find

this functionality allows the user to recursively search for a file in the file system. In order to do this I had to explore all the files in the root directory and all of it's sub-directories looking for the file. this functionality did not require any additional data structures.

In FAT16, I first go to the begging to the Root region and start reading the files in it following the file entry structure in FAT16. if the file had the directory flag set (the entry DIR_Attr is equal to 16), then I will move to the offset indicated by the entry DIR_FstClusLO. Moreover, as I make the function recursively I am able to run the same function but instead of having the root region as my starting offset I add the first cluster I read as the offset.

For Ext2, the process was a little different as Ext works with inodes. At first I navigated to the root inode (inode 2) and then I started looking at the blocks that this inode points to. in each of those blocks there are many file entries, so I check each file and if a file type indicated by the entry is set to Directory then I recursively look in that directory but this time feeding the function the directory inode as the starting inode (which is stored in the inode entry of the file)

in order to test the functionality of this option I mounted the file system and checked the file tree. After that I search for the different files in the file system. Moreover, I tried adding files inside some nested directories and look for them using the program.

/delete

in order to delete a file I had to first find it, after that depending on the file system I removed the file in a different method.

for FAT16 in order to remove a file I had to mark the entry by putting a 0xE5 at the first character of the name. this will result in ignoring this entry next time you read the file system.

However, in EXT2 in order to delete a file I had to increase the size of the preceding file entry to include the file in it. Moreover, if the file you want to delete was the first entry, then I had to copy the information of the following file into the current file, then repeat the same process to delete the second file by increasing the record length of the current file.

In order to test the functionality of this option, I tried deleting some files then mounting the file system to check that they are actually deleted. Moreover, in order to test the file deletion in Ext for files that used up more than one block, I used a file with lots of long-named-files which resulted in the inode needing 3 blocks to store this data. Then I tried deleting those file to make sure the algorithm worked correctly.

Observed problems

phase 1:
the main problem I faced with this phase is to find the attributes that I had to read. It was a very tedious task but I managed to find the offsets at the end.

phase 2:
the main problem I had in this phase is not considering directories. Which resulted in the program finding directories as if they were normal files.

phase 3:
In this phase I had a problem finding the directory given a specific offset in both Ext and FAT, as I had to do some calculations in order to locate the entry I need to check. I fixed this by looking more in the documentation of both file systems as it had the calculations that I needed to carry out in order to locate the specific entry.

phase 4:
In this phase I was able to find some issues that the solutions of the previous phases didn't account for. For example, in FAT I realised that I was not skipping the entries that started with 0x5E. Moreover, in Ext2 I was only checking files in the first block and disregarding the other blocks as I was not offsetting correctly.

temporal estimation

image time consumed in the practice

the first phase took me the longest out of all of them, but that is because it involved the designing of the classes and the initial set up of the project. After everything was set up I solving the phases didn't take much time. Moreover, phase 2 was the most challenging even if it wasn't the most time consuming. This is because I had to spend some time understanding how each of the file systems worked, and thus how I should approach the search process in each.

Explanation and assessment of GIT

I divided the git into 3 main branches: Main, FAT, and Ext. The FAT and Ext branches held the currently in development files for each practice. once I was done with each I created a pull request to update the main branch. Moreover, for each phase I created a branch to hold the final files of that specific phase.

Conclusions

This project was really interesting and fun. The technical aspect of the project was not very challenging, however, understanding how each file system works was the interesting part. I got to see and understand the differences between the file systems which made me realize the pros and cons of each of them.

Analyzing the use case of each file system and the drawbacks of each was interesting, as I got to see why some operating systems made the decisions that they have made in selecting one file system over the other. I, also, got to practice and work a little more with git which was interesting as I didn't use to work with multiple branches before.

It is unlikely that I would have to develop this in another project in the future. Nevertheless, I think that it is very important that I have learned this, as I now have a much deeper understanding of file systems and how the computer low level functionalities work. Moreover, even if I don't have to work with a FAT or EXT file systems, the structure of those file systems could be integrated into future projects.

Overall, the practice was a very good learning experience.