conaticus/FileExplorer

Use `enum FileType` instead of plain strings for `CachedPath::file_type`

phoenix-ru opened this issue · 0 comments

I was excited to check the project sources after the second YT video, especially because many people commented on the inefficiency of the cache strategy.

I found several places where memory usage could be improved. One of them is here:

file_type: String,

When I checked the usages, I quickly discovered that:

pub const DIRECTORY: &str = "directory";
pub const FILE: &str = "file";

if file_type == "file" {

It looks like you're using plain strings both in the cache and in comparisons, which is not optimal: every cached entry carries its own heap-allocated String, and every check is a string comparison. I suggest changing String to an enum. Thankfully, serde works well with enums:
https://serde.rs/enum-representations.html

Now, to serialize/deserialize the enum efficiently (as an integer rather than a variant name), you need the serde-repr crate (https://github.com/dtolnay/serde-repr):

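use serde_repr::{Deserialize_repr, Serialize_repr};

// Serialized as the underlying u8 (File = 0, Directory = 1) instead of a string.
#[derive(Serialize_repr, Deserialize_repr, PartialEq)]
#[repr(u8)]
pub enum FileType {
    File,
    Directory,
}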
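To illustrate the in-memory side of the argument, here is a minimal std-only sketch. It assumes the same enum shape as above but leaves out the serde derives so it builds without extra crates; `is_file` and `inline_sizes` are hypothetical helpers for illustration, not functions from the project.

```rust
use std::mem::size_of;

/// Same shape as the enum proposed above; the serde derives are omitted
/// so this sketch compiles with the standard library alone.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum FileType {
    File,
    Directory,
}

/// Hypothetical replacement for the `file_type == "file"` check:
/// a one-byte comparison instead of a string comparison.
pub fn is_file(file_type: FileType) -> bool {
    file_type == FileType::File
}

/// Returns the inline size in bytes of a `String` handle and of the enum.
/// The `String` figure excludes the heap buffer that holds the text itself.
pub fn inline_sizes() -> (usize, usize) {
    (size_of::<String>(), size_of::<FileType>())
}
```

On typical 64-bit targets I would expect `inline_sizes()` to report three machine words for the `String` handle versus a single byte for the enum, and the `String` additionally owns a heap allocation per cached entry.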

For reference, I compared the wall-clock times and cache sizes of the two implementations.
Disk space indexed: 273 GB.
OS: Fedora 38.
Disk: SK Hynix NVMe.

  • Previous implementation:
    • 26.5 seconds from cold launch,
    • 12 seconds warm,
    • 68.8 MB of cache;
  • repr(u8) implementation:
    • 21 seconds cold,
    • 11 seconds warm,
    • 68.2 MB of cache.

It seems that the indexer's performance is bottlenecked by the filesystem. What I don't understand, though, is why the warm start takes so long. It looks like the cache is being re-evaluated somehow.

P.S. It would also be great to use std::fs::FileType instead of a custom enum, but the custom enum seems to work just fine.
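One possible middle ground is to keep the custom enum for the cache and convert from `std::fs::FileType` at the point where the filesystem is read. As far as I know, `std::fs::FileType` is an opaque type without serde impls, so it cannot be stored in the serialized cache directly. The sketch below is an assumption about how such a conversion could look; `from_std` and `classify` are hypothetical names, and the serde derives are again omitted to keep it std-only.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Custom enum as proposed in this issue (serde derives omitted here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum FileType {
    File,
    Directory,
}

impl FileType {
    /// Hypothetical conversion from the standard library's file type.
    /// Returns `None` for entries that are neither a regular file nor a
    /// directory (for example sockets or device nodes).
    pub fn from_std(file_type: fs::FileType) -> Option<Self> {
        if file_type.is_dir() {
            Some(FileType::Directory)
        } else if file_type.is_file() {
            Some(FileType::File)
        } else {
            None
        }
    }
}

/// Hypothetical helper: looks up a path on disk and classifies it.
/// `fs::metadata` follows symlinks, so a link is reported as its target's type.
pub fn classify(path: &Path) -> io::Result<Option<FileType>> {
    let metadata = fs::metadata(path)?;
    Ok(FileType::from_std(metadata.file_type()))
}
```

Returning `Option` rather than forcing every entry into one of the two variants is a design choice of this sketch; the indexer could just as well skip such entries or treat them as files.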