shundhammer/qdirstat

File Age Statistics

shundhammer opened this issue · 5 comments

Background

This was inspired by issue #165: File histogram view. It's not exactly the same, but it's in the same spirit.

General Idea

Show the age of files in a selected directory tree; not with the exact time stamp, but more roughly, to get a better idea of the approximate age of files. In the current implementation, it is year-based; that gives a good overview when it comes to decisions such as what to move to archive media, what to delete and what to keep.

Implementation

Show a list of years and how many files were last modified in that year.

For each year, show the absolute number of files, the percentage of files in that directory tree in that year, as well as the total size of all those files and their percentage.
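The per-year breakdown described above can be sketched roughly as follows. This is a minimal Python illustration, not QDirStat's actual C++ code; the names `YearStats` and `collect_year_stats` are made up for this example, and the input is assumed to be one `(year, size_in_bytes)` pair per file.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class YearStats:
    # Hypothetical container for one row of the table
    files_count: int = 0
    total_size: int = 0
    files_percent: float = 0.0
    size_percent: float = 0.0


def collect_year_stats(files):
    """Aggregate (year, size) pairs into per-year counts, sizes and percentages."""
    stats = defaultdict(YearStats)
    for year, size in files:
        stats[year].files_count += 1
        stats[year].total_size += size

    # Percentages are relative to the whole directory tree
    all_files = sum(s.files_count for s in stats.values())
    all_bytes = sum(s.total_size for s in stats.values())
    for s in stats.values():
        s.files_percent = 100.0 * s.files_count / all_files if all_files else 0.0
        s.size_percent = 100.0 * s.total_size / all_bytes if all_bytes else 0.0
    return dict(stats)
```

For example, `collect_year_stats([(2003, 100), (2003, 300), (2021, 600)])` puts two files with 400 bytes in total (40% of the size) into the 2003 row.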

Example

QDirStat-file-age-stats-work-photos

This is my /work/photos directory where I store the photos that I shot over the years. Beside the main window that shows all the total values, the File Age Statistics window is open (menu View -> File Age Statistics or F4); it breaks down all those files by year.

Notice that this is strictly about the file modification time (mtime). It only uses the year part of that mtime time stamp (that is really a time_t, i.e. seconds since 1970-01-01 00:00:00 like all Linux/Unix time stamps).
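To illustrate the point about mtime, here is a small Python sketch of reducing such a time stamp to its year. The function names are invented for this example, and the use of local time (rather than UTC) is an assumption; the text above does not say which one QDirStat uses.

```python
import os
import time


def mtime_year(mtime):
    """Return the calendar year of a time_t-style time stamp.

    Assumption: the year is taken in local time.
    """
    return time.localtime(mtime).tm_year


def file_mtime_year(path):
    """Year of a file's last modification; lstat() does not follow symlinks."""
    return mtime_year(os.lstat(path).st_mtime)
```

Everything below the year (month, day, time of day) is simply discarded, which is what makes the overview so coarse.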

Drilling Down

In issue #165, a discussion started about how to get further from that very rough overview information. Knowing that some files go back as far as 2003 may be interesting, but there should be an easy way to find out where they are.

Oldest and Newest Files: The new Discover Actions

For the oldest and the newest files, there is now an easy way: Use Discover -> Oldest Files or Discover -> Newest Files. But what about files in some other year range?

Just Leave the Window Open and Click Around

This new File Age Statistics window can simply remain open while you get busy in the main window. As you select a different directory, it is automatically updated with the year statistics from that directory.

It's Persistent

Following the selection in the main window is the default behaviour; you can change that with the Sync with Main Window check box. Just uncheck it, and the window will retain its current content, even if you click on another directory in the main window. That setting is saved to the config file, so it will be as you left it when you open that window the next time, even after restarting QDirStat.

More Detailed Example

We already saw this:

QDirStat-file-age-stats-work-photos

Selecting a subdirectory in the main window (with the mouse or with the cursor keys):

QDirStat-file-age-stats-work-photos-orig

Selecting one directory deeper down:

QDirStat-file-age-stats-work-photos-orig-dvd01

Notice how all years from the current year backwards are displayed, even if no file in that subtree was modified in any of them. Those gaps are displayed dimmed. But they still give you an important piece of information: It's been a while since anything was changed there. It's old stuff.

Even though that behaviour can be configured to only start at the active years, i.e. from the first year on that has a changed file in that subtree, I found it a lot more intuitive to see even the gap at the start: Since the relevant part starts further down, i.e. further in the past, you instantly know that it's in the past even without reading the year numbers. It's location-encoding, i.e. the location of the information (further down) is already a piece of information that your brain can instantly process even without reading.

Gaps between active years are always displayed; in the beginning I found it very confusing to not see at a glance if there were any years in a long list without activity. Empty space helps: You see instantly that there was a period with no activity.
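The gap handling described in the last paragraphs could look roughly like this. It is only a Python sketch under stated assumptions: `year_rows` and its `start_at_current_year` parameter are hypothetical names, and the rows are assumed to be listed newest first (the text says the older part is "further down").

```python
def year_rows(active_years, current_year, start_at_current_year=True):
    """Return (year, is_gap) rows, newest first, with no year left out.

    active_years: years in which at least one file was modified.
    is_gap is True for years without any modified file (shown dimmed).
    """
    if not active_years:
        return []
    active = set(active_years)
    newest = max(active)
    # Either start at the current year (leading gap visible) or at the first
    # active year; max() also covers files dated in the future.
    first = max(current_year, newest) if start_at_current_year else newest
    return [(year, year not in active)
            for year in range(first, min(active) - 1, -1)]
```

With active years 2009 and 2015 and the current year 2021, this yields 13 rows from 2021 down to 2009, of which only 2015 and 2009 are not gaps. With `start_at_current_year=False`, the list starts at 2015 instead, but the gap between 2015 and 2009 is still there.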

Let's switch to the photo directory where I keep photos about beer hikes over the years:

QDirStat-file-age-stats-work-photos-sort-beer-hikes

Again, even without studying all the numbers, you see that it's been two years since the last one (Corona got in the way in 2020 and 2021), then there were some years without one, after that one some more years without, then some years with fairly regular activity.

You can do some rough data analysis by just glancing at the table. That is useful when you move around in the main window with the cursor keys rapidly; it's quite easy to get a feeling about a lot of data that way.

Finally, business trips to Finland:

QDirStat-file-age-stats-work-photos-sort-travel-finland

I didn't have much time for taking photos on those trips, so there aren't many of them; but again, you can see a pattern: Two occasions, in 2009 and again in 2015.

Future Development

This is just a first shot at those File Age Statistics. So far, it still has some rough edges; some refinement will follow in the near future.

But it's already usable, and it should not have broken any existing features. Hopefully. ;-)

Monthly Statistics for the Last Months

Monthly file age statistics are now also available for the current year and for the last year. By default, this is collapsed:

QDirStat-file-age-stats

Clicking on the little arrow near the current year (2021) shows the months up to the current month:

QDirStat-file-age-stats-last-years-months

With also the last year expanded:

QDirStat-file-age-stats-last-2-years-months

Notice that this is limited to the current and the last year, no matter when activity in this directory tree begins (i.e. even if there is no entry for 2021 and 2020, only for 2012 and earlier).

The rationale is that it may really be important to know when anything changed during the last few months, but the further back any activity was, the less important is the exact month (at least in this context).
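As a rough illustration of that restriction, the following Python sketch counts files per month for the current and the last year only. The name `monthly_counts` is made up for this example, and local time is again an assumption; this is not taken from the QDirStat sources.

```python
import time
from collections import Counter


def monthly_counts(mtimes, now=None):
    """Count files per (year, month) for the current and the last year only."""
    today = time.localtime(now)  # now=None means "right now"
    counts = Counter()
    # Pre-fill the rows so that months without any file still show up:
    # the current year only up to the current month, the last year completely.
    for month in range(1, today.tm_mon + 1):
        counts[(today.tm_year, month)] = 0
    for month in range(1, 13):
        counts[(today.tm_year - 1, month)] = 0
    for mtime in mtimes:
        t = time.localtime(mtime)
        if (t.tm_year, t.tm_mon) in counts:  # anything older is ignored here
            counts[(t.tm_year, t.tm_mon)] += 1
    return counts
```

A file from 2012 does not get a month row at all in this sketch; it would only show up in its year row.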

Discussions are welcome. Feel free to discuss this here.