/osfs

a Python package for analyzing structural properties of trees in filesystems as well as practical metadata

Primary LanguagePython

This package evaluates graph (specifically tree) properties on filesystems. Non-structural node properties, such as the metadata from stat, is also included.

There are theoretical questions as to how efficiently certain graph properties can be calculated in both time and space. For example, the minimum depth subtree of a subtree is simply the descendent which is minimum in depth. In that case, the order statistics should be evaluated from the bottom-up and so there are only O(n) evaluations required, rather than O(n^2) if one evaluates naively the order statistic for each subtree independently. There are also considerations for space complexity. For example, some properties can be evaluated by walking the file system and maintaining a dynamic data structure whose space requirements scales less than O(n). Obviously anything which is a node property only requires O(1) memory, but perhaps even some structure properties require less memory. Because the distribution is a thing of interest, though, many questions of efficiency are not relevant, and one can always do the easy way of dynamic programming: save subproblems by some way like a hashmap and check when doing any subproblem if it has already been solved.

On a Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz, the entire filesystem (starting at /) can be analyzed in about 20 seconds when the C program is used to output the lstat and stat information.