i2mint/py2store

Base operations for file (local or cloud) systems


This is seeded by one of the "Big Tasks" (#22).

It has to do with pinpointing the common (and most commonly used) operations for a given type of storage system.

In this case, I'm looking at file-like systems: local or remote files, S3, Dropbox, etc.

It would be good to pinpoint these base operations and use them to implement a consistent interface for all of these systems (using Python's built-in vocabulary as much as makes sense). Doing so would not only let users talk to many different systems without learning a new vocabulary each time, but also enable more code reuse when the "business logic" is the same.
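One possible reading of using Python's builtins as the common language is the Mapping protocol: keys are paths, values are file contents, and users reach for len, in, [] and iteration exactly as they would with a dict. The sketch below assumes that reading and covers local files only; the class name LocalFileReader and the choice of relative paths as keys are hypothetical, not an existing py2store API.

```python
import os
import tempfile
from collections.abc import Mapping


class LocalFileReader(Mapping):
    """Hypothetical read-only, dict-like view of the files under rootdir.

    Keys are file paths relative to rootdir; values are file contents as bytes.
    """

    def __init__(self, rootdir):
        self.rootdir = os.path.abspath(rootdir)

    def _fullpath(self, key):
        return os.path.join(self.rootdir, key)

    def __iter__(self):
        # os.walk descends into every subfolder, so nested files are included
        for dirpath, _dirnames, filenames in os.walk(self.rootdir):
            for filename in filenames:
                yield os.path.relpath(os.path.join(dirpath, filename), self.rootdir)

    def __len__(self):
        return sum(1 for _ in self)

    def __contains__(self, key):
        return os.path.isfile(self._fullpath(key))

    def __getitem__(self, key):
        if key not in self:
            raise KeyError(key)
        with open(self._fullpath(key), 'rb') as f:
            return f.read()


# Usage: the same len / in / [] / iteration vocabulary as a dict
with tempfile.TemporaryDirectory() as rootdir:
    os.makedirs(os.path.join(rootdir, 'sub'))
    with open(os.path.join(rootdir, 'sub', 'a.txt'), 'wb') as f:
        f.write(b'hello')
    store = LocalFileReader(rootdir)
    key = os.path.join('sub', 'a.txt')
    n_files = len(store)
    keys = list(store)
    contents = store[key]
    has_missing = 'missing.txt' in store
```

Because Mapping supplies keys(), items(), get() and friends from these few methods, an S3 or Dropbox reader would only need to reimplement __iter__, __len__, __contains__ and __getitem__ to offer the same interface.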

See, for example, the iter_filepaths_in_folder_recursively method below. As long as the FileAccess object knows how to list the full paths in a directory (paths_in_dir) and determine whether a path points to an existing file or directory (isfile and isdir), it can perform iter_filepaths_in_folder_recursively.
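The claim that the recursion only needs paths_in_dir, isfile and isdir can be checked against a backend with no disk at all. The sketch below writes the recursion once as a standalone function and runs it on a hypothetical InMemoryFileAccess: a set of '/'-separated file paths in which directories are implied by the prefixes ending in '/'. Both names are illustrative only.

```python
def iter_filepaths_recursively(facc, dirpath):
    """Yield every file path under dirpath, using only three primitives of facc:
    paths_in_dir, isdir and isfile."""
    for path in facc.paths_in_dir(dirpath):
        if facc.isdir(path):
            yield from iter_filepaths_recursively(facc, path)
        elif facc.isfile(path):
            yield path


class InMemoryFileAccess:
    """Hypothetical backend: a set of '/'-separated file paths, no disk involved.

    Directories are implied: every prefix of a file path that ends in '/' is one.
    dirpath arguments are expected to be '' (the root) or to end in '/'.
    """

    sep = '/'

    def __init__(self, filepaths):
        self.filepaths = set(filepaths)
        self.dirpaths = {
            path[: i + 1]
            for path in self.filepaths
            for i, char in enumerate(path)
            if char == self.sep
        }

    def isfile(self, path):
        return path in self.filepaths

    def isdir(self, path):
        return path in self.dirpaths

    def paths_in_dir(self, dirpath):
        # Direct children only: after removing the dirpath prefix (and the
        # trailing '/' of a subdirectory), the remaining name contains no '/'
        for path in sorted(self.filepaths | self.dirpaths):
            if path != dirpath and path.startswith(dirpath):
                name = path[len(dirpath):].rstrip(self.sep)
                if self.sep not in name:
                    yield path


facc = InMemoryFileAccess(['a/b/c.txt', 'a/d.txt', 'e.txt'])
filepaths = list(iter_filepaths_recursively(facc, ''))
# filepaths == ['a/b/c.txt', 'a/d.txt', 'e.txt']
```

The recursive function never touches os or glob, so the same code would serve any backend that provides the three primitives.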

The example below illustrates what a FileAccess class could look like if written specifically for local files. Obviously, we should have an abstract FileAccess that is subclassed for particular systems.
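The abstract FileAccess mentioned above could be sketched with Python's abc module: the base class declares the primitives (paths_in_dir, isfile, isdir) as abstract and implements the business logic once in terms of them. FileAccessBase, LocalFileAccess and IncompleteFileAccess are hypothetical names, and this is only one possible design, separate from the local-files example that follows. LocalFileAccess lists entries with os.listdir, so, unlike a glob on '*', it also returns hidden (dot) files.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class FileAccessBase(ABC):
    """Hypothetical abstract base: subclasses provide the primitives, the base the rest."""

    @abstractmethod
    def paths_in_dir(self, dirpath):
        """Return the full paths of the direct children of dirpath."""

    @abstractmethod
    def isfile(self, path):
        """Return True if path points to an existing file."""

    @abstractmethod
    def isdir(self, path):
        """Return True if path points to an existing directory."""

    # Business logic, written once in terms of the three primitives above
    def filepaths_in_dir(self, dirpath):
        return filter(self.isfile, self.paths_in_dir(dirpath))

    def dirpaths_in_dir(self, dirpath):
        return filter(self.isdir, self.paths_in_dir(dirpath))

    def iter_filepaths_in_folder_recursively(self, dirpath):
        for path in self.paths_in_dir(dirpath):
            if self.isdir(path):
                yield from self.iter_filepaths_in_folder_recursively(path)
            elif self.isfile(path):
                yield path


class LocalFileAccess(FileAccessBase):
    """Local-disk backend; os.listdir also returns hidden (dot) entries."""

    isfile = staticmethod(os.path.isfile)
    isdir = staticmethod(os.path.isdir)

    def paths_in_dir(self, dirpath):
        return (os.path.join(dirpath, name) for name in os.listdir(dirpath))


class IncompleteFileAccess(FileAccessBase):
    """Deliberately missing isdir and paths_in_dir."""

    def isfile(self, path):
        return False


# Usage on a throwaway directory tree
with tempfile.TemporaryDirectory() as rootdir:
    os.makedirs(os.path.join(rootdir, 'sub'))
    for relpath in ['b.txt', '.hidden', os.path.join('sub', 'a.txt')]:
        with open(os.path.join(rootdir, relpath), 'w') as f:
            f.write('x')
    facc = LocalFileAccess()
    found = sorted(
        os.path.relpath(path, rootdir)
        for path in facc.iter_filepaths_in_folder_recursively(rootdir)
    )

try:
    IncompleteFileAccess()
    incomplete_rejected = False
except TypeError:
    incomplete_rejected = True
```

Python refuses to instantiate a subclass that leaves any abstract method unimplemented (it raises TypeError), so a new backend that forgets a primitive fails at construction time rather than at the first call that needs it.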

Also, I'm making no claims as to whether my choice of methods is the right one here.

```python
import os
from glob import escape, iglob


class FileAccess:
    sep = os.path.sep  # TODO: should we emulate os and have a .path.sep, .path.isfile and .path.isdir?
    isfile = staticmethod(os.path.isfile)
    isdir = staticmethod(os.path.isdir)
    getsize = staticmethod(os.path.getsize)
    listdir = staticmethod(os.listdir)
    walk = staticmethod(os.walk)
    rm = staticmethod(os.remove)
    rmdir = staticmethod(os.rmdir)
    # TODO: os.stat returns an os.stat_result object; convert it to a dict or a system-agnostic object
    stat = staticmethod(os.stat)

    def _ensure_slash_suffix(self, dirpath):
        if not dirpath.endswith(self.sep):
            return dirpath + self.sep
        else:
            return dirpath

    def paths_in_dir(self, dirpath):
        # escape() stops glob wildcards in dirpath (e.g. '[' or '*') from being read as patterns;
        # note that the '*' pattern does not match hidden (dot) files
        return iglob(escape(self._ensure_slash_suffix(dirpath)) + '*')

    def filepaths_in_dir(self, dirpath):
        return filter(self.isfile, self.paths_in_dir(dirpath))

    def dirpaths_in_dir(self, dirpath):
        return filter(self.isdir, self.paths_in_dir(dirpath))

    def paths_in_dir_with_slash_suffix_for_dirs(self, dirpath):
        for path in self.paths_in_dir(dirpath):
            if self.isdir(path):
                yield self._ensure_slash_suffix(path)
            else:
                yield path

    def iter_filepaths_in_folder_recursively(self, dirpath):
        for path in self.paths_in_dir(dirpath):
            if self.isdir(path):
                # descend into subdirectories
                yield from self.iter_filepaths_in_folder_recursively(path)
            elif self.isfile(path):
                yield path


def test_file_access():
    # Assumes ~/Downloads exists and contains at least one file
    facc = FileAccess()
    rootdir = os.path.expanduser('~/Downloads/')
    paths = list(facc.iter_filepaths_in_folder_recursively(rootdir))
    assert all('Downloads' in path for path in paths)
    assert len(paths) > 0
    os.stat(paths[0])  # try to get info about the file
```
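The TODO on stat asks for a system-agnostic alternative to os.stat_result. A minimal sketch, assuming a plain dict with a handful of fields is enough; the helper name stat_to_dict and the key names size, mtime, is_dir and is_file are hypothetical choices, not a settled interface.

```python
import os
import stat
import tempfile


def stat_to_dict(path):
    """Hypothetical helper: a plain-dict, backend-agnostic subset of os.stat(path)."""
    st = os.stat(path)
    return {
        'size': st.st_size,                    # size in bytes
        'mtime': st.st_mtime,                  # last modification, seconds since the epoch
        'is_dir': stat.S_ISDIR(st.st_mode),    # True for directories
        'is_file': stat.S_ISREG(st.st_mode),   # True for regular files
    }


# Usage on a throwaway file and its parent directory
with tempfile.TemporaryDirectory() as rootdir:
    filepath = os.path.join(rootdir, 'f.bin')
    with open(filepath, 'wb') as f:
        f.write(b'12345')
    file_info = stat_to_dict(filepath)
    dir_info = stat_to_dict(rootdir)
```

Other backends would fill the same keys from whatever metadata they expose; which keys to standardize on is part of the question this issue raises.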