aai-institute/lakefs-spec

`ls()` should return fully-qualified paths with repo/ref

AdrianoKF opened this issue · 0 comments

When calling ls(), the paths of the returned items are not prefixed with the repository and ref (the underlying API endpoint returns them the same way). As a result, these paths cannot be passed to other lakefs-spec API calls, since they all expect a fully-qualified rpath (as validated by parse()).

Example (assume the repo contains a folder data, containing a single file 1.txt):

items = fs.ls("repo/main/data")
assert items[0]["name"] == "repo/main/data/1.txt"  # AssertionError!

Since ls() is used under the hood by the AbstractFileSystem base class for a variety of other operations (at least find(), walk(), and glob(), but also get(..., recursive=True)), these are broken by extension: they may either return incorrect data or, in the case of put(), pass an unqualified path to info(), which fails the validation in parse().
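To illustrate why an unqualified path breaks downstream calls, here is a minimal sketch. Note that parse_rpath below is a hypothetical, simplified stand-in written for this example; it is not the actual parse() from lakefs-spec, whose validation rules may well be stricter.

```python
def parse_rpath(path: str) -> tuple[str, str, str]:
    """Hypothetical stand-in for parse(): split '<repo>/<ref>/<resource>'."""
    parts = path.split("/", 2)
    if len(parts) < 3 or not parts[0] or not parts[1]:
        raise ValueError(f"expected '<repo>/<ref>/<resource>', got {path!r}")
    return parts[0], parts[1], parts[2]


# A fully-qualified path splits cleanly into its three components.
print(parse_rpath("repo/main/data/1.txt"))

# The name currently returned by ls() lacks repo and ref, so it is rejected.
try:
    parse_rpath("data/1.txt")
except ValueError as exc:
    print(f"rejected: {exc}")
```

Under this simplified check, a deeper unqualified path such as data/sub/1.txt would not be rejected at all but silently misread as repository "data" and ref "sub", which is one way incorrect data could come back instead of an error.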

A possible solution is to prefix the items returned by the lakeFS API with the repo and ref in ls() (a single-line fix). However, extra care needs to be taken to make sure this behavior works correctly with the directory listing cache.
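The prefixing step could look roughly like the sketch below. The helper name qualify_entries and the shape of the entries (dicts with a "name" key) are assumptions made for illustration; the real change would live inside ls() and operate on whatever the lakeFS client returns.

```python
from typing import Any


def qualify_entries(
    repo: str, ref: str, entries: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Return copies of the entries with 'repo/ref/' prepended to each name."""
    prefix = f"{repo}/{ref}/"
    # Copy each dict so the caller's (or a cache's) original entries stay untouched.
    return [{**entry, "name": prefix + entry["name"].lstrip("/")} for entry in entries]


# Assumed shape of what ls() currently yields: repository-relative names.
raw = [{"name": "data/1.txt", "type": "file", "size": 3}]
qualified = qualify_entries("repo", "main", raw)
print(qualified[0]["name"])  # repo/main/data/1.txt
```

Whether the cache should hold the raw or the qualified entries is left open here; the sketch only avoids mutating its input so that either choice remains possible.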

Failing test case:

import uuid  # needed for the random prefix below


def test_ls(
    random_file_factory: RandomFileFactory,
    fs: LakeFSFileSystem,
    repository: str,
    temp_branch: str,
) -> None:
    random_file = random_file_factory.make()

    prefix = f"{repository}/{temp_branch}/find_{uuid.uuid4().hex.lower()[:6]}"
    fs.put(str(random_file), f"{prefix}/{random_file.name}")
    files = fs.ls(f"{prefix}/")

    assert len(files) == 1
    assert files[0]["name"] == f"{prefix}/{random_file.name}"