lazka/hypothesis-fspaths

fspaths appears to generate invalid paths

Opened this issue · 2 comments

Maybe it's not clear to me how to use this strategy, but when I try, I'm getting paths that aren't accepted by open or pathlib.Path. Example:

    import os

    from hypothesis import given
    from hypothesis_fspaths import fspaths

    @given(fspaths())
    def test_fs_paths(path):
        with open(path, 'w') as p:
            pass
        os.remove(path)

results in:

test_fs_paths (tests.test_porerefiner.TestCoreFunctions) ... Falsifying example: test_fs_paths(self=<tests.test_porerefiner.TestCoreFunctions testMethod=test_fs_paths>, path=b'\x80')
Traceback (most recent call last):
  File "/Users/justin.payne/scripts/dev/porerefiner/tests/test_porerefiner.py", line 56, in test_fs_paths
    with open(path, 'w') as p:
OSError: [Errno 92] Illegal byte sequence: b'\x80'

Falsifying example: test_fs_paths(self=<tests.test_porerefiner.TestCoreFunctions testMethod=test_fs_paths>, path=b'')
Traceback (most recent call last):
  File "/Users/justin.payne/scripts/dev/porerefiner/tests/test_porerefiner.py", line 56, in test_fs_paths
    with open(path, 'w') as p:
FileNotFoundError: [Errno 2] No such file or directory: b''

Similarly:

    @given(fspaths())
    def test_fs_paths(path):
        pathlib.Path(path)

test_fs_paths (tests.test_porerefiner.TestCoreFunctions) ... Falsifying example: test_fs_paths(self=<tests.test_porerefiner.TestCoreFunctions testMethod=test_fs_paths>, path=b'')
TypeError: argument should be a str object or an os.PathLike object returning str, not <class 'bytes'>

I think Windows permits Unicode characters in paths, so the first failure may well be platform-dependent. If that's the case, I guess I should handle it usefully (somehow; I'm open to suggestions!).

But I'm genuinely not sure that an empty bytes object is a valid path on any filesystem. Should this be one of the test cases?

Thanks for reading this, and for the library; again, it's more likely I'm simply using it incorrectly (or not thinking through the complete spectrum of cases I should handle). I'd appreciate any tips or clarification.

lazka commented

Hey, yeah, this strategy is very specific to my use case, and it turns out it's probably not what most people want. As stated in the README, it generates paths that the builtin open() accepts, in the sense that open() converts the value to the operating system representation and passes it to the system API.

In the first two cases the operating system rejects them, not open(). In the third case, pathlib.Path() doesn't work with bytes while every other filesystem API in Python does, so it fails too.
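For the pathlib case, one possible workaround is to decode the value before constructing the Path. The sketch below uses only the standard library; `to_path` is a hypothetical helper name, not part of hypothesis-fspaths.

```python
import os
import pathlib


def to_path(value):
    """Convert str, bytes, or an os.PathLike value into a pathlib.Path."""
    # os.fsdecode returns str unchanged and decodes bytes with the
    # filesystem encoding, so the result is always a str that Path() takes.
    return pathlib.Path(os.fsdecode(value))


print(to_path(b"data/report.txt"))
print(to_path("data/report.txt"))
```

Note that an empty value does not fail here: pathlib normalizes an empty string to the current directory, so `to_path(b"")` ends up as `Path(".")`, which may or may not be what a test wants.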

I've been meaning to extend this by adding more strategies for paths that are valid according to the operating system in use and existing paths on the filesystem, but haven't found the time/motivation.

What did you expect from this lib?

Thanks for the reply!

I guess I'm using hypothesis in order to generate more test cases than just what I can anticipate; so in that respect I guess I'm getting exactly what I asked for. :D

Overall, I think I'm getting the behavior I'm looking for just by using Hypothesis's filtering tools to exclude values I don't think I'll ever get, so there probably isn't a fix necessary, at least not on my account. I'm using it in an application where we store filesystem paths in a database, and I've occasionally been burned by the difference between the text encodings the database and ORM support and what's actually a valid path on the filesystem. fspaths is helping me think that through.
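As a rough illustration of that kind of filtering, here is a sketch of a predicate that rejects the two falsifying examples from the report (the empty path and the undecodable byte). The name `is_plausible_path` and the choice of UTF-8 as the database encoding are assumptions made for the example.

```python
import os


def is_plausible_path(path):
    """Return True if the path is non-empty and its bytes are valid UTF-8."""
    try:
        # os.fsencode accepts str, bytes, and os.PathLike values.
        raw = os.fsencode(path)
        # Assumption: the database side only stores valid UTF-8 text.
        raw.decode("utf-8")
    except UnicodeError:
        return False
    return len(raw) > 0


print(is_plausible_path("data/report.txt"))  # True
print(is_plausible_path(b""))                # False
print(is_plausible_path(b"\x80"))            # False
```

Since Hypothesis strategies have a `.filter()` method, the predicate could be attached as `fspaths().filter(is_plausible_path)`. Filtering out too large a share of the generated values can trip Hypothesis's health checks, so this works best when the excluded cases are rare.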

I guess if I had a wish list for a series of "file paths" strategies, it might be:

- A strategy for generating actually-existing paths
- A strategy for generating paths that contain spaces
- A strategy for generating paths that include Unicode
- A strategy for generating temporary files with specified contents

That would capture a lot of the "file handling" test cases I can think of. As separate strategies I think a user would get more granular feedback about exactly how their code is failing the test.
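Until such strategies exist, the "temporary file with specified contents" item can be approximated with the standard library alone. The following is a minimal sketch, not something hypothesis-fspaths provides; `temp_file_with` is a hypothetical helper name.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_file_with(contents, suffix=""):
    """Yield the path of a real temporary file holding the given bytes."""
    # mkstemp creates the file and returns an open descriptor plus its path.
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(contents)
        yield path
    finally:
        # Remove the file even if the body of the with-block raised.
        os.remove(path)


# The suffix can carry spaces or non-ASCII characters to cover those cases.
with temp_file_with(b"hello", suffix=" with space.txt") as path:
    with open(path, "rb") as handle:
        print(handle.read())
```

Inside a Hypothesis test, the contents could be drawn from `hypothesis.strategies.binary()` and passed to the helper, which also covers the "actually-existing paths" item for the duration of the with-block.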

Anyway, thanks for the discussion, I don't mind if you'd like to close the issue.