caddyserver/certmagic

Clarifications about storage interface

bitfehler opened this issue ยท 10 comments

What is your question?

Hi! I would like to ask some help around implementing the storage interface. The documentation states that the interface is that of a key-value store, but a key-value store does not have a concept of directories like the interface requires. Hence, this behavior has to be "emulated", but the documentation is sparse on details. Specifically, the List() implementation is tricky.

Consider the following set of keys:

foo/cert/key
foo/cert/chain
foobar/cert/key
foobar/cert/chain

What are the expectations in case of a call to List(prefix: "foo", recursive: false)? For a generic key-value store implementation, all keys satisfy the prefix requirement, but returning any of the foobar keys would not seem to make sense? Can this case occur? Will a prefix contain a trailing slash? Would it make sense to return the foobar keys if recursive were true? I found some answers (see below), but I am not sure what assumptions I can rely on.

What have you already tried?

I have read the filesystem implementation, and it seems that the expectation is that prefix is always an existing directory, as it is simply passed to filepath.Walk(). Judging from e.g. here, List() will be called with a prefix that does not end with a slash. Can I rely on it being intended to be a "directory"? Or can I rely on all potential prefixes never sharing a common prefix (i.e. the foo/foobar situation being impossible)?

I would much appreciate some input on this, and if you would like I would happily submit a PR with some doc updates once I understand the interface better ๐Ÿ™‚ Thanks a lot!

Oh, sorry, and a follow-up question: it seems that recursive = true is actually never used?

mholt commented

Good questions... it is a key-value store, but with directory semantics. Like a file system. I should probably clarify this, since in your question I see a gap in my documentation.

What are the expectations in case of a call to List(prefix: "foo", recursive: false)?

Perhaps what the docs should talk about is to list the next key segment (or path component) for keys that have a path through prefix, i.e., these two paths go through foo:

foo/cert/key
foo/cert/chain

So List(prefix: "foo", recursive: false) would return:

foo/cert

We use this functionality, for example, to discover ACME accounts that exist in storage.

For a generic key-value store implementation, all keys satisfy the prefix requirement, but returning any of the foobar keys would not seem to make sense?

Correct, foobar shouldn't be returned in this case. I'll clarify the docs.

Will a prefix contain a trailing slash?

Hmm, maybe, but I don't think that should matter. Let me know if you think of a reason it should, I'm happy to consider this.

Would it make sense to return the foobar keys if recursive were true?

No, since "foobar" isn't "foo" (prefix should look only at whole path components, and I need to clarify this).

Can I rely on it being intended to be a "directory"?

Since this is an exported interface, others can use it and call it, and I can't guarantee that every caller will know whether it's a directory. In CertMagic we only call List on what we think should be directories, but if for some reason the underlying storage structure changes (for example, a user modifies the files and folders on their disk, without our knowledge and in violation of the expected structure), it might be a file instead (which is broken, of course).

I'm not sure yet, if it's not a directory, whether an error should be returned, or just the file itself as the only entry.

Oh, sorry, and a follow-up question: it seems that recursive = true is actually never used?

We do use this as of Caddy 2.7 (currently in beta), as a way to export all items in storage.

mholt commented

@bitfehler I've pushed improved docs in 223063d if you want to take a look!

Let me know if anything else is unclear.

Thanks a lot for these clarifications!

Oh, sorry, and a follow-up question: it seems that recursive = true is actually never used?

We do use this as of Caddy 2.7 (currently in beta), as a way to export all items in storage.

Then it might also be worth specifying: Is it or is it not necessary to return non-terminal keys ("directories") in a recursive list? I am not a native English speaker, but the current phrasing of "non-terminal keys will be enumerated (i.e. "directories" should be walked)" does not make this clear to me. I get the walking part, but enumerated sort of sounds like directories should also be returned? I am hesitant, though, because in a key-value store you'd be enumerating keys that don't actually exist, which seems weird.

Again, thanks, and my offer still stands: happy to take a shot at some doc improvements, but if you already have something specific in mind, that's cool, too.

Again, thanks, and my offer still stands: happy to take a shot at some doc improvements, but if you already have something specific in mind, that's cool, too.

Doh, only saw your updates after posting that, nevermind. Your updates are great, but question from last comment still valid ๐Ÿ™‚

mholt commented

Then it might also be worth specifying: Is it or is it not necessary to return non-terminal keys ("directories") in a recursive list?

Good question!!

The only current use of recrusive = true that I know of is to export all the data. For this, only the terminal keys ("files") are needed.

However, I'm not sure what the correct or practical answer should be in general.

Directories are implicit, but we do need them listed when recursive = false.

For recursive = true, I'm leaning towards "necessary" (include directories) to keep the behavior consistent so the only difference is recursion, but I'm open to your thoughts on what would be more practical from your point of view.

The current FileStorage implementation does return directories, it looks like.

Well... ๐Ÿ™‚ I'd say returning directories also would be only a minor inconvenience, so I would also lean towards consistency. However, the gift keeps on giving: if directories are returned, can a caller expect them to be returned before any "file" in this directory? I swear, I am not trying to be a pain here. But this is specified e.g. for most archive formats, where you iterate over the list of entries and rest assured that each directory component is listed in order (e.g. /usr, /usr/bin, /usr/bin/ls). I suppose for a file-based implementation, this comes naturally. But being based on a plain key-value store, my current implementation of List() gets its result from iterating over what is initially a map[string], so if order matters I'd have to throw in a sort. That's not a huge issue, but I hope you see why I think it might be worth mentioning.

mholt commented

can a caller expect them to be returned before any "file" in this directory?

No, ordering is not guaranteed.

But this is specified e.g. for most archive formats, where you iterate over the list of entries and rest assured that each directory component is listed in order (e.g. /usr, /usr/bin, /usr/bin/ls).

This is absolutely not true with archive formats. The order can be arbitrary and traversing/listing them can unfortunately be very inefficient.

I suppose for a file-based implementation, this comes naturally.

Some file systems do order results by name, but not always.

But being based on a plain key-value store, my current implementation of List() gets its result from iterating over what is initially a map[string], so if order matters I'd have to throw in a sort.

I agree that plain key-value stores will not scale well. I recommend some sort of index. (Oh look, you've got yourself a database!) ๐Ÿ™‚

I do think there should be more implementations of Storage that use databases. That's what they're good at, after all.

No, ordering is not guaranteed.

Ack, thanks.

This is absolutely not true with archive formats. The order can be arbitrary and traversing/listing them can unfortunately be very inefficient.

Sorry, yes, you are of course right. The formats are rather generic, but the tools will usually try to keep the order (not alphabetical, but hierarchical)

Some file systems do order results by name, but not always.

But they all (some caveats apply) order by hierarchy.

I agree that plain key-value stores will not scale well. I recommend some sort of index. (Oh look, you've got yourself a database!) slightly_smiling_face

Yeah, but it'll be Good Enough For Now(TM). Looks like I can even skip the sorting, so... ๐Ÿ™‚

Anyway, I think with this everything is thoroughly specified, so thanks a lot for this!

mholt commented

You're welcome, thanks for the questions!

You may still want an index, so you can look up specific keys and folders faster than O(n).