Stream Stores

Question

Closed this issue 2 years ago · 1 comments

Streams are another common source and target of data, so we'd like to create tools to do the usual py2store stuff: rein in different interfaces to a common one, and adapt to various data particularities.

The relevant builtin is https://docs.python.org/3/library/io.html.

What makes a data source or target a "stream" as opposed to other forms, such as key-value or sequence (list-like) store? We need to draw relationships between the different interfaces, and words used to describe the different functionalities within.

The stream interface seems to be built around the concept of a file's content, along with the possibility of an unbounded source of data.

Streams have concepts like read and write, which we also have in key-value or sequence constructs. For streams, though, read is more of an iterator over content, and write is more of an append to content.
Streams have readline and writelines, pointing to the fact that content is assumed to be structured (above the atomic byte or bit, but below the source's location, metadata, etc.).
Streams have a concept of open, flush, and close, which is not needed in the key-value or sequence interfaces (why? Perhaps because actions there are assumed to take effect immediately?).
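
As a minimal sketch of the points above, the snippet below uses the standard library's io.BytesIO as a stand-in for a file's content. It only illustrates how read, readline, write, and close behave on a builtin stream; it is not a proposed store interface.

```python
import io

# An in-memory binary stream standing in for a file's content.
stream = io.BytesIO()

# write puts data at the current position, which starts at 0 and advances,
# so successive writes behave like appends.
stream.write(b"first line\n")
stream.write(b"second line\n")

# Rewind to the start; each read consumes content and moves the cursor forward.
stream.seek(0)
first_chunk = stream.read(5)      # the first five bytes
rest_of_line = stream.readline()  # from the cursor up to and including the newline
remaining_lines = list(stream)    # iterating a stream yields its remaining lines

# close ends the stream's life cycle; further reads would raise ValueError.
stream.close()
is_closed = stream.closed
```

Here `first_chunk` is `b"first"`, `rest_of_line` is `b" line\n"`, and `remaining_lines` is `[b"second line\n"]`, which shows the "iterator of content" flavor of read.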

So we still get many key-value or sequence perspectives. A stream is a sequence of bytes, or lines. Seek navigates the sequence, tell gives us a key (a position) of the seek cursor, etc.
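
To make the sequence perspective concrete, here is a rough sketch of a read-only, list-like view over the lines of a seekable binary stream. The class name LineSequence is hypothetical (it is not part of py2store or io); it uses tell to record where each line starts and seek to jump back to it.

```python
import io
from collections.abc import Sequence


class LineSequence(Sequence):
    """Hypothetical read-only, list-like view over the lines of a seekable binary stream."""

    def __init__(self, stream):
        self._stream = stream
        self._offsets = []  # position (as reported by tell) where each line starts
        stream.seek(0)
        while True:
            position = stream.tell()
            if not stream.readline():  # empty bytes means end of stream
                break
            self._offsets.append(position)

    def __len__(self):
        return len(self._offsets)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        # The list lookup raises IndexError for out-of-range indices.
        self._stream.seek(self._offsets[index])
        return self._stream.readline()


lines = LineSequence(io.BytesIO(b"alpha\nbeta\ngamma\n"))
second = lines[1]
count = len(lines)
```

With this input, `second` is `b"beta\n"` and `count` is 3. The sketch scans the whole stream up front, so it assumes a bounded, seekable source; an unbounded stream would need a different approach.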

Hierarchical groupings

Since streams have a strong legacy of "file contents", they have this hierarchy of data:
bit --> byte --> [byte_word] --> [line] --> file
(the "[]" means optional here)

Byte words are (usually) fixed-size byte sequences that should be treated as atomic when decoding. For example:

Bytes of audio (you've seen PCM16 and PCM32, for example).
Text encodings: Often 2 or 4 bytes as well (UTF-16 and UTF-32 code units, for example).

Lines: Often of different sizes, but when there are lines, their byte size is usually a multiple of the word size.
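
The byte-word level of the hierarchy can be sketched as follows. The helper iter_words is a hypothetical name, and the sketch assumes an in-memory or buffered stream whose read(n) returns fewer than n bytes only at the end of the content. It reads PCM16-style data, where each sample is one 2-byte word, and decodes each word with the standard struct module.

```python
import io
import struct

WORD_SIZE = 2  # PCM16: each sample is one 2-byte word


def iter_words(stream, word_size):
    """Yield successive fixed-size byte words; a trailing partial word is dropped."""
    while True:
        word = stream.read(word_size)
        if len(word) < word_size:
            return
        yield word


# Three little-endian signed 16-bit samples packed into a byte stream.
pcm_bytes = struct.pack("<3h", 0, 1000, -1000)

# Each word is decoded as a whole, never byte by byte.
samples = [
    struct.unpack("<h", word)[0]
    for word in iter_words(io.BytesIO(pcm_bytes), WORD_SIZE)
]
```

Here `samples` comes back as `[0, 1000, -1000]`. Treating the word as the atomic unit is what keeps a decoder from splitting a sample across two reads.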

Answer 1 · 2022-07-05T15:51:58.000Z

Old issue that has now evolved into the creek project.

The issue is reproduced as a wiki here