Stream Stores
Streams are another common source and target of data, so we'd like to create tools to do the usual py2store stuff: bring different interfaces under a common one, and adapt to various data particularities.
The relevant standard-library module is `io`: https://docs.python.org/3/library/io.html.
What makes a data source or target a "stream" as opposed to other forms, such as key-value or sequence (list-like) stores? We need to draw relationships between the different interfaces, and between the words used to describe the functionality within each.
The stream interface seems to be built around the concept of a file's content, along with the possibility of an unbounded source of data.
- Streams have concepts like `read` and `write`, which we also have in key-value or sequence constructs. For streams, though, `read` is more of an iterator over content, and `write` more of an append to content.
- Streams have `readline` and `writelines`, pointing to the fact that content is assumed to be structured (above the atomic byte or bit, but below the source's location, metadata, etc.).
- Streams have a concept of open, flush, and close, which is not needed in the key-value or sequence interfaces (why? because actions are assumed to be effective immediately?).
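The points above can be seen in a minimal sketch that uses `io.BytesIO` as a stand-in for a file. The variable names are only illustrative, and a real file would be obtained with `open` instead.

```python
import io

stream = io.BytesIO()          # in-memory binary stream, standing in for a file
stream.write(b"first line\n")  # each write appends at the cursor position
stream.write(b"second line\n")
stream.flush()                 # no-op for BytesIO; pushes buffered data for real files

stream.seek(0)                 # rewind before reading
lines = list(stream)           # iterating a stream yields its lines

stream.seek(0)
head = stream.read(5)          # read consumes content and advances the cursor
rest = stream.read()           # the next read continues where the previous one stopped

stream.close()                 # after this, further reads and writes raise ValueError
```

Note how `read` behaves like `next` on an iterator: the same call returns different content depending on where the cursor is, which has no direct counterpart in a key-value `__getitem__`.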
So we still get many key-value or sequence perspectives. A stream is a sequence of bytes, or of lines. `seek` navigates the sequence, `tell` gives us a key (a position) for the cursor, etc.
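To make the sequence perspective concrete, here is a rough sketch of a list-like view over the lines of a seekable binary stream. `LineSequence` is a hypothetical name, not an existing py2store class, and the sketch assumes the stream is seekable and small enough to index in one pass.

```python
import io
from collections.abc import Sequence


class LineSequence(Sequence):
    """Hypothetical read-only, list-like view of the lines of a seekable binary stream."""

    def __init__(self, stream):
        self._stream = stream
        self._offsets = []  # byte position (as reported by tell) of each line start
        stream.seek(0)
        while True:
            pos = stream.tell()
            if not stream.readline():  # empty bytes means end of stream
                break
            self._offsets.append(pos)

    def __len__(self):
        return len(self._offsets)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        # Indexing the offsets list raises IndexError when out of range
        self._stream.seek(self._offsets[index])
        return self._stream.readline()


lines = LineSequence(io.BytesIO(b"alpha\nbeta\ngamma\n"))
second = lines[1]       # seek to the stored offset, then read one line
last_two = lines[-2:]
```

Here the integer index plays the role of the key, and the stored `tell` positions are what translate that key into a cursor location.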
Hierarchical groupings
Since streams have a strong legacy of "file contents", they come with this hierarchy of data:
bit --> byte --> [byte_word] --> [line] --> file
(the "[]" means optional here)
Byte words are (usually) fixed-size byte sequences that should be treated as atomic when decoding. For example:
- Audio samples: 2 bytes for PCM16 and 4 bytes for PCM32, for example.
- Text encodings: often 2 or 4 bytes as well (UTF-16 and UTF-32 code units, for instance).
Lines: often of different sizes, but when lines are present, their byte size is usually a multiple of the word size.
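As an illustration of the byte-word level, the sketch below reads a binary stream one fixed-size word at a time and decodes each word with `struct`. The helper name `iter_words` is hypothetical, and the signed little-endian PCM16 layout is an assumption made for the example.

```python
import io
import struct

WORD_SIZE = 2  # bytes per PCM16 sample (assumed signed, little-endian)


def iter_words(stream, word_size=WORD_SIZE, fmt="<h"):
    """Yield decoded words from a binary stream, one fixed-size chunk at a time."""
    while True:
        chunk = stream.read(word_size)
        if not chunk:  # end of stream
            return
        if len(chunk) < word_size:
            # A partial word cannot be decoded, so treat it as an error
            raise ValueError("stream ended in the middle of a word")
        yield struct.unpack(fmt, chunk)[0]


raw = struct.pack("<4h", 0, 1, -1, 32767)   # four PCM16 samples as 8 bytes
samples = list(iter_words(io.BytesIO(raw)))
```

With an unbuffered raw stream, `read` may return fewer bytes than requested even before the end, so a production version would need to accumulate bytes until a full word is available.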