i2mint/py2store

Stream Stores


Streams are another common source and target of data, so we'd like to create tools to do the usual py2store stuff: rein in different interfaces under a common one, and adapt to various data particularities.

The relevant standard library module is io: https://docs.python.org/3/library/io.html.

What makes a data source or target a "stream" as opposed to other forms, such as a key-value or sequence (list-like) store? We need to draw relationships between the different interfaces, and between the words used to describe the functionalities within each.

The stream interface seems to be built around the concept of a file's content, along with the possibility of an unbounded source of data.

  • Streams have concepts like read and write, which we also have in key-value or sequence constructs. For streams the read is more of an iterator of content, and write more of an append to content.
  • Streams have readline and writelines (there is no writeline in io), pointing to the fact that content is assumed to be structured (above the atomic byte or bit, but below the source's location, metadata, etc.)
  • Streams have the concepts of open, flush, and close, which are not needed in the key-value or sequence interfaces (why? because actions are assumed to take effect immediately?)
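
To make the three points above concrete, here is a minimal sketch using an in-memory io.BytesIO as a stand-in for a file or socket (the variable names are illustrative only). It shows write behaving like an append, read and readline consuming content at a moving cursor, and the flush/close lifecycle:

```python
import io

# An in-memory binary stream stands in for a file or socket.
stream = io.BytesIO()

# write puts content at the cursor, which here is always the end: an append.
stream.write(b"first line\n")
stream.writelines([b"second line\n", b"third line\n"])
stream.flush()  # a no-op for BytesIO, but part of the common stream interface

# Rewind, then consume content incrementally.
stream.seek(0)
first = stream.readline()   # one line, including its trailing newline
rest = list(stream)         # iterating a stream yields the remaining lines
leftover = stream.read()    # nothing left: the cursor is at the end

stream.close()
closed = stream.closed
```

Note that writelines does not add line separators; the lines passed in must already carry them.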

So we still get many key-value or sequence perspectives. A stream is a sequence of bytes, or of lines. seek navigates the sequence, tell gives us the key (a position) of the seek cursor, etc.
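
One way to picture that key-value perspective is a read-only mapping whose keys are the byte offsets reported by tell and whose values are the lines found there. The class below is a hypothetical sketch (LinesByOffset is not a py2store name), and it assumes a seekable binary stream:

```python
import io
from collections.abc import Mapping


class LinesByOffset(Mapping):
    """Hypothetical read-only view: byte offset of a line start -> that line."""

    def __init__(self, stream):
        self._stream = stream  # assumed seekable and binary

    def __iter__(self):
        offset = 0
        while True:
            # Re-seek every time: lookups between iterations may move the cursor.
            self._stream.seek(offset)
            if not self._stream.readline():
                return
            next_offset = self._stream.tell()  # key of the following line
            yield offset
            offset = next_offset

    def __len__(self):
        return sum(1 for _ in self)

    def __getitem__(self, offset):
        # Keys are assumed to be line-start offsets; other offsets give partial lines.
        self._stream.seek(offset)
        line = self._stream.readline()
        if not line:
            raise KeyError(offset)
        return line


store = LinesByOffset(io.BytesIO(b"alpha\nbeta\ngamma\n"))
keys = list(store)
as_dict = dict(store)
```

Here keys are positions rather than names, which is exactly what separates this view from a typical key-value store: the keys only make sense relative to the content that precedes them.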

Hierarchical groupings

Since streams carry a strong legacy of "file contents", they come with this hierarchy of data:
bit --> byte --> [byte_word] --> [line] --> file
(the "[]" means optional here)

Byte words are (usually) fixed-size byte sequences that should be taken as atomic when decoding. For example:

  • Audio samples (you've seen PCM16 and PCM32, for example): 2 or 4 bytes per sample.
  • Text encodings: Often 2 or 4 bytes as well.

Lines: Often of different sizes, but when there are lines, their byte size is usually a multiple of the word size.
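
The byte --> byte_word step of the hierarchy can be sketched as follows. The helper iter_words is a hypothetical name, and the sketch assumes a buffered or in-memory binary stream whose read(n) returns fewer than n bytes only at the end of the stream:

```python
import io
import struct


def iter_words(stream, word_size):
    """Yield consecutive fixed-size byte words from a binary stream."""
    while True:
        word = stream.read(word_size)
        if len(word) < word_size:
            # End of stream; a trailing partial word is dropped in this sketch.
            return
        yield word


# Three little-endian signed 16-bit samples, as PCM16 audio would store them.
raw = struct.pack("<3h", 1, -2, 300)
samples = [struct.unpack("<h", w)[0] for w in iter_words(io.BytesIO(raw), 2)]

# The same chunking applied to 2-byte text code units (UTF-16, little-endian).
text_words = list(iter_words(io.BytesIO("hi".encode("utf-16-le")), 2))
```

Each word is decoded as a unit: splitting a PCM16 sample or a UTF-16 code unit in the middle would yield bytes that cannot be interpreted on their own.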

Old issue that has now evolved into the creek project.

The issue is reproduced as a wiki here