zarr-developers/zarr-specs

Docs file structure


Here's a proposal for how to organise the documentation files within this repository. Comments very welcome.

  • docs - Top level folder for all documentation. Docs will be in RST format and built via sphinx.
    • conf.py - Sphinx configuration file.
    • index.rst - Main documentation page. Provides a brief introduction to Zarr and the organisation of the documentation. Includes a complete table of contents.
    • process.rst - Describes the processes for proposing new specs or changes to existing specs.
    • protocol - Folder containing specifications of the core protocol.
      • v1.rst - Version 1 of the core protocol. Migrated without change from here.
      • v2.rst - Version 2 of the core protocol. Migrated without change from here.
      • v3.0.rst - Version 3.0 of the core protocol, to be written.
    • transformations - Folder containing specifications of transformations of the core protocol.
      • consolidated-metadata - Folder containing specifications of the consolidated metadata transformation.
        • v1.rst - Version 1 of the consolidated metadata format, to be written.
      • chunk-key-separator.rst - Documentation of the chunk key protocol transformation, which involves rewriting chunk keys to use a different character as the separator character between chunk grid indices (e.g., '/' instead of '.').
    • extensions - Folder containing specifications of extensions to the core protocol.
      • zcdf - Folder containing specifications of the NetCDF-style extensions to the core protocol.
        • v1.rst - Version 1 of the ZCDF extension spec.
    • storage - Folder containing specifications of storage layers. Each storage layer spec describes how operations in the abstract storage interface (get, set, delete key/value pairs) are translated into concrete operations in a storage system such as a file system or cloud object store.
      • file-system.rst - Spec that maps the abstract storage interface onto file system operations.
      • zip-file.rst - Spec that maps the abstract storage interface onto operations on a zip file.
      • dbm.rst - Spec that maps the abstract storage interface onto operations on a dbm-style database (including gdbm, ndbm and Berkeley DB).
      • lmdb.rst - Spec that maps the abstract storage interface onto operations on an LMDB database.
      • sqlite.rst
      • mongodb.rst
      • redis.rst
      • abs.rst
      • gcs.rst
      • s3.rst
      • ...
    • codecs - Folder containing codec specifications. Codecs include filters and compressors. A codec specification describes the chunk encoding/decoding process and the encoded format. These may just be references to documentation published elsewhere, and/or a reference implementation.
      • adler32.rst
      • astype.rst
      • blosc.rst
      • bz2.rst
      • categorize.rst
      • crc32.rst
      • delta.rst
      • fixedscaleoffset.rst
      • gzip.rst
      • json.rst
      • json2.rst
      • lz4.rst
      • lzma.rst
      • msgpack.rst
      • msgpack2.rst
      • packbits.rst
      • pickle.rst
      • quantize.rst
      • vlen-array.rst
      • vlen-bytes.rst
      • vlen-utf8.rst
      • zlib.rst
      • zstd.rst
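To make the storage layer idea concrete, here is a minimal Python sketch of an abstract get/set/delete interface and two mappings of it onto concrete systems. The class names `Store`, `MemoryStore` and `FileSystemStore` and their method names are hypothetical, chosen only to mirror the three operations listed for the storage folder; they are not the API of zarr-python or of any spec.

```python
from __future__ import annotations

import os
import tempfile
from abc import ABC, abstractmethod


class Store(ABC):
    """Hypothetical abstract storage interface: get, set and delete key/value pairs."""

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def set(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class MemoryStore(Store):
    """Maps the interface onto an in-memory dict."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def get(self, key: str) -> bytes:
        return self._data[key]  # raises KeyError if the key is absent

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = bytes(value)

    def delete(self, key: str) -> None:
        del self._data[key]


class FileSystemStore(Store):
    """Maps each key onto a file under a root directory; '/' in a key becomes a subdirectory."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, key: str) -> str:
        return os.path.join(self.root, *key.split("/"))

    def get(self, key: str) -> bytes:
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            raise KeyError(key) from None

    def set(self, key: str, value: bytes) -> None:
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)  # create parent "directories" on demand
        with open(path, "wb") as f:
            f.write(value)

    def delete(self, key: str) -> None:
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key) from None
```

Because both classes honour the same three operations, code written against `Store` does not need to know which storage system is underneath, which is the separation the storage specs are meant to capture.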

To elaborate on a couple of things...

Here I'm using "Zarr core protocol" to mean the core spec that defines the array and group metadata formats, the abstract interfaces for storage layers and codecs, the logical model for how arrays are divided into chunks, and how storage keys are constructed for storing metadata and chunk data. Previously this has been called the "Zarr storage specification", but I think "protocol" is a better word, as it's closer to what this spec actually defines.
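As an illustration of the chunking model and key construction, the sketch below follows the Zarr v2 conventions: array metadata lives under a `.zarray` key, and a chunk key is the chunk grid indices joined by `.` (or by `/` when a different separator is used, as in the chunk-key-separator item above). The function names are hypothetical helpers for this example only.

```python
import math
from itertools import product


def chunk_grid_shape(shape, chunks):
    """Number of chunks along each dimension; edge chunks may be only partially filled."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


def chunk_index(element_index, chunks):
    """Grid indices of the chunk that contains the given array element."""
    return tuple(i // c for i, c in zip(element_index, chunks))


def chunk_key(array_path, grid_indices, separator="."):
    """Storage key for one chunk: array path, then grid indices joined by the separator."""
    suffix = separator.join(str(i) for i in grid_indices)
    return f"{array_path}/{suffix}" if array_path else suffix


def metadata_key(array_path):
    """Storage key for the array's metadata document (Zarr v2 style)."""
    return f"{array_path}/.zarray" if array_path else ".zarray"


def all_chunk_keys(array_path, shape, chunks, separator="."):
    """Every chunk key for an array, in C (row-major) order of the chunk grid."""
    grid = chunk_grid_shape(shape, chunks)
    return [chunk_key(array_path, idx, separator) for idx in product(*(range(n) for n in grid))]
```

For example, an array at path `foo` with shape (10000, 10000) and chunks (1000, 1000) has a 10 x 10 chunk grid; element (2500, 9999) falls in chunk (2, 9), stored under `foo/2.9`, or `foo/2/9` with a `/` separator.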

In this structure I'm envisaging that specs may be decoupled and versioned separately, i.e., the core protocol is decoupled from the codec and storage layer specs. This is intended to allow new storage layers or codecs to be defined without requiring any changes to, or new versions of, the core protocol.
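One way to picture this decoupling is a codec registry: the core protocol only reads a codec identifier from the metadata and looks it up, so adding a codec means registering a new entry rather than changing the core. The sketch below assumes metadata in the style of Zarr v2 `.zarray`, where a compressor is described by a JSON object such as `{"id": "zlib", "level": 1}`; `CODEC_REGISTRY`, `register_codec` and `codec_from_config` are hypothetical names, not part of any spec or library.

```python
import bz2
import zlib

# Hypothetical registry: codec id (as it appears in metadata) -> codec class.
CODEC_REGISTRY = {}


def register_codec(codec_id):
    """Class decorator that makes a codec discoverable by its id."""
    def decorator(cls):
        CODEC_REGISTRY[codec_id] = cls
        return cls
    return decorator


@register_codec("zlib")
class Zlib:
    def __init__(self, level=1):
        self.level = level

    def encode(self, buf: bytes) -> bytes:
        return zlib.compress(buf, self.level)

    def decode(self, buf: bytes) -> bytes:
        return zlib.decompress(buf)


# A codec added later: only a new registration, no change to codec_from_config.
@register_codec("bz2")
class BZ2:
    def __init__(self, level=9):
        self.level = level

    def encode(self, buf: bytes) -> bytes:
        return bz2.compress(buf, self.level)

    def decode(self, buf: bytes) -> bytes:
        return bz2.decompress(buf)


def codec_from_config(config: dict):
    """Core-protocol side: resolve the codec named by "id"; remaining keys become parameters."""
    params = dict(config)
    codec_id = params.pop("id")
    try:
        cls = CODEC_REGISTRY[codec_id]
    except KeyError:
        raise ValueError(f"unknown codec: {codec_id!r}") from None
    return cls(**params)
```

The same pattern would apply to storage layers: the core only depends on the abstract interface, and concrete implementations are plugged in by name.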

I've also tentatively included extensions here as a place for extensions like ZCDF. Previously there has been discussion of having a separate repo for extensions/conventions; however, I'm wondering whether we should try to keep everything together.

xref #8 which identifies some components of the overall system architecture, and thus has some bearing on how specs are organised.

Thanks for taking the time to put this together. I think it is a very sensible way to begin and will allow us to move forward with clarity.

After some thought, here is a slightly revised proposal for the spec docs structure:

  • docs - Top level folder for all documentation. Docs will be in RST format and built via sphinx.
    • conf.py - Sphinx configuration file.
    • index.rst - Main documentation page. Provides a brief introduction to Zarr and the organisation of the documentation. Includes a complete table of contents.
    • process.rst - Describes the processes for proposing new specs or changes to existing specs.
    • protocol - Folder containing protocol specifications.
      • core - Folder containing versions of the core protocol.
        • v1.rst - Version 1 of the core protocol. Migrated without change from here.
        • v2.rst - Version 2 of the core protocol. Migrated without change from here.
        • v3.0.rst - Version 3.0 of the core protocol, to be written.
      • extensions - Folder containing specifications of protocol extensions.
        • consolidated-metadata - Folder containing specifications of the consolidated metadata protocol extension.
          • v1.rst - Version 1 of the consolidated metadata format, to be written.
        • zcdf - Folder containing specifications of the NetCDF-style extensions to the core protocol.
          • v1.rst - Version 1 of the ZCDF extension spec.
        • [other protocol extensions]
    • storage - Folder containing specifications of storage layers. Each storage layer spec describes how operations in the abstract storage interface (get, set, delete key/value pairs) are translated into concrete operations in a storage system such as a file system or cloud object store.
      • file-system.rst - Spec that maps the abstract storage interface onto file system operations.
      • zip-file.rst - Spec that maps the abstract storage interface onto operations on a zip file.
      • [other storage layers]
    • codecs - Folder containing codec specifications. Codecs include filters and compressors. A codec specification describes the chunk encoding/decoding process and the encoded format. These may just be references to documentation published elsewhere, and/or a reference implementation.
      • blosc.rst
      • bz2.rst
      • [other codecs]
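The codec folder covers both filters and compressors. In Zarr v2, a chunk is encoded by applying the filters in order and then the compressor, and decoded by running the same steps in reverse. The sketch below shows that pipeline; the byte-wise `Delta` filter is a simplified stand-in written for this example (the real delta codec works on typed array elements), and all names here are hypothetical.

```python
import zlib


class Delta:
    """Toy filter: store each byte as its difference from the previous byte, modulo 256."""

    def encode(self, buf: bytes) -> bytes:
        out = bytearray(len(buf))
        prev = 0
        for i, b in enumerate(buf):
            out[i] = (b - prev) % 256
            prev = b
        return bytes(out)

    def decode(self, buf: bytes) -> bytes:
        out = bytearray(len(buf))
        acc = 0
        for i, d in enumerate(buf):
            acc = (acc + d) % 256  # running sum restores the original bytes
            out[i] = acc
        return bytes(out)


class Zlib:
    """Compressor backed by the standard-library zlib module."""

    def __init__(self, level=1):
        self.level = level

    def encode(self, buf: bytes) -> bytes:
        return zlib.compress(buf, self.level)

    def decode(self, buf: bytes) -> bytes:
        return zlib.decompress(buf)


def encode_chunk(raw: bytes, filters, compressor) -> bytes:
    """Apply filters in order, then the (optional) compressor."""
    buf = raw
    for f in filters:
        buf = f.encode(buf)
    return compressor.encode(buf) if compressor is not None else buf


def decode_chunk(encoded: bytes, filters, compressor) -> bytes:
    """Undo the compressor first, then the filters in reverse order."""
    buf = compressor.decode(encoded) if compressor is not None else encoded
    for f in reversed(filters):
        buf = f.decode(buf)
    return buf
```

For a slowly increasing byte sequence such as `bytes(range(256)) * 4`, the delta filter turns almost every byte into `1`, which the compressor then shrinks far more than it could shrink the raw data.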

The main change here is that I have dropped the idea of "protocol transformations" because it was too complicated. Now there is only the core protocol plus various protocol extensions. The idea is that the core protocol should be the set of features that all language implementations can agree to implement, and anything else can be a protocol extension.

Closing this for now, since the file structure is now in the repo. For further discussion of the spec format, please see #179.