SemDoc – Semantic Document

SemDoc is a document file format that lets writers focus solely on the content and empowers readers to adapt the appearance to their devices and needs.

Why yet another file format?

There are many great tools for writing documents: Google Docs, Word, Notion, Markdown, LaTeX, and many more. But for distributing documents, we're pretty much stuck with PDF. SemDoc aims to change that.

What's wrong with PDF? PDF is excellent for print-quality content: If you want to print something or publish a design brochure, it's a great fit. But I'm not too fond of PDF for general content consumption:

  • It organizes everything into pages. It's not like scrolling is a particularly new invention.
  • The layout is entirely static. I refuse to believe that putting my phone into landscape mode just to read a PDF without continually scrolling back and forth is the best user experience we can come up with in the 21st century.
  • Advanced features are hard to use correctly. Tabbing through input fields or copying text with accents rarely works. These symptoms indicate an issue with the format itself.
  • Dark mode? Anyone?

Fundamentally, PDF gives most of the control to document creators, leaving none to readers.

In contrast, SemDoc follows these principles:

  • Be a compile target. It doesn't aim to be readable by humans or edited by hand; instead, it's an efficient binary format. As with PDF, you compile other formats to it.
  • Be purely semantic. It contains no syntax information, only semantic information. Writers declare what to display, readers control how to show it.
  • Be extensible. Over time, the SemDoc format can be extended.
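
To make the "purely semantic" principle concrete, here is a minimal sketch of the split between what a writer declares and what a reader controls. Everything in it is an assumption made for illustration: the `Block` enum, the `ReaderPrefs` struct, and the `render` function are hypothetical names and do not reflect the actual SemDoc block set, binary encoding, or reader implementation.

```rust
// Hypothetical model for illustration only; not the real SemDoc format.

/// What the writer declares: structure and meaning, no layout.
#[derive(Debug, Clone, PartialEq)]
enum Block {
    Text(String),
    Heading(Box<Block>),
    Paragraph(Vec<Block>),
    Emphasis(Box<Block>),
}

/// What the reader controls: presentation preferences (assumed fields).
struct ReaderPrefs {
    line_width: usize,
    uppercase_headings: bool,
}

/// Flatten a block to plain text, marking emphasis with asterisks.
fn inline_text(block: &Block) -> String {
    match block {
        Block::Text(s) => s.clone(),
        Block::Emphasis(inner) => format!("*{}*", inline_text(inner)),
        Block::Heading(inner) => inline_text(inner),
        Block::Paragraph(children) => children.iter().map(inline_text).collect(),
    }
}

/// Greedy word wrap: each line holds at most `width` characters,
/// except that a single word longer than `width` gets its own line.
fn wrap(text: &str, width: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        let needed = current.chars().count() + 1 + word.chars().count();
        if !current.is_empty() && needed > width {
            lines.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}

/// Render the same document differently depending on the reader's preferences.
fn render(doc: &[Block], prefs: &ReaderPrefs) -> Vec<String> {
    let mut out = Vec::new();
    for block in doc {
        match block {
            Block::Heading(_) => {
                let title = inline_text(block);
                out.push(if prefs.uppercase_headings { title.to_uppercase() } else { title });
            }
            other => out.extend(wrap(&inline_text(other), prefs.line_width)),
        }
    }
    out
}

/// A tiny sample document: one heading and one paragraph.
fn sample() -> Vec<Block> {
    vec![
        Block::Heading(Box::new(Block::Text("Why SemDoc".into()))),
        Block::Paragraph(vec![
            Block::Text("Writers declare ".into()),
            Block::Emphasis(Box::new(Block::Text("what".into()))),
            Block::Text(" to display.".into()),
        ]),
    ]
}

fn main() {
    let doc = sample();
    let phone = ReaderPrefs { line_width: 16, uppercase_headings: true };
    let desktop = ReaderPrefs { line_width: 80, uppercase_headings: false };
    for line in render(&doc, &phone) {
        println!("{line}");
    }
    println!();
    for line in render(&doc, &desktop) {
        println!("{line}");
    }
}
```

The document value never changes between the two renders; only the reader's preferences do. A dark mode or a larger font would, under the same assumption, be one more field on the reader's side rather than something the writer has to anticipate.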

Relevance

Some might think, "Documents are going to be cloud-first anyway. Consumers don't need file formats anymore; they'll collaborate online." That's probably true for most cases. But I'd argue there will always be a use case for sending immutable, atomic instances of documents. Immutability is what lets a document serve as a legally binding record. Latency might forbid you from collaborating with people living on Mars. And the concept of transferring a file is straightforward for us humans to understand because files are like things.


Here's an explanation of the format.

Roadmap

  • Write vision
  • Implement base of the format
  • Make format debuggable
  • Document format
  • Implement block-level optimizations
  • Support blocks with more than 255 children
  • Add more blocks
  • Improve the quality of Markdown to SemDoc converter
  • Use Reference atom
  • Implement document reader in CLI
  • Implement document reader in Flutter
  • Optimize performance
  • Put process in place for proposing new blocks
  • Implement intermediary language for editing SemDocs

Block ideas

  • pixel images
  • vector images
  • redacted
  • comments
  • signed
  • secondary
  • more info
  • links
  • highlighting
  • quote