sketch-hq/sketch-document

Standalone Sketch File Format v1.0.0 ☂️

Closed this issue · 9 comments


This issue discusses how v1.0.0 of a standalone Sketch File Format project might work and what it might look like.

A standalone Sketch File Format would have the following important properties:

  • Serve as a single source of truth for the format
  • Allow disparate teams within Sketch B.V. to organise around a shared specification
  • Can be transformed into useful formats, including:
    • Documentation
    • TypeScript types (typesafe documents or document fragments in JS/TS projects)
    • JSON Schema (validate documents)
    • Swift models (for Sketch.app)
    • GraphQL schemas (for backend)
    • Anything else
  • Defines the interchange format for an ecosystem of internal products and 3rd party integrations
  • Treated as a product in its own right
  • Public, published and working in the open
  • Independently versioned (i.e. deliberately ignorant of any Sketch.app version), with a predictable, measured release cadence
  • No other Sketch B.V. product dependencies (root of the tree)
  • Easily maintainable with modern tooling and CI to ensure internal correctness

โœ‚๏ธ Hand-crafted vs auto-generation

We've already tried auto-generating a file format specification from the Sketch code base. It was an incredibly useful learning exercise, but has critical drawbacks:

  • It doesn't solve the core organisational issue, which is that the relationship between Sketch and the file format is inverted (Sketch should consume and conform to a public file format, rather than potentially mutating it without warning every release)
  • It proved difficult to generate a 100% correct and sufficiently rich format automatically from the Sketch code base

Therefore, maintaining a hand-crafted file format as a product seems like the best next step to explore.

👑 Source format

The source format defines the technology we use to maintain the file format source, but isn't prescriptive about the formats that can be derived from it. Indeed, the objective here is to pick a source format that enables fluent transformation to as many other output formats as possible.

There are two main candidates:

JSON Schema

JSON Schema itself is an obvious choice. It has rich semantics which allow it to fully describe complex models, and has been widely adopted with tooling support in many languages and platforms. The JSON Schema specification process has been adopted by the IETF and sees active development.

A potential downside is that JSON Schema is just raw JSON, and as such the experience of maintaining a complex/large JSON Schema document by hand could have some challenges. We could look into JSON Schema $refs to split the schema into manageable chunks.
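As a rough illustration of the $ref idea, the sketch below keeps each definition as its own chunk and stitches them together by reference. The definition names (`Layer`, `Page`), the registry shape and the `resolveRef` helper are all hypothetical, and it's written as TypeScript objects only to keep the example checkable; the real source would be plain JSON files.

```typescript
// A simplified JSON Schema node: either a $ref pointer or an inline schema.
// (Hypothetical and heavily reduced; real JSON Schema has many more keywords.)
type SchemaNode =
  | { $ref: string }
  | { type: string; properties?: Record<string, SchemaNode>; items?: SchemaNode };

// Each entry could live in its own file; here they share one registry.
const definitions: Record<string, SchemaNode> = {
  Layer: { type: "object", properties: { name: { type: "string" } } },
  Page: {
    type: "object",
    properties: {
      // The page doesn't repeat the layer schema, it points at it.
      layers: { type: "array", items: { $ref: "#/definitions/Layer" } },
    },
  },
};

// Resolve a local "#/definitions/<Name>" reference against the registry.
function resolveRef(node: SchemaNode): SchemaNode {
  if (!("$ref" in node)) return node; // already an inline schema
  const prefix = "#/definitions/";
  if (node.$ref.indexOf(prefix) !== 0) {
    throw new Error(`Unsupported $ref: ${node.$ref}`);
  }
  const target = definitions[node.$ref.slice(prefix.length)];
  if (target === undefined) {
    throw new Error(`Unknown definition: ${node.$ref}`);
  }
  return resolveRef(target); // follow chained references
}
```

Existing JSON Schema tooling generally handles reference resolution already, so a helper like this would only matter for our own transformation scripts.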

TypeScript

Defining the file format in code using TypeScript would bring a few benefits:

  • TypeScript compiler could check correctness during compilation
  • Use tooling and IDE integrations to enhance the maintenance experience (formatting, linting)
  • Code separation into modules would enable the schema to be broken into smaller logical chunks more easily than manually managing JSON files
  • TypeScript is similar to Swift, so editing a schema in TypeScript could potentially be easier for native developers, rather than having to learn the JSON Schema spec (we'd have to gather opinions here)
  • No need to implement a TypeScript output format separately (which we're sure to need immediately)

The TypeScript AST is well documented, and would serve as a robust data format to enable transformation to other formats. We could write our own TypeScript to JSON Schema transformer for example, or use existing tools like typescript-json-schema. Tooling for TypeScript to Swift transformation exists in the wild too, e.g. quicktype.

We'd have to ensure that we can sufficiently describe the existing file format model in TypeScript before choosing this route, since the range of type information and interrelationships it can describe is more limited than JSON Schema (this could turn out to be a good thing, since it's also simpler).

Swift

There could be a compelling case made for using Swift for the source format, since the most important consumer of the file format will be Sketch.app. The feasibility of this would hinge around the suitability of using the Swift AST as a source data format for onward transformation into JSON Schema, TypeScript and others.

Is the Swift AST sufficiently mature? What does the existing tooling and documentation around it look like? Does it limit the potential for 3rd parties to easily write their own transformation scripts? This would need investigating by a native developer.

๐Ÿ’โ€ Output formats

We would expect the community to use the file format to generate diverse formats that work with their specific workflows and toolchains, therefore we don't need to be exhaustive in the output formats we initially implement.

A v1.0.0 of the file format could simply include the source format itself, whatever it is. Additional output formats that we're certain to need internally (e.g. TypeScript, or Swift) could be looked at as a follow on task.

Rather than pollute the core file format repo with generator code for other formats, these could be housed in separate repos (GitHub Actions could eventually be used to trigger builds when a new file format is released).

🌳 Structure

The following structural elements and types should be describable by the file format:

  • A definition for the root document type and every class/object type that can appear as a child in the document
  • Limit object properties to allowable types (e.g. a layers array can only contain elements of union type Layer)
  • Encode whether properties are optional
  • Descriptive enums (e.g. all possible values of _class, or colorProfile properties etc)
  • Handy union types not explicitly appearing in the Sketch.app model, but useful for developers parsing/generating documents in the wild (e.g. a type representing a union of all layer types)
  • If practical reproduce some aspects of the Sketch.app model class hierarchy (main benefit here would be to reduce duplication in the file format source, rather than attempting to mirror the internals of the native app, e.g. define an abstract type for layers, which could share common properties amongst all concrete types that behave as a layer)
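To make the list above a bit more concrete, here's a minimal sketch of how those elements might look if written as TypeScript types. Every type name, property and enum value here is illustrative only (apart from `_class`, `layers` and `colorProfile`, which are mentioned above), and it isn't meant to match the real Sketch model.

```typescript
// Descriptive enum as a union of literals (illustrative values, not the real list).
type ColorProfile = "profileA" | "profileB";

// Abstract base type: shared properties for everything that behaves as a layer.
interface AbstractLayer {
  name: string;
  isVisible: boolean;
  rotation?: number; // optionality is encoded in the type
}

interface RectangleLayer extends AbstractLayer {
  _class: "rectangle";
  cornerRadius: number;
}

interface TextLayer extends AbstractLayer {
  _class: "text";
  text: string;
}

interface GroupLayer extends AbstractLayer {
  _class: "group";
  layers: AnyLayer[]; // children limited to the layer union
}

// Handy union of all layer types, discriminated by _class.
type AnyLayer = RectangleLayer | TextLayer | GroupLayer;

// Root document type.
interface SketchDocument {
  _class: "document";
  colorProfile: ColorProfile;
  layers: AnyLayer[];
}

// Example consumer: count layers recursively, narrowing on _class.
function countLayers(layers: AnyLayer[]): number {
  let total = 0;
  for (const layer of layers) {
    total += 1;
    if (layer._class === "group") total += countLayers(layer.layers);
  }
  return total;
}

const sampleDocument: SketchDocument = {
  _class: "document",
  colorProfile: "profileA",
  layers: [
    {
      _class: "group",
      name: "Header",
      isVisible: true,
      layers: [{ _class: "text", name: "Title", isVisible: true, text: "Hello" }],
    },
    { _class: "rectangle", name: "Background", isVisible: true, cornerRadius: 4 },
  ],
};
```

The discriminated union is what lets the compiler know that `layer.layers` only exists once `_class` has been checked against "group".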

โš ๏ธ Change management

There's no beating around the bush - the main driver of change in the file format would be Sketch.app.

Rather than deny that reality, the objective is to formalise the relationship, ensure changes to the file format are explicit and deliberate, and implement a process where new versions of the file format are released ahead of, or concurrently with, the version of Sketch.app that supports them.

The change management process could look something like:

  • File format versioned independently of Sketch.app, following semver
  • Sketch documents written in the file format no longer declare their compatibility with Sketch.app, rather they declare the version of the file format they conform to
  • Limit the pace at which new features are added to the file format - since this churns the format and burdens the entire ecosystem. Suggest annually, or less
  • This would mean stacking up Sketch features that require new features or changes to the file format and releasing them together in a dedicated Sketch release that advances the file format by a major version. Example scenario:
    1. Discuss and define the set of changes needed to support new features for a special release of Sketch that introduces new features to the file format
    2. Pre-release v2.0.0-rc1 of the file format supporting said features early on
    3. Iterate through release candidates while preparing for main Sketch release, e.g. v2.0.0-rc2, v2.0.0-rc3, etc.
    4. Then headline the relevant Sketch release "Now outputs in v2 of the file format" and simultaneously release the final v2.0.0 version of the file format
    5. Interested 3rd parties could have followed along with the release candidates too, so upgrading to the final release would hopefully be a painless version bump. For those who didn't have their eye on the ball, the release would be documented with a changelog and ready to go with freshly published schemas and types, so upgrading to a new version of the file format and fixing any compile errors would hopefully also be a much better experience than the current arrangement allows
  • Bug fixes could be released as patch versions as often as needed to fix inconsistencies between the file format and versions of Sketch released in the wild
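As a sketch of what "documents declare the file format version they conform to" could mean for a consumer, the snippet below parses a semver string and applies one possible acceptance rule. The `FormatVersion` shape, the function names and the rule itself are assumptions for illustration, not a decided policy.

```typescript
interface FormatVersion {
  major: number;
  minor: number;
  patch: number;
  prerelease?: string; // e.g. "rc1" for "2.0.0-rc1"
}

// Parse strings such as "1.2.3", "v2.0.0" or "v2.0.0-rc1".
function parseFormatVersion(text: string): FormatVersion {
  const match = /^v?(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$/.exec(text);
  if (match === null) throw new Error(`Not a semver string: ${text}`);
  const version: FormatVersion = {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3]),
  };
  if (match[4] !== undefined) version.prerelease = match[4];
  return version;
}

// Hypothetical policy: a reader built for `supported` accepts a document when
// the major versions match and the document's minor version is not newer.
// Patch numbers and prerelease tags are ignored in this sketch.
function readerAccepts(supported: FormatVersion, declared: FormatVersion): boolean {
  return declared.major === supported.major && declared.minor <= supported.minor;
}
```

Under this rule a tool built against 1.3.x would accept a 1.2.0 document but reject a 2.0.0 one, which is the point where a major bump signals that consumers need to upgrade.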

🤖 Hosting

The primary hosting mechanism could be tagged GitHub releases on the file format repo.

Thereafter, derived output formats could be hosted in whatever fashion makes most sense for their community/ecosystem. E.g. npm for TypeScript types etc.

🤔 For further discussion

How will the file format fit into the native Sketch.app workflow?

The responsibility here will lie with the Sketch dev team to figure out an approach for ensuring the versions of Sketch they release output documents that validate against the file format.

It's important to note that, technically speaking, a v1.0.0 of a standalone file format should precisely match the schema of files outputted by the current version of Sketch. So initially the changes to the Sketch development process might centre more around testing, than large internal changes.

We need input from native developers, but a few strategies could include:

  • Generate Swift code from the file format
  • Unit tests
  • Integration tests validating documents against a JSON Schema
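To give a feel for the last strategy, here's a toy validator covering a small subset of JSON Schema keywords (`type`, `enum`, `required`, `properties`, `items`). It's a hand-rolled sketch so the example stays self-contained; in practice an existing JSON Schema validator would be the sensible choice. The `MiniSchema` type and the example layer schema are hypothetical.

```typescript
// Reduced schema shape supporting only a handful of keywords.
type MiniSchema = {
  type?: "object" | "array" | "string" | "number" | "boolean";
  enum?: unknown[];
  required?: string[];
  properties?: Record<string, MiniSchema>;
  items?: MiniSchema;
};

// Returns a list of human-readable errors; an empty list means valid.
function validateAgainst(value: unknown, schema: MiniSchema, path = "$"): string[] {
  const errors: string[] = [];
  if (schema.enum !== undefined && schema.enum.indexOf(value) === -1) {
    errors.push(`${path}: value not in enum`);
  }
  if (schema.type !== undefined) {
    const actual = Array.isArray(value) ? "array" : value === null ? "null" : typeof value;
    if (actual !== schema.type) {
      errors.push(`${path}: expected ${schema.type}, got ${actual}`);
      return errors; // no point descending into a value of the wrong type
    }
  }
  if (typeof value === "object" && value !== null && !Array.isArray(value)) {
    const record = value as Record<string, unknown>;
    for (const key of schema.required || []) {
      if (!(key in record)) errors.push(`${path}.${key}: missing required property`);
    }
    const properties = schema.properties || {};
    for (const key of Object.keys(properties)) {
      const childSchema = properties[key];
      if (childSchema !== undefined && key in record) {
        errors.push(...validateAgainst(record[key], childSchema, `${path}.${key}`));
      }
    }
  }
  const itemSchema = schema.items;
  if (Array.isArray(value) && itemSchema !== undefined) {
    value.forEach((item, index) => {
      errors.push(...validateAgainst(item, itemSchema, `${path}[${index}]`));
    });
  }
  return errors;
}

// Validating a document fragment (a single layer) rather than a whole document.
const exampleLayerSchema: MiniSchema = {
  type: "object",
  required: ["_class", "name"],
  properties: {
    _class: { type: "string", enum: ["rectangle", "text"] },
    name: { type: "string" },
  },
};
```

An integration test would then serialise a document (or a single layer) from Sketch and assert that the returned error list is empty.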

Impact on Sketch

The most important aspect to point out is that using a file spec does not change how Sketch or SketchModel is expected to work. It is more about entering a contract, allowing other products and implementations to be developed independently from Sketch.

Scenarios

Opening and saving Sketch files typically falls into one of the following scenarios:

  1. Sketch creates a new file using the latest file format version it supports
  2. Sketch opens a file using an older file format version than the latest version it supports
    • SketchModel reads file formats it supports
    • Save produces a file using the latest file format version it supports
  3. Sketch opens a file using a newer file format version than the latest one it supports
    • SketchModel reads the file if it is still compatible with it

Breaking changes, e.g. something like the transition from colors to colorAssets, would be given a grace period by marking colors as deprecated (see json-schema-org/json-schema-spec#737) before removing it in a future file version.

We'd maintain the version and compatibilityVersion attributes from the current file format structure.

It is up to SketchModel and other implementations to choose a strategy for handling different file versions:

  • Implement version-specific runtime behaviour keyed on document versions, similar to if #available(macOS 10.14, *) in Swift; similar to, but more deterministic than, the existing migrations
  • Run migration scripts, similar to database migrations

Conforming to the file format spec

Step 1

Add tests to SketchModel that produce JSON data and validate it against the JSON Schema. It's possible to validate a whole document or document fragments such as an individual layer.

Step 2

This could potentially (partially) replace the .xcdatamodel, but this is up to the Mac development team. An option would be to refactor the serialisation/deserialisation so that base classes can be auto-generated from the JSON Schema, since an AST representation can be created from it. There is a more mature and established toolchain available for this than our own code generator.

Source format

We have JSON Schema, TypeScript and potentially Swift to choose from.

  • JSON Schema is "closer to the metal" whereas TypeScript is an abstraction
  • JSON Schema is richer and can contain details that qualify as global business logic, e.g. types based on values
  • TypeScript experience of making changes is more robust and comfortable
  • If done in Swift, we'd need a TypeScript version immediately for use, raising the question of the required toolchain, e.g. quicktype or alternatives
  • If it's TypeScript, we'd need JSON Schema right off the bat, either by using existing tools or handling it ourselves.

The decision on using JSON Schema or TypeScript (or Swift) as the source of truth feels like it depends mostly on who's making the changes and what they feel comfortable with. It would be great to get input from the Mac team on this.

The Sketch file format itself has already been through multiple transformations.

A long time ago we had JSON (one big JSON file), but that wasn't optimal as it's not very efficient; for example, it's harder to read partially over a stream.

We had a macOS file format which was secretly a package/folder. However, that caused a lot of sync issues over Dropbox, e.g. data/artboards going missing.

We had a SQLite database, where we even stored blobs. We had quite a few corrupted files due to parallel operations, e.g. saving/syncing the DB (more about that in the backlog).

The current file format is a ZIP, with multiple JSON files and images saved separately.
Because of the ZIP, it's one single binary that can easily be shared and uploaded. The images allow a quick preview. Distributing the data across multiple JSON files allows lazy loading and faster saves.

I'm all in for a new format, especially to make it more readable and understandable by people, other apps, plugins, tools, but this is quite a big call.

What are the exact issues we have currently? The fact that there is no definition of our file format? That it's not generic enough and not using standard/atomic variables but custom classes?

I'm also thinking about the collaboration feature. Could that be something that can be done at the format level, versus the tree? 🤔

@jelledelaender I think there's a misunderstanding here; we're not proposing a new format, instead we're suggesting to introduce a spec for the current format so that different parts, developed by different teams using potentially different languages (currently Obj-C, Swift, JavaScript, TypeScript, Elixir, Go) can be safely assumed to be compatible.

For the collaboration, that's a good question and possibly goes beyond the scope of this issue. However, technically speaking a change ripples down to the persistence layer, and the way we've been using the spec/TypeScript types allows working with any sub-tree or specific nodes.

What are the exact issues we have currently? The fact that there is no definition of our file format?

@jelledelaender ☝️ Yep, this. We're not proposing any changes to the existing file format, or how it's persisted to disk, rather proposing to formalise the definition somewhere in code. If that definition is in code, it means it can be transformed to be maximally useful in various contexts (TypeScript for frontends, GraphQL schema for backends, maybe in time auto-generate Swift serialisers, and more). And ideally we'd be able to commit to some level of stability in that definition, in such a way that it at least can't be unexpectedly changed with every Sketch release.

It might be worth mentioning explicitly that this isn't about recreating the native Sketch model elsewhere, that can and should stay encapsulated within Sketch - this is about defining in code, with a high degree of strictness, what valid Sketch document JSON looks like on disk, or over the wire. This enables type safety in internal and 3rd party app code that might work with Sketch documents, or fragments of Sketch documents, and I think delivers on the vision of an open file format that can act as an interchange format between products.

It was actually your comments on #proj-linting channel that really got me thinking about this - when you said we'd have to slave releases of a linting tool for every variant of Sketch releases https://bohemiancoding.slack.com/archives/CK9MW0PL2/p1563203566003800

That imposes a very large burden on an internal linting tool, but one that could be just about bearable. But in order to deliver the sort of linting experience we want to, where 3rd parties are able to craft custom rules for their diverse internal workflows and standards, to have the caveat that they have to build rules against an undocumented and unspecified file format that could change under their feet with every new Sketch release ... well, it's basically a non-starter IMO.

And it's not as if linting is unique in this regard - any product that wants to work with Sketch documents is currently built on shaky foundations without a defined file format. And when additional stability (or at least the illusion thereof) is required by 3rd parties they have to recreate the format themselves on a case by case basis, and attempt to manually keep in sync with the sorts of files Sketch outputs. E.g. here's an example in Amazon's Sketch Constructor tool https://github.com/amzn/sketch-constructor/tree/master/models

So instead of everyone having to create their own file format spec, this proposal is also about us doing that ourselves properly, and owning the versioning and change management in a way only we're able to.

I'm all in for a new format, especially to make it more readable and understandable by people, other apps, plugins, tools, but this is quite a big call.

I don't have many opinions about what can and should be improved within the file format. But I do think that properly defining it could make future changes less painful.

Some ramblings from the Mac side...

I have no experience of TypeScript and don't really understand how it would be used to define the format, but from what I'm reading here it sounds like a sensible choice, especially given that we can generate a JSON schema directly from it.

Regarding code generation:

We currently use the Sketch.xcdatamodel alongside a custom tool (Coma) to generate boilerplate source code. I don't think that the technology we use to define the format's source of truth necessarily has to be the same as the source we generate this code from, but the .xcdatamodel is creaking at the seams a little and if we can kill two birds with one stone that would be useful. With that in mind I have some questions:

Firstly - while Swift is the new shiny, I'm not convinced we want to change the model to be Swift any time soon. For one thing there's a possibility that we'll develop a new rendering engine in C++ and I think the interoperability between C++ and Obj-C would be better than between C++ and Swift. So is there a mechanism to generate Obj-C code from TypeScript? I suspect the answer is yes in that we could generate JSON and use Mustache to generate the Obj-C code as we do at the moment.

Secondly - does TypeScript offer a way to extend the model without editing it? This is a limitation with the Sketch.xcdatamodel. A recent example: it would have been nice to supply a separate config file to define attributes required for plugin diffing rather than adding them to the .xcdatamodel.

Also, what additional tools are needed to compile/run TypeScript? Currently Coma is included in the Sketch repo meaning that a new developer can more or less just clone the repo and compile in Xcode to be up and running. A concern with having to install third party tools for this is that they could break going forward - e.g. they don't run on 10.16 when it's released.

Change management:

Limit the pace at which new features are added to the file format - since this churns the format and burdens the entire ecosystem. Suggest annually, or less

That scares me. While I can completely see the logic I worry that we'll lose a lot of the agility that gives us an edge over our competitors. I would hate to see Figma releasing features before us just because we were waiting for a point where we're allowed to change the file format.

Just to follow on from what @christianklotz said above, it might be useful to describe how we handle rolling changes on the Mac side:

Forward migrations (opening older documents) are the simplest to handle; when we change the version number we can add additional code telling Sketch how to migrate old objects/properties to the new format. When Sketch opens a file with an older version number it automatically runs this migration code to bring the document up-to-date.

Opening newer documents in older version of Sketch is trickier, but with care we can manage this too. A good example of this is the recent #25151 where we only write values if they don't match the default. Obviously, if we just stop writing these values in v58, v57 wouldn't be able to read v58 files so the way we deal with this is as follows:

There are 2 version numbers stored in a Sketch document; a current version and a compatibility version (these are defined in MSDocumentVersion.h - it seems likely that this file is something we should consider integrating into all of this work). As of v57 the version is 119 and the compatibility version is 99, which means v57 will continue opening documents as long as the compatibility version is <= 119.

At v58 we will add the ability to read files where the defaults are missing, but we won't yet omit writing them. We will also increase the current version number to 120. The compatibility version will stay at 99 meaning that v57 can still open v58 files.

At some later point we'll stop writing the default values. At this point we'll change the compatibility version to 120. Now, when v57 tries to open the document it'll read the compatibility version and as that (120) is greater than the current version (119) it will fail, presenting the user with an error. However, if the user tries to open the document in v58, the compatibility version will match the current version and we can open the file. Further, the user could then save the file from v58 and this would be openable by v57.
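The rule described above boils down to a single comparison, sketched here with the numbers from the example. The type and function names are made up, and the document version written by the later release (121) is an assumption since it isn't stated above; only the 119/120/99 values come from the description.

```typescript
interface DocumentVersionInfo {
  version: number;              // version of the app build that wrote the file
  compatibilityVersion: number; // oldest app version able to read the file
}

// A build can open a document when the document's compatibility version
// is no greater than the build's own current version.
function canOpen(appCurrentVersion: number, doc: DocumentVersionInfo): boolean {
  return doc.compatibilityVersion <= appCurrentVersion;
}

const SKETCH_57_VERSION = 119;
const SKETCH_58_VERSION = 120;

// v58 can read files with missing defaults, but still writes them.
const savedBySketch58: DocumentVersionInfo = { version: 120, compatibilityVersion: 99 };

// A later release stops writing defaults and bumps the compatibility version.
// (The 121 here is a placeholder for whatever that release's version would be.)
const savedWithoutDefaults: DocumentVersionInfo = { version: 121, compatibilityVersion: 120 };
```

With these values, v57 can open `savedBySketch58` but not `savedWithoutDefaults`, while v58 can open both.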

A different example would be symbol overrides. Originally these were stored as a dictionary before later being moved to objects in their own right. To allow old versions of Sketch to understand the newer files we had a period of overlap where both the old format dictionary and the new override objects were written. I think it would be good in situations like this to be able to publish a format containing both the dictionary and the objects, but to be able to mark the dictionary as deprecated.

@opsGavin

So is there a mechanism to generate Obj-C code from TypeScript?

The useful property of TypeScript is that it has a well defined and documented AST, so a TypeScript file is easily transformed into a pure data representation. So then it's a case of writing shell scripts (in Node) that ingest that data, and output it in a new format, for example JSON Schema.

Here's an example from an existing tool - TypeScript on the left, and the generated Obj-C on the right.

I suspect the answer is yes in that we could generate JSON and use Mustache to generate the Obj-C code as we do at the moment.

Yep, you're correct - ultimately the file format would end up as plain JSON, and the tooling you use to generate code is up for grabs.

Also, what additional tools are needed to compile/run TypeScript? A concern with having to install third party tools for this is that they could break going forward - e.g. they don't run on 10.16 when it's released.

TypeScript is a Node tool, so Node and npm are required. I don't think breaking is too much of a concern, it's built by Microsoft and currently sees about 6.5M weekly downloads, mostly by web developers who all use Macs. That said, I'm not sure where we'd want to store the file format source of truth itself ... on its own somewhere, or bundled into the Sketch repo? If it was on its own then the toolchain used to build it wouldn't be a direct dependency of Sketch, since I guess Sketch would consume a plain JSON representation.

Do you have any opinions about whether the source of truth is in Sketch, or not?

All that TypeScript chat aside though, following a sprint planning discussion with @christianklotz this morning, we've more or less decided to have a stab at writing a basic/MVPish version as plain JSON Schema. Got to start somewhere, and writing the format in JSON seemed like a simple place to begin. Should be useful to get the ball rolling at the very least.

Secondly - does TypeScript offer a way to extend the model without editing it?

This is an interesting one, because I think both TypeScript and JSON Schema are more concerned with describing the shapes of objects than with recreating classical OO inheritance patterns. So I think with both approaches extending is OK, but you run into problems when attempting to override existing properties with new types using their own semantics.

That said, JSON Schema is just JSON, so it's easy enough to write scripts that manipulate it as pure data. I'm sure we could come up with a nice way to "patch" a file format, temporarily or otherwise, with extra props. The thing with JSON is that there's really quite a mature ecosystem of specs around now, case in point - http://jsonpatch.com/
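For a flavour of what patching could look like, here's a minimal sketch that applies a single JSON Patch "add" operation to a schema held as plain data. It only handles object members (no array indices, no other operations, no empty path), so it's a small subset of what the JSON Patch spec (RFC 6902) describes, and the `pluginDiffKey` property is invented for the example.

```typescript
type JsonValue =
  | null
  | boolean
  | number
  | string
  | JsonValue[]
  | { [key: string]: JsonValue };

interface AddOperation {
  op: "add";
  path: string; // JSON Pointer, e.g. "/properties/foo"
  value: JsonValue;
}

// Apply one "add" operation and return a patched copy; the input is untouched.
function applyAdd(target: JsonValue, operation: AddOperation): JsonValue {
  const copy: JsonValue = JSON.parse(JSON.stringify(target));
  // Split the pointer into tokens and undo the "~1" and "~0" escapes.
  const tokens = operation.path
    .split("/")
    .slice(1)
    .map((token) => token.replace(/~1/g, "/").replace(/~0/g, "~"));
  let cursor: JsonValue = copy;
  // Walk down to the parent of the member being added.
  for (const token of tokens.slice(0, -1)) {
    if (cursor === null || typeof cursor !== "object" || Array.isArray(cursor) || !(token in cursor)) {
      throw new Error(`Path not found: ${operation.path}`);
    }
    cursor = cursor[token];
  }
  const last = tokens[tokens.length - 1];
  if (last === undefined || cursor === null || typeof cursor !== "object" || Array.isArray(cursor)) {
    throw new Error(`Cannot add at: ${operation.path}`);
  }
  cursor[last] = operation.value;
  return copy;
}

// Extending a schema with an extra property without editing the source schema.
const baseLayerSchema: JsonValue = {
  type: "object",
  properties: { name: { type: "string" } },
};
const patchedLayerSchema = applyAdd(baseLayerSchema, {
  op: "add",
  path: "/properties/pluginDiffKey",
  value: { type: "string" },
});
```

A separate patch file along these lines could cover the plugin-diffing case, keeping the extra attributes out of the published format.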

Limit the pace at which new features are added to the file format - since this churns the format and burdens the entire ecosystem. Suggest annually, or less

That scares me. While I can completely see the logic I worry that we'll lose a lot of the agility that gives us an edge over our competitors

Yep, a few others raised concerns about this. This is about the tension between Sketch as a consumer product that requires frequent improvements, and Sketch as a platform/format that needs to co-exist within toolchains. I suppose the best thing that can be said is that by formalising and publishing a file format spec, we don't overtly harm the former interest, and actively improve the latter.

it might be useful to describe how we handle rolling changes on the Mac side

Since the current objective of the file format schema is just to describe how a Sketch file should look on disk, it doesn't seem like it would initially get in the way of how you currently handle rolling changes. The schema would just define the current and compat version properties as numbers, say where they appear in the JSON, maybe add some descriptions etc.

An interesting question to me is that, hypothetically speaking, if a Sketch file "just" declared what version of the file format it was saved as (e.g. file format v2 or whatever) could the logic that determined compatibility be self-contained in Sketch, i.e. not leaking into what's persisted on disk?

I think it would be good in situations like this to be able to publish a format containing both the dictionary and the objects, but to be able to mark the dictionary as deprecated.

For sure, we could mark properties as deprecated in JSON Schema. At that point it's just a boolean flag, and we use it to adjust what may get generated, whether a deprecated label on some website documentation, or affecting code generation somehow.
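As a small sketch of how that flag might feed into generation, the snippet below walks a schema and collects the paths of properties marked `deprecated`, which a docs or code generator could then label or skip. The schema shape is reduced to a few keywords, and the `overrides`/`overrideValues` property names are stand-ins for the symbol override example rather than the real ones.

```typescript
// Reduced schema shape: only the keywords this sketch needs.
interface AnnotatedSchema {
  type?: string;
  description?: string;
  deprecated?: boolean;
  properties?: Record<string, AnnotatedSchema>;
}

// Collect dotted paths of every property flagged as deprecated.
function listDeprecated(schema: AnnotatedSchema, path = ""): string[] {
  const found: string[] = [];
  const properties = schema.properties || {};
  for (const key of Object.keys(properties)) {
    const child = properties[key];
    if (child === undefined) continue;
    const childPath = path === "" ? key : `${path}.${key}`;
    if (child.deprecated === true) found.push(childPath);
    found.push(...listDeprecated(child, childPath)); // descend into nested objects
  }
  return found;
}

// Overlap period: both the old dictionary and the new objects are published,
// with the old one flagged. (Hypothetical property names.)
const symbolInstanceSchema: AnnotatedSchema = {
  type: "object",
  properties: {
    overrides: { type: "object", deprecated: true, description: "Old dictionary form" },
    overrideValues: { type: "array", description: "New object form" },
  },
};
```

A documentation generator could call `listDeprecated` and render a deprecated label next to each returned path, while a code generator might emit a deprecation annotation instead.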

Once we have something shareable, we already have some people who've expressed interest in giving it a spin.

We've now released v1 of the file format https://www.npmjs.com/package/@sketch-hq/sketch-file-format, so closing this issue since it feels like we've achieved that initial milestone.

The TypeScript AST is well documented, and would serve as a robust data format to enable transformation to other formats. We could write our own TypeScript to JSON Schema transformer for example, or use existing tools like typescript-json-schema. Tooling for TypeScript to Swift transformation exists in the wild too, e.g. quicktype.

Oh boy this would have been great to know! 😄