cssinjs/istf-spec

Intermediate format

geelen opened this issue · 46 comments

With the work that's been progressing on styled-components static CSS extraction, we've been inching towards what I could see as a valid, cross-compatible intermediate representation of CSS-in-JS. This isn't what we're doing just yet, but it's what I had in mind to potentially solve all our edge cases:

.foo {
  prop: value;
  &:pseudo {
    foo: bar;
  }
}

[
  [ SELECTOR_OPEN ],
  [ SELECTOR, '.foo' ],
  [ PROPERTY, 'prop' ],
  [ VALUE, 'value' ],
  [ SELECTOR_OPEN ],
  [ SELECTOR, '&:pseudo' ],
  [ PROPERTY, 'foo' ],
  [ VALUE, 'bar' ],
  [ SELECTOR_CLOSE ],
  [ SELECTOR_CLOSE ],
]
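
As a rough illustration of how such a list could be turned into a CSS string, here is a minimal sketch. The numeric marker values, the `toCSS` and `resolveSelector` names, and the choice to flatten nested rules by substituting `&` are all assumptions made for this example; nothing here is the actual styled-components implementation.

```javascript
// Hypothetical numeric markers; the proposal only says they would be a compact enum.
const SELECTOR_OPEN = 0;
const SELECTOR = 1;
const PROPERTY = 2;
const VALUE = 3;
const SELECTOR_CLOSE = 4;

// Resolve a nested selector against its parent selectors by replacing "&".
// Selectors without "&" are treated as descendants of the parent (assumption).
function resolveSelector(selector, parents) {
  if (parents.length === 0) return [selector];
  return parents.map((parent) =>
    selector.includes('&') ? selector.split('&').join(parent) : `${parent} ${selector}`
  );
}

function toCSS(ir) {
  const out = [];   // one slot per rule, in the order the rules were opened
  const stack = []; // rules that are currently open
  let property = null;

  for (const [marker, payload] of ir) {
    switch (marker) {
      case SELECTOR_OPEN:
        // Reserve an output slot now so parents are emitted before their children.
        stack.push({ slot: out.push('') - 1, selectors: [], decls: '' });
        break;
      case SELECTOR: {
        const frame = stack[stack.length - 1];
        const parent = stack[stack.length - 2];
        frame.selectors.push(...resolveSelector(payload, parent ? parent.selectors : []));
        break;
      }
      case PROPERTY:
        property = payload;
        break;
      case VALUE:
        stack[stack.length - 1].decls += `${property}:${payload};`;
        break;
      case SELECTOR_CLOSE: {
        const frame = stack.pop();
        if (frame.decls) out[frame.slot] = `${frame.selectors.join(',')}{${frame.decls}}`;
        break;
      }
    }
  }
  return out.join('');
}

const css = toCSS([
  [SELECTOR_OPEN],
  [SELECTOR, '.foo'],
  [PROPERTY, 'prop'],
  [VALUE, 'value'],
  [SELECTOR_OPEN],
  [SELECTOR, '&:pseudo'],
  [PROPERTY, 'foo'],
  [VALUE, 'bar'],
  [SELECTOR_CLOSE],
  [SELECTOR_CLOSE],
]);
console.log(css); // .foo{prop:value;}.foo:pseudo{foo:bar;}
```

The whole conversion is a single pass with one small stack and no string parsing beyond the `&` substitution.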

This kind of intermediate representation lends itself to certain optimisations (autoprefixing could happen in-place, for example), as well as fairly compact storage (assuming the keys are some kind of compact Enum). Converting to a string or object representation would be extremely fast and unambiguous, and dynamic values are easily represented:

injectGlobal`
  .foo {
    color: ${ lighten('black', 0.2) };
    ${ manyRules };
    &:hover, ${ selectorFn } {
      foo: bar;
    }
  }
`

[
  [ SELECTOR_OPEN ],
  [ SELECTOR, '.foo' ],
  [ PROPERTY, 'color' ],
  [ VALUE, lighten('black', 0.2) ],
  [ DEFERRED, manyRules ],
  [ SELECTOR_OPEN ],
  [ SELECTOR, '&:hover' ],
  [ SELECTOR, selectorFn ],
  [ PROPERTY, 'foo' ],
  [ VALUE, 'bar' ],
  [ SELECTOR_CLOSE ],
  [ SELECTOR_CLOSE ],
]

Just a thought, but in my mind this makes static style extraction much more comprehensive, particularly if you could use Prepack or something to figure out that some function calls ( lighten('black', 0.2) for example) can be evaluated at build time.
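
To make the static-extraction idea a bit more concrete, the sketch below checks which entries of such a list are still dynamic. It assumes that an entry is dynamic when it is DEFERRED or carries a function as payload; the marker values, the `lighten` stand-in and the helper names are hypothetical and only serve the example.

```javascript
// Hypothetical numeric markers.
const SELECTOR_OPEN = 0;
const SELECTOR = 1;
const PROPERTY = 2;
const VALUE = 3;
const SELECTOR_CLOSE = 4;
const DEFERRED = 5;

// Assumption: DEFERRED entries and function payloads can only be resolved at runtime.
function isDynamic([marker, payload]) {
  return marker === DEFERRED || typeof payload === 'function';
}

// Indices of all entries that block full build-time extraction.
function dynamicIndices(ir) {
  const indices = [];
  ir.forEach((entry, index) => {
    if (isDynamic(entry)) indices.push(index);
  });
  return indices;
}

// Stand-ins for the interpolations in the example above.
const lighten = () => '#333333'; // placeholder that returns a fixed string
const manyRules = () => [];
const selectorFn = () => '.bar';

const ir = [
  [SELECTOR_OPEN],
  [SELECTOR, '.foo'],
  [PROPERTY, 'color'],
  [VALUE, lighten('black', 0.2)], // already evaluated, so it is a plain string here
  [DEFERRED, manyRules],
  [SELECTOR_OPEN],
  [SELECTOR, '&:hover'],
  [SELECTOR, selectorFn],
  [PROPERTY, 'foo'],
  [VALUE, 'bar'],
  [SELECTOR_CLOSE],
  [SELECTOR_CLOSE],
];

const dynamic = dynamicIndices(ir);
console.log(dynamic); // [ 4, 7 ]
```

A list with no dynamic entries could be written out as plain CSS at build time; everything else would stay in the intermediate format until runtime.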

The next step would be taking this format and understanding that if, say, classes x, y and z each appear in only one place, don't define any nested selectors, and yet share certain properties and values, then their CSS could be "atomised" somehow to compact the resulting CSS.
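
One possible reading of "atomised" is grouping classes by the declarations they share. The sketch below takes that reading; the input shape (class name mapped to a list of `property:value` strings) and the `atomise` name are assumptions for illustration, and it relies on the preconditions above (no nested selectors, no conflicting declarations).

```javascript
// Group class selectors by shared declaration so each declaration is emitted once.
function atomise(rules) {
  const byDeclaration = new Map(); // 'property:value' -> list of class selectors

  for (const [className, declarations] of Object.entries(rules)) {
    for (const declaration of declarations) {
      if (!byDeclaration.has(declaration)) byDeclaration.set(declaration, []);
      byDeclaration.get(declaration).push(`.${className}`);
    }
  }

  let css = '';
  for (const [declaration, selectors] of byDeclaration) {
    css += `${selectors.join(',')}{${declaration};}`;
  }
  return css;
}

const atomised = atomise({
  x: ['color:red', 'margin:0'],
  y: ['color:red', 'padding:0'],
  z: ['color:red', 'margin:0'],
});
console.log(atomised); // .x,.y,.z{color:red;}.x,.z{margin:0;}.y{padding:0;}
```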

It's been floating around in my head for a while now, thought it might be good to share it around.

kof commented

Reminds me a bit of this example though your format is better because it is streamable.

It could be optimized even further for memory (nested arrays take more) and storage (fewer brackets) by using just one array, because the keys already identify the data that comes next.
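
A minimal sketch of that single-array idea, assuming hypothetical numeric markers and assuming that SELECTOR, PROPERTY and VALUE are each followed by exactly one payload item:

```javascript
// Hypothetical numeric markers.
const SELECTOR_OPEN = 0;
const SELECTOR = 1;
const PROPERTY = 2;
const VALUE = 3;
const SELECTOR_CLOSE = 4;

// Markers that are followed by exactly one payload item (assumption).
const HAS_PAYLOAD = new Set([SELECTOR, PROPERTY, VALUE]);

// Turn the array-of-arrays form into one flat array.
function flatten(nested) {
  return nested.flat();
}

// Read the flat array back as [marker, payload] entries.
// Reading is positional, so a payload is never mistaken for a marker.
function* read(flat) {
  for (let i = 0; i < flat.length; i++) {
    const marker = flat[i];
    yield HAS_PAYLOAD.has(marker) ? [marker, flat[++i]] : [marker];
  }
}

const nested = [
  [SELECTOR_OPEN],
  [SELECTOR, '.foo'],
  [PROPERTY, 'prop'],
  [VALUE, 'value'],
  [SELECTOR_CLOSE],
];

const flat = flatten(nested);
console.log(flat.length); // 8
const roundTrip = [...read(flat)];
```

The flat form keeps the streaming property: a consumer can start emitting output before it has seen the end of the array.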

Questions:

  1. Are selectors already scoped?
  2. DEFERRED manyRules, is manyRules a function that should be called later?
  3. Is selectorFn a reference to a function that is imported in this file?

kof commented

Also at this point I would love to hear what @bmeurer thinks about this sort of format in terms of performance.

kof commented

Another question: what about splitting complex strings like '&:hover' which is

  • '&' which means a ref to a parent rule selector
  • ':' which means pseudo selector and
  • 'hover' - the actual pseudo selector name

even further in order to not have to run regexes over such things when parsing in order to identify tokens?
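
To illustrate the question, here is a sketch in which '&:hover' is stored as separate tokens and the consumer joins them without any regex. The PARENT_SELECTOR and PSEUDO_SELECTOR markers, their values and the `renderSelector` name are hypothetical.

```javascript
// Hypothetical markers for pre-tokenized selectors.
const SELECTOR = 1;
const PARENT_SELECTOR = 5; // stands for "&"
const PSEUDO_SELECTOR = 6; // payload is the pseudo selector name, e.g. 'hover'

// Build the final selector string from tokens; no regex or string scanning needed.
function renderSelector(tokens, parentSelector) {
  let result = '';
  for (const [marker, payload] of tokens) {
    if (marker === PARENT_SELECTOR) result += parentSelector;
    else if (marker === PSEUDO_SELECTOR) result += `:${payload}`;
    else if (marker === SELECTOR) result += payload;
  }
  return result;
}

// '&:hover' expressed as two tokens instead of one string.
const hover = renderSelector([[PARENT_SELECTOR], [PSEUDO_SELECTOR, 'hover']], '.foo');
console.log(hover); // .foo:hover
```

The trade-off is more entries per selector in exchange for not having to inspect strings at consumption time.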

Sorry, I don't have any context on this. Can you outline the alternatives on a high-level and make the question more specific?

kof commented

@bmeurer I have updated the readme. Does it help?

kof commented

@bmeurer we don't have specific alternatives yet, but knowing your background I thought you could give us some tips about the data structure for this task with the goal to get max parse speed and min memory usage.

Sorry that readme isn't very useful either. But from what I understand you are proposing to serialize CSS as arrays of arrays?

kof commented

I think arrays of arrays is just an implementation detail, we would use the most effective format we can find.

So your question comes down to: Do you think it makes sense to encode CSS in whatever format of JavaScript?

kof commented

I think the question is more how we can do this efficiently. We need this as an intermediate format for all cssinjs libraries, so that users can use any package from npm that was created with any library and exported in this format. CSS doesn't have all the features we are going to support in that format.

So your question comes down to: Do you think it makes sense to encode CSS in whatever format of JavaScript?

That might or might not make sense 😉

The question is "Could there be a more efficient/fast/etc. format from a browser engine perspective?"

kof commented

I think by "browser engine" @mxstbr means js engine, v8 in particular.

The most compact representation is probably some dense string/binary representation, but that probably also takes more time to decode. A simple array (binary) representation is faster to decode but might take more space/time to transfer. It's really all about trade-offs (as you probably know).

At this general level, there's no good recommendation I can give. Once you have rough proposal, I can try to comment on those.

Maybe @littledan or @ajklein have some general thoughts on this.

I think it's more important to get the level of information right rather than worry about compactness or decoding speed at the moment. Things like handling &, how we handle scoping, function interpolations, etc. I think it would make the most sense to do this in the concrete rather than the abstract—version 3 of SC is going to include a focus on precompilation, so why don't we see what works for that then try to generalise it as a shared standard?

kof commented

@geelen what if we specify things in a way that is decoupled from SC, so that everyone understands what we do and why without fully understanding the entire SC scope? The first version doesn't need to be perfect. Then we see how it goes in SC, adapt the spec here, and possibly have more than one proof of concept running in parallel?

kof commented

I just want to have more people looking into it and trying it. This will allow us to move faster.

Suits me! We'll focus on SCv3 for the moment and compare notes in the future

Sorry, I don't have much context here. How does this proposal relate to the CSS Object Model? Is it that CSSOM is considered unergonomic, or lacks expressiveness, or performance, or something else? Is there any documentation that describes the relationship between cssinjs and CSSOM?

kof commented

@littledan This format should essentially be a DSL for sharing CSS on NPM. The difference from regular CSS is that it should be really easy to parse. Human readability doesn't matter. We also need to add features CSS does not have: scoping, nesting, extends, function values. CSSinJS libraries can then use this format to produce CSS and give the user the right DX.

kof commented

Just read this article and thought: cc @sokra

@kof That article looks good--it's great to see tools helping developers, including on-demand chunking. But I don't see where that article gets at the data structures used to represent CSS, which could explain why we need this new format and CSSOM is insufficient. Could you pass CSSOM datastructures, and code generating them, between npm modules for this sort of thing? (Note: I'm not involved with WebPack, so based on the warning at the beginning of the article, it's likely that I'm missing something obvious here.)

cc @tabatkins @annevk @domenic @s3ththompson

Edit: Is the main additional thing here extending the datastructure to represent features of CSS that aren't supported by the browser yet? Maybe the missing feature is the right kind of extension points to CSSOM, and ensuring that polyfills meet your needs.

kof commented

Could you pass CSSOM datastructures, and code generating them, between npm modules for this sort of thing?

Given that we agree on a standard data structure, packages would only need to export those data structures, just like ES6 code is compiled to ES3 + CJS before publishing on npm.

After that, any tool can pick it up and either generate CSS at build time, e.g. using webpack, or use more sophisticated CSSinJS libraries, which would then decide how and when to generate CSS.

kof commented

Example:

Those constants will be some compact enums like @geelen said.

// styles.css.js
module.exports = [
  [RULE_START],
  [GLOBAL_SELECTOR, 'body'],
  [PROPERTY, 'color'],
  [VALUE, 'red'],
  [RULE_END]
]

A loader or cssinjs lib would compile this to:

body {
  color: red
}

@kof Do the CSSOM polyfills not work in ES3? Does CSSOM not compile well to or from CSS?

kof commented

@kof Do the CSSOM polyfills not work in ES3? Does CSSOM not compile well to or from CSS?

@littledan please help me to understand those questions. I don't see how CSS object model is related to an intermediate dsl.

kof commented

Created a pr with a first proposal #17

I think @littledan's question is whether or not the CSSOM functions sufficiently as the DSL for your purposes. (Or more generally, since you'll be doing manipulations that CSS doesn't natively support, the tree produced by the official parser, which doesn't do any grammar-checking.)

I can see the benefit of using a flat structure rather than a nested one; it's easier to write an iterative function than a recursive one. You can even do a "real" parse with the official parser, then easily flatten it into a single array with begin/end tokens like your examples. (That would, for example, give you raw token lists for the selectors, which you can then glom into better intermediate structures, rather than exposing raw strings.) On the other hand, it's harder to do larger manipulations, such as moving or wrapping a rule, since you have to track begin/ends to find the entire rule, so there's pros and cons.
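
The begin/end tracking mentioned above can be sketched as a depth counter over the flat list. The marker values and the `findRuleEnd` and `extractRule` names are assumptions made for this example.

```javascript
// Hypothetical numeric markers.
const SELECTOR_OPEN = 0;
const SELECTOR = 1;
const PROPERTY = 2;
const VALUE = 3;
const SELECTOR_CLOSE = 4;

// Find the index of the SELECTOR_CLOSE that matches the SELECTOR_OPEN at `start`.
function findRuleEnd(ir, start) {
  if (ir[start][0] !== SELECTOR_OPEN) throw new Error('start must point at SELECTOR_OPEN');
  let depth = 0;
  for (let i = start; i < ir.length; i++) {
    if (ir[i][0] === SELECTOR_OPEN) depth++;
    else if (ir[i][0] === SELECTOR_CLOSE && --depth === 0) return i;
  }
  throw new Error('unbalanced rule');
}

// Copy out a whole rule, including any nested rules, so it can be moved or wrapped.
function extractRule(ir, start) {
  return ir.slice(start, findRuleEnd(ir, start) + 1);
}

const rules = [
  [SELECTOR_OPEN],
  [SELECTOR, '.foo'],
  [PROPERTY, 'prop'],
  [VALUE, 'value'],
  [SELECTOR_OPEN],
  [SELECTOR, '&:pseudo'],
  [PROPERTY, 'foo'],
  [VALUE, 'bar'],
  [SELECTOR_CLOSE],
  [SELECTOR_CLOSE],
];

console.log(findRuleEnd(rules, 0)); // 9
console.log(findRuleEnd(rules, 4)); // 8
const nestedRule = extractRule(rules, 4);
```

Every such manipulation needs a scan like this, which is the cost of the flat layout compared with a tree where a rule is a single node.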

I'm nearly done with the Typed OM spec for Houdini, which'll inform how the Generic CSS Parser API will work as well. It'll be nice to see how y'all do with these alternate structures, to let me know if it's useful to expose this sort of "parse event stream"-style thing.

kof commented

@tabatkins I think the structure we want to have is mainly designed for interoperability. We mostly care about performance of initial parsing to a JSON or CSS structure. We can't use CSSOM parser here (or I don't see how) or use regular CSS as an intermediate format for 2 reasons:

  1. performance
  2. the lack of variables, functions and modules

The Houdini project looks promising; I think I have been waiting for years for it to take off.

kof commented

Btw, if Houdini allows using this kind of structure directly to style the application and skipping CSS notation completely, that would be a huge win.

kof commented

I have created a very minimal bench: https://esbench.com/bench/592d599e99634800a03483d8. The numbers are awesome, though I'm not sure whether I was successful at avoiding v8's optimizations.

@kof What would be even more amazing would be a benchmark comparing CSSOM to your format.

kof commented

@littledan you mean comparing to CSS parser written in js?

@kof I mean comparing the manipulation of the rule in your format to the manipulation of the rules in CSSOM (since part of your claim was that CSSOM was too slow).

kof commented

Oh, I see where the misunderstanding is coming from. We can't use regular CSS as a format for interoperability because it simply lacks the features we need.

I think your idea was to do this:

  1. "whatever cssinjs" lib generates regular CSS as an interoperability format and publishes it to npm
  2. any other cssinjs lib parses that CSS using CSSOM and does manipulation, renders again to the CSSOM.

It's an interesting experiment though. I will add that to the bench in a bit.

kof commented

Btw, CSSOM still doesn't allow setting the selector in all browsers :( That's why I only looked at the browsers' CSSOM implementations as a write-only target.

kof commented

Done, check it out https://esbench.com/bench/592d599e99634800a03483d8, performance difference is enormous.

@kof I was thinking more like, "whatever cssinjs lib" would contain functions that output CSSOM objects, rather than CSS in a string, to avoid the cost of re-parsing (for one). You use CSSOM as the format to do your manipulation, as well as the way you apply it to the DOM.

For old browsers which are missing CSSOM or don't have certain standard features implemented that you need, could you use a CSSOM polyfill?

I'm not sure what you mean by variables, functions and modules. Would these be in JavaScript or in CSS? If it's JavaScript functions, can you call these functions in JS and pass their return values to the CSSOM constructors?

Not entirely surprising that the initial difference is enormous, but maybe the existence of benchmarks like this might motivate browser optimizations, so I think it's valuable.

kof commented

By "CSSOM objects", do you mean JS objects with the same structure as in the CSSOM spec?

@littledan One part of this is that they want to be able to extend and polyfill CSS, as many projects do; as such, using the native CSSOM doesn't work for this.

Additionally, one of the big motivations is to make it easy to mutate styles before they're finally rendered. Putting them in native CSSOM doesn't let you do that - they're automatically live. Also, native CSSOM is not optimized for mutations; it's actually quite slow when mutated, because browsers build out a lot of supporting data structures to make mutating the DOM fast to style.

All in all, the CSSOM is not a good data structure for what they're trying to do here, for multiple reasons.

kof commented

Another reason is SSR: we simply have no CSSOM there, as we are not in the browser, and it doesn't make sense to use a CSSOM implemented in JS if we can completely avoid the need to parse human-readable CSS.

[
  [ SELECTOR_OPEN ],
  [ SELECTOR, '.foo' ],
  [ PROPERTY, 'color' ],
  [ VALUE, lighten('black', 0.2) ],
  [ DEFERRED, manyRules ],
  [ SELECTOR_OPEN ],
  [ SELECTOR, '&:hover' ],
  [ SELECTOR, selectorFn ],
  [ PROPERTY, 'foo' ],
  [ VALUE, 'bar' ],
  [ SELECTOR_CLOSE ],
  [ SELECTOR_CLOSE ],
]

this is quite bad, i'd say. can we use postcss-like AST?

kof commented

this is quite bad, i'd say.

What exactly and why?

CSS can be parsed into an AST by postcss, and JS can be parsed into an AST by acorn/babylon. At the same time, this proposal is about yet another AST, if I got it correctly.
[image: standards]

kof commented

I think we didn't properly explain what we do here. The format we are discussing is super compact and super fast for parsing. You won't get there with a postcss ast, but I am happy if you can prove me wrong.

Basically this is a format which you can use to publish CSS and then during consumption transform it to any representation you want, depending on a runtime. You may want to convert it to AST, js objects, or even directly to a CSS string.
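
As an example of one such conversion, the sketch below turns the list form into nested plain JS objects. The marker values, the `toObject` name and the choice to join multiple selectors with ', ' as the object key are assumptions for illustration.

```javascript
// Hypothetical numeric markers.
const SELECTOR_OPEN = 0;
const SELECTOR = 1;
const PROPERTY = 2;
const VALUE = 3;
const SELECTOR_CLOSE = 4;

// Convert the flat list into nested style objects keyed by selector.
function toObject(ir) {
  const root = { selectors: [], style: {} };
  const stack = [root];
  let property = null;

  for (const [marker, payload] of ir) {
    const top = stack[stack.length - 1];
    if (marker === SELECTOR_OPEN) {
      stack.push({ selectors: [], style: {} });
    } else if (marker === SELECTOR) {
      top.selectors.push(payload);
    } else if (marker === PROPERTY) {
      property = payload;
    } else if (marker === VALUE) {
      top.style[property] = payload;
    } else if (marker === SELECTOR_CLOSE) {
      // Attach the finished rule to its parent under its selector list.
      const closed = stack.pop();
      stack[stack.length - 1].style[closed.selectors.join(', ')] = closed.style;
    }
  }
  return root.style;
}

const styles = toObject([
  [SELECTOR_OPEN],
  [SELECTOR, '.foo'],
  [PROPERTY, 'prop'],
  [VALUE, 'value'],
  [SELECTOR_OPEN],
  [SELECTOR, '&:pseudo'],
  [PROPERTY, 'foo'],
  [VALUE, 'bar'],
  [SELECTOR_CLOSE],
  [SELECTOR_CLOSE],
]);
console.log(JSON.stringify(styles));
// {".foo":{"prop":"value","&:pseudo":{"foo":"bar"}}}
```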

Yeah I've done a fair bit with the PostCSS AST and it's way heavier than we need. The purpose of this is an extremely lightweight interchange format that can generate real CSS, but have certain things (like Rules & dynamic values) updatable efficiently. I.e. when you publish a component on NPM, you publish all its CSS in this format, so people can use it without depending on the CSS-in-JS library you built it with.

kof commented

I guess this issue can be closed.